Toward novel optimizers: a Moreau-Yosida view of gradient-based learning

Alessandro Betti; Marco Gori
2023

Abstract

Machine Learning (ML) strongly relies on optimization procedures based on gradient descent. Several gradient-based update schemes have been proposed in the scientific literature, especially in the context of neural networks, and many of them have become standard optimizers in ML software libraries. In this paper, we re-frame gradient-based update strategies under the unifying lens of a Moreau-Yosida (MY) approximation of the loss function. By means of a first-order Taylor expansion, we make the MY approximation concretely exploitable to generalize the model update. In turn, this makes it easy to evaluate and compare the regularization properties that underlie the most common optimizers, such as gradient descent with momentum, ADAGRAD, RMSprop, and ADAM. The MY-based unifying view opens up the possibility of designing novel update schemes with customizable regularization properties. As a case study, we propose to use the network outputs to deform the notion of closeness in the parameter space.
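For context, the following is a minimal sketch, in generic notation that is not necessarily the paper's, of how a Moreau-Yosida (proximal) approximation combined with a first-order Taylor expansion yields a gradient-descent-like update; here L denotes a loss, w the current parameters, and \lambda > 0 a step-size/approximation parameter, all introduced only for illustration. The Moreau-Yosida approximation (Moreau envelope) of L at w is

\[ L_\lambda(w) = \min_{u} \Big[ L(u) + \tfrac{1}{2\lambda}\,\lVert u - w \rVert^{2} \Big], \]

and its minimizer u^{*} = \operatorname{prox}_{\lambda L}(w) defines an implicit (proximal) parameter update. Replacing L(u) with its first-order Taylor expansion around w,

\[ L(u) \approx L(w) + \nabla L(w)^{\top}(u - w), \]

turns the inner problem into a quadratic in u whose minimizer is

\[ u^{*} = w - \lambda\, \nabla L(w), \]

i.e., the plain gradient-descent step. Swapping the Euclidean closeness term \tfrac{1}{2\lambda}\lVert u - w \rVert^{2} for other penalties (for instance, coordinate-wise or metric-weighted norms) changes the implicit regularization and, in the spirit of the abstract, leads to different update rules.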
ISBN: 9783031475450; 9783031475467
Keywords: Machine learning, Optimization
Files in this record:
TNO.pdf (not available for download)
Description: Toward Novel Optimizers: A Moreau-Yosida View of Gradient-Based Learning
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 343.94 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/34885