Exact Gauss-Newton optimization for training deep neural networks

Korbit, Mikalai; Adeoye, Adeyemi D.; Bemporad, Alberto; Zanon, Mario
2025

Abstract

We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, we obtain the parameter update by factorizing a matrix whose dimension matches the mini-batch size. This is particularly advantageous for large-scale machine learning problems, where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at worst matches, the generalization performance of well-tuned SGD, Adam, GAF, SQN, and SGN optimizers across various supervised and reinforcement learning tasks.
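The abstract's key computational point, that the matrix being factorized has the size of the mini-batch rather than of the parameter vector, can be illustrated with a short sketch. The following NumPy snippet is a minimal illustration for a plain least-squares loss and is not the paper's actual implementation: the function name `egn_step`, the regularization parameter `lam`, and the use of the simpler push-through identity (a special case of the Duncan-Guttman family of identities) are assumptions made for this example.

```python
import numpy as np

def egn_step(jac, residual, lam):
    """Hypothetical sketch of a batch-sized Gauss-Newton solve.

    For a least-squares loss, the regularized GN direction is
        d = -(J^T J + lam * I_n)^{-1} J^T r,
    which, by the push-through identity
        (J^T J + lam * I_n)^{-1} J^T = J^T (J J^T + lam * I_b)^{-1},
    can instead be computed by solving a b x b system (b = mini-batch
    size) rather than an n x n system (n = parameter count).
    """
    b = jac.shape[0]
    # Factorize/solve the small b x b system instead of the n x n one.
    z = np.linalg.solve(jac @ jac.T + lam * np.eye(b), residual)
    return -jac.T @ z

# Sanity check against the direct n x n solve on a small problem.
rng = np.random.default_rng(0)
b, n, lam = 8, 50, 1e-3
J = rng.standard_normal((b, n))   # per-sample Jacobian on the mini-batch
r = rng.standard_normal(b)        # residuals on the mini-batch
d_small = egn_step(J, r, lam)
d_large = -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)
assert np.allclose(d_small, d_large)
```

When b is in the tens or hundreds while n is in the millions, the b x b solve is the source of the cost advantage the abstract describes.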
Keywords: Stochastic optimization, Second-order optimization, Gauss-Newton Hessian approximation, Machine learning, Reinforcement learning
Files in this product:

File: Exact Gauss-Newton optimization for training deep neural networks.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 3.24 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/36398