Bregman Finito/MISO for nonconvex regularized finite sum minimization without Lipschitz gradient continuity
Latafat P.
2022-01-01
Abstract
We introduce two algorithms for nonconvex regularized finite sum minimization, where the typical Lipschitz differentiability assumptions are relaxed to the notion of relative smoothness. The first is a Bregman extension of Finito/MISO [A. Defazio and J. Domke, Proc. Mach. Learn. Res. (PMLR), 32 (2014), pp. 1125-1133; J. Mairal, SIAM J. Optim., 25 (2015), pp. 829-855], studied for fully nonconvex problems when the sampling is randomized, or under convexity of the nonsmooth term when the sampling is essentially cyclic. The second algorithm is a low-memory variant, in the spirit of SVRG [R. Johnson and T. Zhang, Advances in Neural Information Processing Systems 26, Curran Associates, Red Hook, NY, 2013, pp. 315-323] and SARAH [L. M. Nguyen et al., Proc. Mach. Learn. Res. (PMLR), 70 (2017), pp. 2613-2621], which also allows for fully nonconvex formulations. Our analysis is made remarkably simple by employing a Bregman-Moreau envelope as the Lyapunov function. In the randomized case, linear convergence is established when the cost function is strongly convex, yet with no convexity requirements on the individual functions in the sum. For the essentially cyclic and low-memory variants, global and linear convergence results are established when the cost function satisfies the Kurdyka-Łojasiewicz property.
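For context, a function f_i is L_i-smooth relative to a Legendre kernel h when L_i h - f_i is convex; choosing h(x) = 0.5||x||^2 recovers the usual Lipschitz gradient condition. The Python sketch below is a minimal, hypothetical illustration of a randomized Finito/MISO-style incremental update with a generic mirror map, under strong simplifying assumptions: all terms are smooth, there is no nonsmooth regularizer, and a single common stepsize gamma is used. It conveys the flavor of the table-based update, not the paper's full Bregman Finito/MISO algorithm (which also handles a nonsmooth term).

```python
import numpy as np

def bregman_finito_sketch(grads, x0, gamma, grad_h, grad_h_inv,
                          n_iters=1000, seed=0):
    """Hypothetical sketch of a randomized Finito/MISO-style update
    with a generic mirror map (assumption: smooth terms only, no
    nonsmooth regularizer, one common stepsize gamma).

    grads      -- list of callables, grads[i](x) returns grad f_i(x)
    grad_h     -- gradient of the kernel h (identity for h = 0.5*||x||^2)
    grad_h_inv -- inverse map of grad_h
    """
    rng = np.random.default_rng(seed)
    N = len(grads)
    # Table of "mirror points": s_i = grad_h(z_i) - gamma * grad f_i(z_i),
    # initialized with z_i = x0 for every i.
    s = [grad_h(x0) - gamma * gi(x0) for gi in grads]
    for _ in range(n_iters):
        # Average the mirror points and map back through (grad h)^{-1};
        # with the Euclidean kernel this reduces to the classic Finito step.
        x = grad_h_inv(np.mean(s, axis=0))
        j = rng.integers(N)                     # randomized sampling
        s[j] = grad_h(x) - gamma * grads[j](x)  # refresh one table entry
    return grad_h_inv(np.mean(s, axis=0))

# Toy usage with the Euclidean kernel h(x) = 0.5*||x||^2:
# f_i(x) = 0.5*||x - c_i||^2, whose average is minimized at the mean
# of the centers c_i.
centers = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0])]
grads = [lambda x, c=c: x - c for c in centers]
identity = lambda x: x
x_star = bregman_finito_sketch(grads, np.zeros(2), gamma=0.5,
                               grad_h=identity, grad_h_inv=identity)
print(x_star)  # approaches mean(centers) = [4/3, 1.0]
```

The mirror-point table makes the Bregman structure explicit: only one table entry is refreshed per iteration, matching the incremental character of Finito/MISO, and swapping the kernel h changes the geometry without altering the update's form.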
File | Type | License | Size | Format
---|---|---|---|---
_2.pdf (not available; copy on request) | Publisher's version (PDF) | Publisher's copyright | 814.09 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.