In the setting of a Hilbert space ℋ, given Φ : ℋ → ℝ a convex continuously differentiable function and α a positive parameter, we consider the inertial dynamic system with Asymptotic Vanishing Damping

(AVD)_α    ẍ(t) + (α/t) ẋ(t) + ∇Φ(x(t)) = 0.
Depending on the value of α with respect to 3, we give a complete picture of the convergence properties as t → +∞ of the trajectories generated by (AVD)_α, as well as of the iterations of the corresponding algorithms. Indeed, as shown by Su, Boyd and Candès, the case α = 3 corresponds to a continuous version of the accelerated gradient method of Nesterov, with the rate of convergence Φ(x(t)) − min Φ = O(t^{−2}) for α ≥ 3. Our main result concerns the subcritical case α ≤ 3, where we show that Φ(x(t)) − min Φ = O(t^{−2α/3}). This overall picture shows a continuous variation of the rate of convergence of the values Φ(x(t)) − min_ℋ Φ = O(t^{−p(α)}) with respect to α > 0: the coefficient p(α) increases linearly up to 2 as α goes from 0 to 3, then displays a plateau. We then examine the convergence of trajectories to optimal solutions. As a new result, in the one-dimensional framework, for the critical value α = 3, we prove the convergence of the trajectories. In the second part of this paper, we study the convergence properties of the associated inertial forward-backward algorithms. They aim to solve structured convex minimization problems of the form min {Θ := Φ + Ψ}, with Φ smooth and Ψ nonsmooth. The continuous dynamics serves as a guideline for this study. We obtain a similar rate of convergence for the sequence of iterates (x_k): for α ≤ 3 we have Θ(x_k) − min Θ = O(k^{−p}) for all p < 2α/3, and for α > 3, Θ(x_k) − min Θ = o(k^{−2}). Finally, we show that the results are robust with respect to external perturbations.
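As a purely illustrative sketch (not taken from the paper), the inertial forward-backward iteration described above can be written in a few lines of Python. The momentum coefficient k/(k + α) used here is one common discretization of the (AVD)_α damping; the function names and the toy lasso problem min ½‖Ax − b‖² + λ‖x‖₁ are our own choices for demonstration.

```python
import numpy as np

def inertial_forward_backward(grad_phi, prox_psi, x0, step, alpha, n_iter):
    """Inertial forward-backward sketch: extrapolation with coefficient
    k/(k + alpha), then a forward (gradient) step on the smooth part Phi
    and a backward (proximal) step on the nonsmooth part Psi."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, n_iter + 1):
        y = x + (k / (k + alpha)) * (x - x_prev)      # momentum/extrapolation
        x_prev, x = x, prox_psi(y - step * grad_phi(y), step)
    return x

# Toy lasso problem: min 0.5*||A x - b||^2 + lam*||x||_1  (our example)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2                          # Lipschitz constant of the gradient
grad = lambda x: A.T @ (A @ x - b)
prox = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - lam * s, 0.0)  # soft-thresholding

x3 = inertial_forward_backward(grad, prox, np.zeros(10), 1.0 / L, alpha=3.0, n_iter=500)
x2 = inertial_forward_backward(grad, prox, np.zeros(10), 1.0 / L, alpha=2.0, n_iter=500)
```

For α = 3 this is essentially a FISTA-type scheme; taking α < 3 weakens the momentum, which is the subcritical regime whose slower rate O(k^{−p}), p < 2α/3, is established in the paper.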
DOI: 10.1051/cocv/2017083
Keywords: Accelerated gradient method, FISTA, inertial forward-backward algorithms, Nesterov method, proximal-based methods, structured convex optimization, subcritical case, vanishing damping
@article{COCV_2019__25__A2_0, author = {Attouch, Hedy and Chbani, Zaki and Riahi, Hassan}, title = {Rate of convergence of the {Nesterov} accelerated gradient method in the subcritical case \ensuremath{\alpha} \ensuremath{\leq} 3}, journal = {ESAIM: Control, Optimisation and Calculus of Variations}, publisher = {EDP-Sciences}, volume = {25}, year = {2019}, doi = {10.1051/cocv/2017083}, mrnumber = {3943364}, zbl = {1437.49045}, language = {en}, url = {http://www.numdam.org/articles/10.1051/cocv/2017083/} }
Attouch, Hedy; Chbani, Zaki; Riahi, Hassan. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3. ESAIM: Control, Optimisation and Calculus of Variations, Tome 25 (2019), article no. 2. DOI: 10.1051/cocv/2017083. http://www.numdam.org/articles/10.1051/cocv/2017083/
[1] F. Alvarez, On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38 (2000) 1102–1119. | DOI | MR | Zbl
[2] H. Attouch and A. Cabot, Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263 (2017) 5412–5458. | DOI | MR | Zbl
[3] H. Attouch and A. Cabot, Convergence rates of inertial forward-backward algorithms. HAL Preprint 01453170 (2017). | MR
[4] H. Attouch, A. Cabot and P. Redont, The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12 (2002) 273–306. | MR | Zbl
[5] H. Attouch, Z. Chbani, J. Peypouquet and P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing damping. Math. Program. 168 (2018) 123–175. | DOI | MR | Zbl
[6] H. Attouch and J. Peypouquet, The rate of convergence of Nesterov's accelerated forward-backward method is actually faster than 1/k². SIAM J. Optim. 26 (2016) 1824–1834. | DOI | MR | Zbl
[7] V. Apidopoulos, J.-F. Aujol and C. Dossal, On a second order differential inclusion modeling the FISTA algorithm. Preprint (2017). | HAL
[8] V. Apidopoulos, J.-F. Aujol and C. Dossal, Convergence rate of inertial forward-backward algorithm beyond Nesterov's rule. HAL Preprint 01551873 (2017). | MR
[9] J.-F. Aujol and C. Dossal, Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25 (2015) 2408–2433. | DOI | MR | Zbl
[10] J.-F. Aujol and C. Dossal, Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for b > 0. Preprint (2017). | HAL
[11] H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer (2011). | MR | Zbl
[12] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 (2009) 183–202. | DOI | MR | Zbl
[13] P. Bégout, J. Bolte and M.A. Jendoubi, On damped second-order gradient systems. J. Differ. Equ. 259 (2015) 3115–3143. | DOI | MR | Zbl
[14] H. Brézis, Opérateurs maximaux monotones dans les espaces de Hilbert et équations d'évolution. North Holland (1972).
[15] A. Cabot, H. Engler and S. Gadat, Second-order differential equations with asymptotically small dissipation and piecewise flat potentials. Electron. J. Differ. Equ. 17 (2009) 33–38. | MR | Zbl
[16] A. Chambolle and C. Dossal, On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166 (2015) 968–982. | DOI | MR | Zbl
[17] Y. Drori and M. Teboulle, Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. Ser. A 145 (2014) 451–482. | DOI | MR | Zbl
[18] A. Haraux, Systèmes dynamiques dissipatifs et applications. RMA 17, Masson (1991). | MR | Zbl
[19] O. Güler, New proximal point algorithms for convex minimization. SIAM J. Optim. 2 (1992) 649–664. | DOI | MR | Zbl
[20] M.A. Jendoubi and R. May, Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term. Appl. Anal. 94 (2015) 435–443. | DOI | MR | Zbl
[21] D. Kim and J.A. Fessler, Optimized first-order methods for smooth convex minimization. Math. Program. Ser. A 159 (2016) 81–107. | DOI | MR | Zbl
[22] J. Liang, J. Fadili and G. Peyré, Local linear convergence of forward-backward under partial smoothness. Adv. Neural Inf. Process. Syst. (2014) 1970–1978.
[23] R. May, Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Preprint (2015). | arXiv | MR
[24] Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k²). Sov. Math. Dokl. 27 (1983) 372–376. | MR | Zbl
[25] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Vol. 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA (2004). | DOI | MR | Zbl
[26] Y. Nesterov, Gradient methods for minimizing composite objective function. CORE Discussion Papers (2007).
[27] Z. Opial, Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Amer. Math. Soc. 73 (1967) 591–597. | DOI | MR | Zbl
[28] N. Parikh and S. Boyd, Proximal algorithms. Found. Trends Optim. 1 (2014) 123–231.
[29] J. Peypouquet, Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer (2015). | DOI | MR
[30] J. Peypouquet and S. Sorin, Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17 (2010) 1113–1163. | MR | Zbl
[31] B.T. Polyak, Introduction to Optimization. Optimization Software, New York (1987). | MR
[32] Optimisation and asymptotic stability. Int. J. Control 91 (2018) 2404–2410. | DOI | MR | Zbl
[33] M. Schmidt, N. Le Roux and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization. NIPS'11 – 25th Annual Conference on Neural Information Processing Systems, Granada, Spain. Preprint: HAL inria-00618152 (2011).
[34] M.V. Solodov and B.F. Svaiter, A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22 (2001) 1013–1035. | DOI | MR | Zbl
[35] W. Su, S. Boyd and E.J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27 (2014) 2510–2518.
[36] S. Villa, S. Salzo, L. Baldassarre and A. Verri, Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23 (2013) 1607–1633. | DOI | MR | Zbl