Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3
ESAIM: Control, Optimisation and Calculus of Variations, Volume 25 (2019), article no. 2.

In a Hilbert space setting ℋ, given Φ : ℋ → ℝ a convex, continuously differentiable function, and α a positive parameter, we consider the inertial dynamic system with Asymptotic Vanishing Damping

(AVD)_α        ẍ(t) + (α/t) ẋ(t) + ∇Φ(x(t)) = 0.

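For intuition, here is a minimal numerical sketch (not from the paper) that integrates (AVD)_α with SciPy for an illustrative quadratic objective Φ(x) = ½x²; the choice of Φ, the time horizon, and the tolerances are assumptions, and the printed t^{−2α/3} column is only the theoretical worst-case scale from the subcritical bound (a quadratic objective typically decays faster).

import numpy as np
from scipy.integrate import solve_ivp

alpha = 2.0  # subcritical regime: alpha <= 3

def grad_phi(x):
    # Illustrative smooth convex objective Phi(x) = 0.5 * x^2, with min Phi = 0.
    return x

def avd(t, z):
    # First-order reformulation of (AVD)_alpha with z = (x, v), v = x'(t):
    # x'' + (alpha / t) x' + grad Phi(x) = 0.
    x, v = z
    return [v, -(alpha / t) * v - grad_phi(x)]

sol = solve_ivp(avd, t_span=(1.0, 1e3), y0=[1.0, 0.0],
                rtol=1e-9, atol=1e-12, dense_output=True)
for t in (10.0, 100.0, 1000.0):
    x = sol.sol(t)[0]
    print(t, 0.5 * x**2, t ** (-2 * alpha / 3))  # Phi(x(t)) - min Phi vs. t^(-2*alpha/3)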
Depending on the value of α with respect to 3, we give a complete picture of the convergence properties as t → +∞ of the trajectories generated by (AVD)_α, as well as of the iterations of the corresponding algorithms. Indeed, as shown by Su, Boyd and Candès, the case α = 3 corresponds to a continuous version of the accelerated gradient method of Nesterov, with the rate of convergence Φ(x(t)) − min Φ = O(t^{−2}) for α ≥ 3. Our main result concerns the subcritical case α ≤ 3, where we show that Φ(x(t)) − min Φ = O(t^{−2α/3}). This overall picture shows a continuous variation of the rate of convergence of the values, Φ(x(t)) − min_ℋ Φ = O(t^{−p(α)}), with respect to α > 0: the coefficient p(α) increases linearly up to 2 as α goes from 0 to 3, then displays a plateau. We then examine the convergence of the trajectories to optimal solutions. As a new result, in the one-dimensional framework, for the critical value α = 3, we prove the convergence of the trajectories. In the second part of this paper, we study the convergence properties of the associated forward-backward inertial algorithms. They aim to solve structured convex minimization problems of the form min {Θ := Φ + Ψ}, with Φ smooth and Ψ nonsmooth. The continuous dynamics serves as a guideline for this study. We obtain a similar rate of convergence for the sequence of iterates (x_k): for α ≤ 3 we have Θ(x_k) − min Θ = O(k^{−p}) for all p < 2α/3, and for α > 3, Θ(x_k) − min Θ = o(k^{−2}). Finally, we show that the results are robust with respect to external perturbations.
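As a concrete illustration of the discrete scheme, the following sketch implements an inertial forward-backward iteration with extrapolation coefficient (k − 1)/(k + α − 1), which behaves like 1 − α/k for large k, the asymptotic form of the damping considered here; it is applied to a toy ℓ1-regularized least-squares problem (Φ smooth, Ψ = λ‖·‖₁ nonsmooth). The data, regularization weight, and iteration count are illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
lam = 0.1                            # weight of the nonsmooth term Psi = lam * ||.||_1
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad Phi
s = 1.0 / L                          # step size s <= 1/L
alpha = 2.0                          # subcritical regime: alpha <= 3

def grad_phi(x):
    # Phi(x) = 0.5 * ||A x - b||^2 (smooth part)
    return A.T @ (A @ x - b)

def prox_psi(x, t):
    # Proximal map of t * lam * ||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

x_prev = x = np.zeros(100)
for k in range(1, 5001):
    y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)   # inertial extrapolation
    x_prev, x = x, prox_psi(y - s * grad_phi(y), s)    # forward-backward step

theta = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
print("Theta(x_k) after 5000 iterations:", theta)
# Theory: Theta(x_k) - min Theta = O(k^(-p)) for all p < 2*alpha/3 when alpha <= 3.

Taking α = 3 in this coefficient recovers a FISTA-like rule of Nesterov type.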

DOI: 10.1051/cocv/2017083
Classification: 49M37, 65K05, 90C25
Keywords: Accelerated gradient method, FISTA, inertial forward-backward algorithms, Nesterov method, proximal-based methods, structured convex optimization, subcritical case, vanishing damping
Attouch, Hedy; Chbani, Zaki; Riahi, Hassan

@article{COCV_2019__25__A2_0,
     author = {Attouch, Hedy and Chbani, Zaki and Riahi, Hassan},
     title = {Rate of convergence of the {Nesterov} accelerated gradient method in the subcritical case \ensuremath{\alpha} \ensuremath{\leq} 3},
     journal = {ESAIM: Control, Optimisation and Calculus of Variations},
     publisher = {EDP-Sciences},
     volume = {25},
     year = {2019},
     doi = {10.1051/cocv/2017083},
     mrnumber = {3943364},
     zbl = {1437.49045},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/cocv/2017083/}
}
TY  - JOUR
AU  - Attouch, Hedy
AU  - Chbani, Zaki
AU  - Riahi, Hassan
TI  - Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3
JO  - ESAIM: Control, Optimisation and Calculus of Variations
PY  - 2019
VL  - 25
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/cocv/2017083/
DO  - 10.1051/cocv/2017083
LA  - en
ID  - COCV_2019__25__A2_0
ER  - 
%0 Journal Article
%A Attouch, Hedy
%A Chbani, Zaki
%A Riahi, Hassan
%T Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3
%J ESAIM: Control, Optimisation and Calculus of Variations
%D 2019
%V 25
%I EDP-Sciences
%U http://www.numdam.org/articles/10.1051/cocv/2017083/
%R 10.1051/cocv/2017083
%G en
%F COCV_2019__25__A2_0
Attouch, Hedy; Chbani, Zaki; Riahi, Hassan. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3. ESAIM: Control, Optimisation and Calculus of Variations, Volume 25 (2019), article no. 2. doi: 10.1051/cocv/2017083. http://www.numdam.org/articles/10.1051/cocv/2017083/

[1] F. Alvarez, On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38 (2000) 1102–1119. | DOI | MR | Zbl

[2] H. Attouch and A. Cabot, Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263 (2017) 5412–5458. | DOI | MR | Zbl

[3] H. Attouch and A. Cabot, Convergence rates of inertial forward-backward algorithms, HAL Preprint 01453170 (2017). | MR

[4] H. Attouch, A. Cabot and P. Redont, The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12 (2002) 273–306. | MR | Zbl

[5] H. Attouch, Z. Chbani, J. Peypouquet and P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing damping. Math. Program. 168 (2018) 123–175. | DOI | MR | Zbl

[6] H. Attouch and J. Peypouquet, The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k2. SIAM J. Optim. 26 (2016) 1824–1834. | DOI | MR | Zbl

[7] V. Apidopoulos, J.-F. Aujol and Ch. Dossal, On a second order differential inclusion modeling the FISTA algorithm. (2017). | HAL

[8] V. Apidopoulos, J.-F. Aujol and Ch. Dossal, Convergence rate of inertial forward-backward algorithm beyond Nesterov’s rule. HAL Preprint 01551873 (2017). | MR

[9] J.-F. Aujol and C. Dossal, Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25 (2015) 2408–2433. | DOI | MR | Zbl

[10] J.-F. Aujol and C. Dossal, Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for b > 0. (2017). | HAL

[11] H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert spaces, CMS Books in Mathematics. Springer (2011). | MR | Zbl

[12] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 (2009) 183–202. | DOI | MR | Zbl

[13] P. Bégout, J. Bolte and M.A. Jendoubi, On damped second-order gradient systems. J. Differ. Equ. 259 (2015) 3115–3143. | DOI | MR | Zbl

[14] H. Brézis, Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. North Holland (1972).

[15] A. Cabot, H. Engler and S. Gadat, Second-order differential equations with asymptotically small dissipation and piecewise flat potentials. Electron. J. Differ. Eq. 17 (2009) 33–38. | MR | Zbl

[16] A. Chambolle and Ch. Dossal, On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166 (2015) 968–982. | DOI | MR | Zbl

[17] Y. Drori and M. Teboulle, Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. Ser. A 145 (2014) 451–482. | DOI | MR | Zbl

[18] A. Haraux, Systèmes dynamiques dissipatifs et applications. RMA 17, Masson (1991). | MR | Zbl

[19] O. Güler, New proximal point algorithms for convex minimization. SIAM J. Optim. 2 (1992) 649–664. | DOI | MR | Zbl

[20] M.A. Jendoubi and R. May, Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term. Appl. Anal. 94 (2015) 435–443. | DOI | MR | Zbl

[21] D. Kim and J.A. Fessler, Optimized first-order methods for smooth convex minimization. Math. Program. Ser. A 159 (2016) 81–107. | DOI | MR | Zbl

[22] J. Liang, J. Fadili and G. Peyré, Local linear convergence of forward-backward under partial smoothness. Adv. Neural Inf. Process. Syst. 27 (2014) 1970–1978.

[23] R. May, Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Preprint (2015). | arXiv | MR

[24] Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k²). Sov. Math. Dokl. 27 (1983) 372–376. | MR | Zbl

[25] Y. Nesterov, Introductory Lectures On Convex Optimization: A basic Course. Vol. 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA (2004). | DOI | MR | Zbl

[26] Y. Nesterov, Gradient methods for minimizing composite objective function. CORE Discussion Papers (2007).

[27] Z. Opial, Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Amer. Math. Soc. 73 (1967) 591–597. | DOI | MR | Zbl

[28] N. Parikh and S. Boyd, Proximal algorithms. Found. Trends Optim. 1 (2014) 123–231.

[29] J. Peypouquet, Convex optimization in normed spaces: theory, methods and examples. Springer (2015). | DOI | MR

[30] J. Peypouquet and S. Sorin, Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17 (2010) 1113–1163. | MR | Zbl

[31] B.T. Polyak, Introduction to optimization. Optimization software. New York (1987). | MR

[32] B.T. Polyak and P.S. Shcherbakov, Optimisation and asymptotic stability. Int. J. Control 91 (2018) 2404–2410. | DOI | MR | Zbl

[33] M. Schmidt, N. Le Roux and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization. NIPS’11 – 25th Annual Conference on Neural Information Processing Systems, Granada, Spain. HAL Preprint inria-00618152 (2011).

[34] M.V. Solodov and B.F. Svaiter, A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22 (2001) 1013–1035. | DOI | MR | Zbl

[35] W. Su, S. Boyd and E.J. Candès, A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27 (2014) 2510–2518.

[36] S. Villa, S. Salzo, L. Baldassarre and A. Verri, Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23 (2013) 1607–1633. | DOI | MR | Zbl
