We study the convergence rate of Bregman gradient methods for convex optimization in the space of measures on a
Chizat, Lénaïc

Chizat, Lénaïc. Convergence Rates of Gradient Methods for Convex Optimization in the Space of Measures. Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 8, 19 p. doi: 10.5802/ojmo.20
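
As context for the abstract above, here is a minimal illustrative sketch, not the paper's algorithm or experimental setup: a Bregman gradient method with the entropy potential (i.e. mirror descent with multiplicative updates) for a convex least-squares objective over a measure discretized as nonnegative weights on a fixed grid. The operator A, data y, step size eta, grid size, and iteration count are made-up placeholders chosen only to make the example run.

import numpy as np

# Illustrative sketch (placeholder setup, not taken from the paper):
# mirror descent with the entropy Bregman potential on probability
# weights mu over a fixed grid of n points, minimizing
#   F(mu) = 0.5 * ||A mu - y||^2  subject to mu in the simplex.

rng = np.random.default_rng(0)
m, n = 20, 200                          # number of observations / grid points
A = rng.standard_normal((m, n))         # linear observation operator on the grid
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.random(5)
x_true /= x_true.sum()                  # sparse ground-truth probability weights
y = A @ x_true                          # noiseless observations

mu = np.full(n, 1.0 / n)                # uniform initial measure
eta = 1.0 / np.linalg.norm(A, 2) ** 2   # heuristic step size (may need tuning)

for _ in range(2000):
    grad = A.T @ (A @ mu - y)           # Euclidean gradient of F at mu
    mu = mu * np.exp(-eta * grad)       # entropy-Bregman (multiplicative) update
    mu /= mu.sum()                      # renormalize: fixed total mass on the simplex

print("final objective:", 0.5 * np.linalg.norm(A @ mu - y) ** 2)

The multiplicative update followed by renormalization is exactly the Bregman (mirror-descent) step associated with the Kullback-Leibler divergence on the simplex; it keeps the weights nonnegative without any explicit projection.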