Statistics
An elementary analysis of ridge regression with random design
Comptes Rendus. Mathématique, Volume 360 (2022) no. G9, pp. 1055-1063.

In this note, we provide an elementary analysis of the prediction error of ridge regression with random design. The proof is short and self-contained. In particular, it bypasses Rudelson’s deviation inequality for covariance matrices through a combination of exchangeability arguments, matrix perturbation, and operator convexity.
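
For readers who want a concrete picture of the object under study, the following minimal numerical sketch (not taken from the paper; the Gaussian design, the noise level, and the regularization parameter lam are illustrative assumptions) computes the ridge estimator and its prediction error under random design:

import numpy as np

rng = np.random.default_rng(0)

# Random-design linear model: rows x_i of X are drawn i.i.d. from N(0, I_d),
# and y_i = <theta, x_i> + Gaussian noise.
n, d, lam = 200, 20, 0.1
theta = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta + 0.5 * rng.standard_normal(n)

# Ridge estimator: theta_hat = (Sigma_hat + lam * I)^{-1} (1/n) X^T y,
# where Sigma_hat = (1/n) X^T X is the empirical covariance matrix.
Sigma_hat = X.T @ X / n
theta_hat = np.linalg.solve(Sigma_hat + lam * np.eye(d), X.T @ y / n)

# Prediction error on a fresh point x from the same design distribution:
# E[<theta_hat - theta, x>^2] = (theta_hat - theta)^T Sigma (theta_hat - theta),
# which reduces to the squared Euclidean norm since Sigma = I_d here.
print("prediction error:", np.sum((theta_hat - theta) ** 2))

Increasing lam shrinks theta_hat toward zero, reducing variance at the cost of bias; the quantity printed above is the random-design prediction error that the note analyzes.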

DOI: 10.5802/crmath.367
Classification: 62J07, 62G08, 60B20
Mourtada, Jaouad 1; Rosasco, Lorenzo 2, 3

1 CREST, ENSAE, Palaiseau, France
2 Università di Genova & Istituto Italiano di Tecnologia, Genova, Italy
3 Center for Brains, Minds and Machines, MIT, Cambridge, United States
@article{CRMATH_2022__360_G9_1055_0,
     author = {Mourtada, Jaouad and Rosasco, Lorenzo},
     title = {An elementary analysis of ridge regression with random design},
     journal = {Comptes Rendus. Math\'ematique},
     pages = {1055--1063},
     publisher = {Acad\'emie des sciences, Paris},
     volume = {360},
     number = {G9},
     year = {2022},
     doi = {10.5802/crmath.367},
     language = {en},
     url = {http://www.numdam.org/articles/10.5802/crmath.367/}
}

[1] Ahlswede, Rudolf; Winter, Andreas Strong converse for identification via quantum channels, IEEE Trans. Inf. Theory, Volume 48 (2002) no. 3, pp. 569-579 | DOI | MR | Zbl

[2] Aronszajn, Nachman Theory of reproducing kernels, Trans. Am. Math. Soc., Volume 68 (1950) no. 3, pp. 337-404 | DOI | MR | Zbl

[3] Bach, Francis; Moulines, Éric Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems 26 (2013), pp. 773-781

[4] Bartlett, Peter L.; Bousquet, Olivier; Mendelson, Shahar Local Rademacher complexities, Ann. Stat., Volume 33 (2005) no. 4, pp. 1497-1537 | MR | Zbl

[5] Bartlett, Peter L.; Long, Philip M.; Lugosi, Gábor; Tsigler, Alexander Benign overfitting in linear regression, Proc. Natl. Acad. Sci. USA, Volume 117 (2020) no. 48, pp. 30063-30070 | DOI | MR | Zbl

[6] Blanchard, Gilles; Mücke, Nicole Optimal Rates for Regularization of Statistical Inverse Learning Problems, Found. Comput. Math., Volume 18 (2018) no. 4, pp. 971-1013 | DOI | MR | Zbl

[7] Caponnetto, Andrea; De Vito, Ernesto Optimal rates for the regularized least-squares algorithm, Found. Comput. Math., Volume 7 (2007) no. 3, pp. 331-368 | DOI | MR | Zbl

[8] Carlen, Eric Trace inequalities and quantum entropy: an introductory course, Entropy and the quantum (Contemporary Mathematics), Volume 529, American Mathematical Society, 2010, pp. 73-140 | DOI | MR | Zbl

[9] Catoni, Olivier PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design (2016) (https://arxiv.org/abs/1603.05229)

[10] De Vito, Ernesto; Caponnetto, Andrea; Rosasco, Lorenzo Model selection for regularized least-squares algorithm in learning theory, Found. Comput. Math., Volume 5 (2005) no. 1, pp. 59-85 | DOI | MR

[11] Dieuleveut, Aymeric; Bach, Francis Nonparametric stochastic approximation with large step-sizes, Ann. Stat., Volume 44 (2016) no. 4, pp. 1363-1399 | MR | Zbl

[12] van de Geer, Sara Empirical Processes in M-estimation, Cambridge University Press, 1999

[13] Gonen, Alon; Shalev-Shwartz, Shai Average stability is invariant to data preconditioning: Implications to exp-concave empirical risk minimization, J. Mach. Learn. Res., Volume 18 (2018) no. 222, pp. 1-13 | MR | Zbl

[14] Hoerl, Arthur E. Application of Ridge Analysis to Regression Problems, Chem. Eng. Prog., Volume 58 (1962), pp. 54-59

[15] Hsu, Daniel; Kakade, Sham M.; Zhang, Tong Random design analysis of Ridge regression, Found. Comput. Math., Volume 14 (2014) no. 3, pp. 569-600 | MR | Zbl

[16] Jain, Prateek; Kakade, Sham M.; Kidambi, Rahul; Netrapalli, Praneeth; Sidford, Aaron Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification, J. Mach. Learn. Res., Volume 18 (2018) no. 223, pp. 1-42 | MR | Zbl

[17] Koltchinskii, Vladimir Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Stat., Volume 34 (2006) no. 6, pp. 2593-2656 | MR | Zbl

[18] Koltchinskii, Vladimir Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. École d’Été de Probabilités de Saint-Flour, Lecture Notes in Mathematics, 2033, Springer, 2011

[19] Koltchinskii, Vladimir; Lounici, Karim Concentration inequalities and moment bounds for sample covariance operators, Bernoulli, Volume 23 (2017) no. 1, pp. 110-133 | MR | Zbl

[20] Koren, Tomer; Levy, Kfir Fast rates for exp-concave empirical risk minimization, Advances in Neural Information Processing Systems 28 (2015), pp. 1477-1485

[21] Lecué, Guillaume; Mendelson, Shahar Performance of empirical risk minimization in linear aggregation, Bernoulli, Volume 22 (2016) no. 3, pp. 1520-1534 | MR | Zbl

[22] Lust-Piquard, Françoise; Pisier, Gilles Non commutative Khintchine and Paley inequalities, Ark. Mat., Volume 29 (1991) no. 1, pp. 241-260 | DOI | Zbl

[23] Massart, Pascal Some applications of concentration inequalities to statistics, Ann. Fac. Sci. Toulouse, Math., Volume 9 (2000) no. 2, pp. 245-303 | DOI | Numdam | MR | Zbl

[24] Mendelson, Shahar On the performance of kernel classes, J. Mach. Learn. Res., Volume 4 (2003), pp. 759-771 | MR

[25] Mourtada, Jaouad Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices, Ann. Stat., Volume 50 (2022) no. 4, pp. 2157-2178 | MR

[26] Mourtada, Jaouad; Gaïffas, Stéphane An improper estimator with optimal excess risk in misspecified density estimation and logistic regression, J. Mach. Learn. Res., Volume 23 (2022) no. 31, pp. 1-49 | MR

[27] Oliveira, Roberto I. Sums of random Hermitian matrices and an inequality by Rudelson, Electron. Commun. Probab., Volume 15 (2010), pp. 203-212 | MR | Zbl

[28] Oliveira, Roberto I. The lower tail of random quadratic forms with applications to ordinary least squares, Probab. Theory Relat. Fields, Volume 166 (2016) no. 3, pp. 1175-1194 | DOI | MR | Zbl

[29] Rosasco, Lorenzo; Villa, Silvia Learning with incremental iterative regularization, Advances in Neural Information Processing Systems 28 (2015), pp. 1630-1638

[30] Rudelson, Mark Random vectors in the isotropic position, J. Funct. Anal., Volume 164 (1999) no. 1, pp. 60-72 | DOI | MR | Zbl

[31] Smale, Steve; Zhou, Ding-Xuan Shannon sampling II: Connections to learning theory, Appl. Comput. Harmon. Anal., Volume 19 (2005) no. 3, pp. 285-302 | DOI | MR | Zbl

[32] Smale, Steve; Zhou, Ding-Xuan Learning theory estimates via integral operators and their approximations, Constr. Approx., Volume 26 (2007) no. 2, pp. 153-172 | DOI | MR | Zbl

[33] Steinwart, Ingo; Christmann, Andreas Support Vector Machines, Springer, 2008

[34] Steinwart, Ingo; Hush, Don; Scovel, Clint Optimal rates for regularized least squares regression, Proc. 22nd Conference on Learning Theory (2009), pp. 79-93

[35] Tikhonov, Andrey N. Solution of incorrectly formulated problems and the regularization method, Sov. Math., Dokl., Volume 4 (1963), pp. 1035-1038 | Zbl

[36] Tropp, Joel A. User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Volume 12 (2012) no. 4, pp. 389-434 | DOI | MR | Zbl

[37] Tsigler, Alexander; Bartlett, Peter L. Benign overfitting in ridge regression (2020) (https://arxiv.org/abs/2009.14286)

[38] Vaškevičius, Tomas; Zhivotovskiy, Nikita Suboptimality of constrained least squares and improvements via non-linear predictors (2021) (https://arxiv.org/abs/2009.09304)

[39] Vershynin, Roman Introduction to the non-asymptotic analysis of random matrices, Compressed Sensing: Theory and Applications, Cambridge University Press, 2012, pp. 210-268

[40] De Vito, Ernesto; Rosasco, Lorenzo; Caponnetto, Andrea; De Giovannini, Umberto; Odone, Francesca Learning from examples as an inverse problem, J. Mach. Learn. Res., Volume 6 (2005) no. 5, pp. 883-904 | MR | Zbl

[41] Wahba, Grace Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, Society for Industrial and Applied Mathematics, 1990 | DOI

[42] Yao, Yuan; Rosasco, Lorenzo; Caponnetto, Andrea On early stopping in gradient descent learning, Constr. Approx., Volume 26 (2007) no. 2, pp. 289-315 | MR | Zbl

[43] Zhivotovskiy, Nikita Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle (2021) (https://arxiv.org/abs/2108.08198)
