In this note, we provide an elementary analysis of the prediction error of ridge regression with random design. The proof is short and self-contained. In particular, it bypasses the use of Rudelson’s deviation inequality for covariance matrices, through a combination of exchangeability arguments, matrix perturbation and operator convexity.
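As a concrete illustration of the object analyzed in the paper, the sketch below simulates random-design ridge regression and estimates its prediction (excess) risk by Monte Carlo. This is a minimal numerical illustration, not the paper's method: the Gaussian design, dimensions, noise level, and the `n * lam` scaling of the ridge penalty are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout, not taken from the paper):
# estimate the prediction error of ridge regression under random design.
import numpy as np

rng = np.random.default_rng(0)

d, n, n_test = 20, 100, 100_000
lam = 1.0     # ridge regularization parameter (assumed value)
sigma = 0.5   # noise standard deviation (assumed value)

# True regression vector, normalized so its norm is O(1).
w_star = rng.standard_normal(d) / np.sqrt(d)

# Random design: covariates drawn i.i.d. from a fixed distribution
# (standard Gaussian here, purely for illustration).
X = rng.standard_normal((n, d))
y = X @ w_star + sigma * rng.standard_normal(n)

# Ridge estimator: w_hat = (X^T X + n * lam * I)^{-1} X^T y
w_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

# Prediction (excess) risk E[(x^T w_hat - x^T w_star)^2] over a fresh
# design point x, estimated on an independent test sample.
X_test = rng.standard_normal((n_test, d))
excess_risk = np.mean((X_test @ (w_hat - w_star)) ** 2)
print(f"estimated excess risk: {excess_risk:.4f}")
```

Rerunning with fresh draws of `X` shows the risk fluctuating with the random design; the paper's contribution is a non-asymptotic bound on this quantity proved by elementary means.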
Mourtada, Jaouad; Rosasco, Lorenzo. An elementary analysis of ridge regression with random design. Comptes Rendus. Mathématique, Volume 360 (2022) no. G9, pp. 1055-1063. doi: 10.5802/crmath.367. http://www.numdam.org/articles/10.5802/crmath.367/