Commented by: Discussion of “Minimal penalties and the slope heuristics: a survey” by Sylvain Arlot
Commented by: Discussion on “Minimal penalties and the slope heuristic: a survey” by Sylvain Arlot
Commented by: A note on BIC and the slope heuristic
Commented by: Discussion on “Minimal penalties and the slope heuristic: a survey” by Sylvain Arlot
Commented by: Discussion of “Minimal penalties and the slope heuristics: a survey” by Sylvain Arlot
Commented by: Discussion on “Minimal penalties and the slope heuristic: a survey” by Sylvain Arlot
Completed by: Rejoinder on: Minimal penalties and the slope heuristics: a survey
Birgé and Massart proposed the slope heuristics in 2001 as a way to choose, from the data, an optimal value for the unknown multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has since been generalized to “minimal-penalty algorithms”. This article reviews the theoretical results obtained for such algorithms, with a self-contained proof in the simplest framework, precise proof ideas for further generalizations, and a few new results. Explicit connections are made with residual-variance estimators (with an original contribution on this topic, showing that for this task the slope heuristics performs almost as well as a residual-based estimator with the best model choice) and with some classical algorithms such as the L-curve or elbow heuristics, Mallows’ $C_p$, and Akaike’s FPE. Practical issues are also addressed, including two new practical definitions of minimal-penalty algorithms that are compared on synthetic data to previously proposed definitions. Finally, several conjectures and open problems are suggested as future research directions.
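The abstract describes minimal-penalty algorithms only in words. Below is a minimal sketch of one of them, a dimension-jump version of the slope heuristics, under assumptions that are ours rather than the article's: fixed-design homoscedastic regression on $x_i = i/n$, nested least-squares projection estimators on the first $D$ vectors of a discrete Fourier basis, and the penalty shape $D/n$. In that setting the minimal penalty is roughly $\sigma^2 D/n$, so the constant $\hat{C}$ at which the selected dimension drops most sharply also acts as an estimate of $\sigma^2$, and the final penalty $2\hat{C}D/n$ is a Mallows' $C_p$-type penalty. All function names, the grid of constants and the simulated signal are hypothetical illustrations; this is not code from the article, and neither of its two new practical definitions is implemented here.

```python
import math
import random

# Hypothetical sketch of the dimension-jump slope heuristics; the setting
# (Fourier projection estimators, penalty shape D / n) and all names are
# illustrative assumptions, not code from the article.


def fourier_coefficients(y, num):
    """Coefficients <y, phi_j>, j = 0..num-1, of y in the orthonormal discrete
    Fourier basis of R^n: phi_0 = 1/sqrt(n), then alternately
    sqrt(2/n) cos(2 pi k i / n) and sqrt(2/n) sin(2 pi k i / n), k = 1, 2, ...
    Assumes num < n so that every frequency k stays below n / 2."""
    n = len(y)
    coefs = [sum(y) / math.sqrt(n)]
    k = 1
    while len(coefs) < num:
        angles = [2 * math.pi * k * i / n for i in range(n)]
        coefs.append(math.sqrt(2.0 / n) * sum(v * math.cos(a) for v, a in zip(y, angles)))
        coefs.append(math.sqrt(2.0 / n) * sum(v * math.sin(a) for v, a in zip(y, angles)))
        k += 1
    return coefs[:num]


def empirical_risks(y, d_max):
    """crit(D) = ||y - P_D y||^2 / n for D = 1..d_max, where P_D projects onto
    the first D basis vectors (nested least-squares models)."""
    n = len(y)
    coefs = fourier_coefficients(y, d_max)
    total = sum(v * v for v in y)
    risks, explained = {}, 0.0
    for d in range(1, d_max + 1):
        explained += coefs[d - 1] ** 2
        risks[d] = (total - explained) / n
    return risks


def select_dim(risks, n, c):
    """Minimizer of crit(D) + c * D / n over D; ties go to the smallest D."""
    return min(sorted(risks), key=lambda d: risks[d] + c * d / n)


def geometric_grid(lo, hi, num):
    """num constants spaced geometrically from lo to hi."""
    ratio = (hi / lo) ** (1.0 / (num - 1))
    return [lo * ratio ** k for k in range(num)]


def slope_heuristics(y, d_max, grid):
    """Return (c_hat, d_hat): c_hat is the grid constant just after the largest
    drop of C -> D_hat(C); d_hat minimizes crit(D) + 2 * c_hat * D / n."""
    n = len(y)
    risks = empirical_risks(y, d_max)
    path = [select_dim(risks, n, c) for c in grid]  # nonincreasing in C
    k = max(range(len(grid) - 1), key=lambda j: path[j] - path[j + 1])
    c_hat = grid[k + 1]  # estimated minimal-penalty constant (about sigma^2 here)
    d_hat = select_dim(risks, n, 2.0 * c_hat)  # slope heuristics: optimal ~ 2 x minimal
    return c_hat, d_hat


def simulate(n, sigma, seed=0):
    """Hypothetical data: y_i = sin(2 pi x_i) + 0.5 cos(6 pi x_i) + sigma * N(0, 1),
    with x_i = i / n."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / n) + 0.5 * math.cos(6 * math.pi * i / n)
            + sigma * rng.gauss(0.0, 1.0) for i in range(n)]


if __name__ == "__main__":
    y = simulate(1000, sigma=0.5)
    c_hat, d_hat = slope_heuristics(y, 250, geometric_grid(0.01, 10.0, 200))
    print(f"c_hat = {c_hat:.3f} (true sigma^2 = 0.25), selected D = {d_hat}")
```

With the seed fixed, the printed constant should sit near the noise variance 0.25 used in the simulation, but its exact value depends on the noise draw and the grid resolution. The article compares several practical definitions of minimal-penalty algorithms, so this single largest-drop rule is only one simple choice.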
Keywords: model selection, estimator selection, penalization, slope heuristics, minimal penalty, residual-variance estimation, L-curve heuristics, elbow heuristics, scree test, overpenalization
@article{JSFS_2019__160_3_1_0, author = {Arlot, Sylvain}, title = {Minimal penalties and the slope heuristics: a survey}, journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique}, pages = {1--106}, publisher = {Soci\'et\'e fran\c{c}aise de statistique}, volume = {160}, number = {3}, year = {2019}, mrnumber = {4021408}, zbl = {1437.62121}, language = {en}, url = {http://www.numdam.org/item/JSFS_2019__160_3_1_0/} }
Arlot, Sylvain. Minimal penalties and the slope heuristics: a survey. Journal de la société française de statistique, Volume 160 (2019) no. 3, pp. 1-106. http://www.numdam.org/item/JSFS_2019__160_3_1_0/
[1] Sélection de modèles, Master 1 report (in French), ENS Paris, 2002. Advisor: Yannick Baraud. Report on the paper “Gaussian model selection” by L. Birgé & P. Massart, JEMS 3(3):203-268, 2001 (available at https://www.math.u-psud.fr/~arlot/papers/02selection_modeles.pdf)
[2] Data-driven calibration of linear estimators with minimal penalties, Advances in Neural Information Processing Systems 22 (Bengio, Y.; Schuurmans, D.; Lafferty, J.; Williams, C. K. I.; Culotta, A., eds.) (2009), pp. 46-54
[3] Data-driven calibration of linear estimators with minimal penalties, 2011 (arXiv preprint)
[4] A survey of cross-validation procedures for model selection, Statist. Surv., Volume 4 (2010), pp. 40-79
[5] A Kernel Multiple Change-point Algorithm via Model Selection, J. Mach. Learn. Res. (2019) (to appear; preliminary version available at arXiv:1202.3878)
[6] Estimating a discrete distribution via histogram selection, ESAIM: Probability and Statistics, Volume 15 (2011), pp. 1-29
[7] Fitting autoregressive models for prediction, Ann. Inst. Statist. Math., Volume 21 (1969), pp. 243-247
[8] Statistical predictor identification, Ann. Inst. Statist. Math., Volume 22 (1970), pp. 203-217
[9] Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory (Tsahkadsor, 1971), Akadémiai Kiadó, Budapest, 1973, pp. 267-281
[10] The relationship between variable selection and data augmentation and a method for prediction, Technometrics, Volume 16 (1974), pp. 125-127
[11] Data-driven calibration of penalties for least-squares regression, J. Mach. Learn. Res., Volume 10 (2009), pp. 245-279 (electronic), http://www.jmlr.org/papers/volume10/arlot09a/arlot09a.pdf
[12] Resampling and Model Selection, University Paris-Sud 11, December (2007) (Ph. D. Thesis, available at https://tel.archives-ouvertes.fr/tel-00198803v1)
[13] Model selection by resampling penalization, Electron. J. Stat., Volume 3 (2009), pp. 557-624 (electronic)
[14] Critical dimension in profile semiparametric estimation, Electron. J. Statist., Volume 8 (2014) no. 2, pp. 3077-3125
[15] Model Selection and Multimodel Inference, Springer-Verlag, New York, 2002, xxvi+488 pages (A practical information-theoretic approach)
[16] Model selection for regression on a fixed design, Probab. Theory Related Fields, Volume 117 (2000) no. 4, pp. 467-493
[17] Estimator selection with respect to Hellinger-type risks, Probab. Theory Related Fields, Volume 151 (2011) no. 1-2, pp. 353-401
[18] Model selection for clustering. Choosing the number of classes, University Paris-Sud, December (2009) (Ph. D. Thesis, available at https://tel.archives-ouvertes.fr/tel-00461550v1)
[19] Estimation and model selection for model-based clustering with the conditional classification likelihood, Electron. J. Statist., Volume 9 (2015) no. 1, pp. 1041-1077
[20] Local Rademacher complexities, Ann. Statist., Volume 33 (2005) no. 4, pp. 1497-1537
[21] The discriminative functional mixture model for a comparative analysis of bike sharing systems, Ann. Appl. Stat., Volume 9 (2015) no. 4, pp. 1726-1760
[22] Pivotal estimation via square-root Lasso in nonparametric regression, Ann. Statist., Volume 42 (2014) no. 2, pp. 757-788
[23] A graphical method for estimating the residual variance in nonparametric regression, Biometrika, Volume 76 (1989) no. 2, pp. 203-210
[24] Optimistic lower bounds for convex regularized least-squares, 2017 (arXiv:1703.01332v3)
[25] The noise barrier and the large signal bias of the Lasso and other convex estimators, 2018 (arXiv:1804.01230v4)
[26] Parameter Selection for Principal Curves, IEEE Transactions on Information Theory, Volume 58 (2012) no. 3, pp. 1924-1939
[27] Kernel discriminant analysis and clustering with parsimonious Gaussian process models, Statistics and Computing, Volume 25 (2015) no. 6, pp. 1143-1162
[28] Semi-parametric detection of multiple changes in long-range dependent processes, 2018 (arXiv:1801.02515v2)
[29] Gaussian model selection with an unknown variance, Ann. Statist., Volume 37 (2009) no. 2, pp. 630-672
[30] Estimator selection in the Gaussian setting, Ann. Inst. Henri Poincaré Probab. Stat., Volume 50 (2014) no. 3, pp. 1092-1119
[31] Spatial CART Classification Trees, 2018 (available at https://hal.archives-ouvertes.fr/hal-01837065v1)
[32] Multiple breaks detection in general causal time series using penalized quasi-likelihood, Electron. J. Stat., Volume 6 (2012), pp. 435-477 (electronic)
[33] Variance estimation in nonparametric regression via the difference sequence method, Ann. Statist., Volume 35 (2007) no. 5, pp. 2219-2232
[34] Adaptive Dantzig density estimation, Ann. Inst. H. Poincaré Probab. Statist., Volume 47 (2011) no. 1, pp. 43-74
[35] Adaptive pointwise estimation of conditional density function, Ann. Inst. Henri Poincaré Probab. Stat., Volume 52 (2016) no. 2, pp. 939-980
[36] A generalized $C_p$ criterion for Gaussian model selection, 2001 (Technical report, Prépublication 647, 39 pages, available at http://massart.pascal.free.fr/Site/publications_files/Cp.pdf)
[37] Gaussian model selection, J. Eur. Math. Soc. (JEMS), Volume 3 (2001) no. 3, pp. 203-268
[38] Discussion: “Local Rademacher complexities and oracle inequalities in risk minimization” [Ann. Statist. 34 (2006), no. 6, 2593-2656] by V. Koltchinskii, Ann. Statist., Volume 34 (2006) no. 6, pp. 2664-2671
[39] Empirical minimization, Probability Theory and Related Fields, Volume 135 (2006) no. 3, pp. 311-334
[40] Minimal penalties for Gaussian model selection, Probab. Theory Related Fields, Volume 138 (2007) no. 1-2, pp. 33-73
[41] A high dimensional Wilks phenomenon, Probab. Theory Related Fields, Volume 150 (2011) no. 3-4, pp. 405-433
[42] From model selection to adaptive estimation, Festschrift for Lucien Le Cam, Springer, New York, 1997, pp. 55-87
[43] Slope heuristics: overview and implementation, Statistics and Computing, Volume 22 (2012) no. 2, pp. 455-470
[44] Local bandwidth selection for kernel density estimation in bifurcating Markov chain model, 2017 (arXiv:1706.07034v1)
[45] Submodel Selection and Evaluation in Regression. The X-Random Case, International Statistical Review, Volume 60 (1992) no. 3, pp. 291-319
[46] Clustering and variable selection for categorical multivariate data, Electron. J. Stat., Volume 7 (2013), pp. 2344-2371
[47] Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty, Modern Problems of Stochastic Analysis and Statistics (Panov, Vladimir, ed.), Springer International Publishing, Cham (2017), pp. 315-333
[48] Modified Akaike’s criterion for histogram density estimation, 1999 (Technical report no. 1999-61, available at https://www.math.u-psud.fr/~biblio/pub/1999/abs/ppo1999_61.html)
[49] The scree test for the number of factors, Multivariate Behav. Res., Volume 1 (1966) no. 2, pp. 245-276
[50] A comparison of variance estimators in nonparametric regression, J. Roy. Statist. Soc. Ser. B, Volume 54 (1992) no. 3, pp. 773-780, http://links.jstor.org/sici?sici=0035-9246(1992)54:3<773:ACOVEI>2.0.CO;2-7&origin=MSN
[51] On oracle inequalities related to smoothing splines, Math. Methods Statist., Volume 15 (2006) no. 4, pp. 398-414 (2007)
[52] The triangle method for finding the corner of the L-curve, Appl. Numer. Math., Volume 43 (2002) no. 4, pp. 359-373
[53] A note on the approximate admissibility of regularized estimators in the Gaussian sequence model, Electron. J. Statist., Volume 11 (2017) no. 2, pp. 4746-4768
[54] Penalization versus Goldenshluger-Lepski strategies in warped bases regression, ESAIM Probab. Stat., Volume 17 (2013), pp. 328-358
[55] A new perspective on least squares under convex constraint, Ann. Statist., Volume 42 (2014) no. 6, pp. 2340-2381
[56] High dimensional regression and matrix estimation without tuning parameters, 2015 (arXiv preprint)
[57] Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection, Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles, Volume 69 (2014) no. 2, pp. 245-259
[58] Model selection for simplicial approximation, Foundations of Computational Mathematics, Volume 11 (2011) no. 6, pp. 707-731
[59] Calibration d’algorithmes de type Lasso et analyse statistique de données métallurgiques en aéronautique, Université Paris-Sud (2011) (Ph. D. Thesis)
[60] A new algorithm for fixed design regression and denoising, Ann. Inst. Statist. Math., Volume 56 (2004) no. 3, pp. 449-473
[61] A comprehensive trial of the scree and K.G. criteria for determining the number of factors, Multivariate Behav. Res., Volume 12 (1977) no. 3, pp. 289-325
[62] Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation, Numer. Math., Volume 31 (1978) no. 4, pp. 377-403
[63] Joint rank and variable selection for parsimonious estimation in a high-dimensional finite mixture regression model, Journal of Multivariate Analysis, Volume 157 (2017), pp. 1-13
[64] Model-based regression clustering for high-dimensional data: application to functional data, Adv. Data Analysis and Classification, Volume 11 (2017) no. 2, pp. 243-279
[65] Block-diagonal covariance selection for high-dimensional Gaussian graphical models, Journal of the American Statistical Association (2018), pp. 306-314
[66] Nonlinear network-based quantitative trait prediction from transcriptomic data, 2017 (arXiv:1701.07899v5)
[67] Clustering electricity consumers using high-dimensional regression mixture models, Applied Stochastic Models in Business and Industry (2019), pp. 1-19
[68] Wavelet shrinkage: asymptopia?, J. Roy. Statist. Soc. Ser. B, Volume 57 (1995) no. 2, pp. 301-369, http://links.jstor.org/sici?sici=0035-9246(1995)57:2<301:WSA>2.0.CO;2-S&origin=MSN (with discussion and a reply by the authors)
[69] The degrees of freedom of the lasso for general design matrix, Statistica Sinica, Volume 23 (2013) no. 2, pp. 809-828, http://www.jstor.org/stable/24310363
[70] Clustering and Model Selection via Penalized Likelihood for Different-sized Categorical Data Vectors, 2017 (arXiv:1709.02294v1)
[71] Estimating the joint distribution of independent categorical variables via model selection, Bernoulli, Volume 15 (2009) no. 2, pp. 475-507
[72] Estimating the variance in nonparametric regression: what is a reasonable choice?, J. R. Stat. Soc. Ser. B Stat. Methodol., Volume 60 (1998) no. 4, pp. 751-764
[73] A covariate-matched estimator of the error variance in nonparametric regression, J. Nonparametr. Stat., Volume 21 (2009) no. 3, pp. 263-285
[74] The estimation of prediction error: covariance penalties and cross-validation, J. Amer. Statist. Assoc., Volume 99 (2004) no. 467, pp. 619-642 (with comments and a rejoinder by the author)
[75] How biased is the apparent error rate of a prediction rule?, J. Amer. Statist. Assoc., Volume 81 (1986) no. 394, pp. 461-470
[76] Using the L-curve for determining optimal regularization parameters, Numer. Math., Volume 69 (1994) no. 1, pp. 25-31
[77] Étude de la décroissance des valeurs propres dans une analyse en composantes principales: Comparaison avec le modèle du bâton brisé, Journal of Experimental Marine Biology and Ecology, Volume 25 (1976) no. 1, pp. 67-75
[78] Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data, Journal of Applied Statistics, Volume 46 (2019) no. 1, pp. 47-65
[79] The Optimal Hard Threshold for Singular Values is $4/\sqrt{3}$, IEEE Trans. Inform. Theory, Volume 60 (2014) no. 8, pp. 5040-5053
[80] Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression, Electron. J. Stat., Volume 2 (2008), pp. 1345-1372
[81] High-dimensional regression with unknown variance, Statist. Sci., Volume 27 (2012) no. 4, pp. 500-518
[82] Estimation of Gaussian graphs by model selection, Electron. J. Stat., Volume 2 (2008), pp. 542-563 (electronic)
[83] Low rank multivariate regression, Electron. J. Stat., Volume 5 (2011), pp. 775-799
[84] Using CART to Detect Multiple Change Points in the Mean for large samples, 2008 (Technical report no. 12, available at https://hal.archives-ouvertes.fr/hal-00327146v1)
[85] Oracle approach and slope heuristic in context tree estimation, 2011 (arXiv:1111.2191v1)
[86] Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality, Ann. Statist., Volume 39 (2011) no. 3, pp. 1608-1632
[87] Quantile universal threshold, Electron. J. Statist., Volume 11 (2017) no. 2, pp. 4701-4722
[88] Consistent order estimation and minimal penalties, IEEE Trans. Inform. Theory, Volume 59 (2013) no. 2, pp. 1115-1128
[89] Regularization using a parameterized trust region subproblem, Math. Program., Volume 116 (2009) no. 1-2, pp. 193-220
[90] Analysis of discrete ill-posed problems by means of the L-curve, SIAM Rev., Volume 34 (1992) no. 4, pp. 561-580
[91] Limitations of the L-curve method in ill-posed problems, BIT, Volume 36 (1996) no. 2, pp. 287-301
[92] Cattell’s Scree Test In Relation To Bartlett’s Chi-Square Test And Other Observations On The Number Of Factors Problem, Multivariate Behavioral Research, Volume 14 (1979) no. 3, pp. 283-300
[93] An adaptive pruning algorithm for the discrete L-curve criterion, J. Comput. Appl. Math., Volume 198 (2007) no. 2, pp. 483-492
[94] Asymptotically optimal difference-based estimation of variance in nonparametric regression, Biometrika, Volume 77 (1990) no. 3, pp. 521-528
[95] Model functions in the modified L-curve method, case study: the heat flux reconstruction in pool boiling, Inverse Problems, Volume 26 (2010) no. 5, 13 pages
[96] On variance estimation in nonparametric regression, Biometrika, Volume 77 (1990) no. 2, pp. 415-419
[97] The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. Sci. Comput., Volume 14 (1993) no. 6, pp. 1487-1503
[98] A rationale and test for the number of factors in factor analysis, Psychometrika, Volume 30 (1965) no. 2, pp. 179-185
[99] Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches, Ecology, Volume 74 (1993) no. 8
[100] Rademacher penalties and structural risk minimization, IEEE Trans. Inform. Theory, Volume 47 (2001) no. 5, pp. 1902-1914
[101] Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Statist., Volume 34 (2006) no. 6, pp. 2593-2656
[102] Using penalized contrasts for the change-point problem, Signal Proces., Volume 85 (2005) no. 8, pp. 1501-1510
[103] Residual variance estimation using a nearest neighbor statistic, J. Multivariate Anal., Volume 101 (2010) no. 4, pp. 811-823
[104] Quelques approches pour la détection de ruptures à horizon fini, Université Paris-Sud, July (2002) (Ph. D. Thesis, http://www.theses.fr/2002PA112141)
[105] Detecting multiple change-points in the mean of a Gaussian process by model selection, Signal Proces., Volume 85 (2005), pp. 717-736
[106] State-by-state Minimax Adaptive Estimation for Nonparametric Hidden Markov Models, Journal of Machine Learning Research, Volume 19 (2018) no. 39, pp. 1-46, http://jmlr.org/papers/v19/17-345.html
[107] Rééchantillonnage et sélection de modèles optimale pour l’estimation de la densité de variables indépendantes ou mélangeantes, INSA de Toulouse, June (2009) (Ph. D. Thesis, available at http://lerasle.perso.math.cnrs.fr/docs/these.pdf)
[108] Optimal model selection for stationary data under various mixing conditions, Ann. Statist., Volume 39 (2011) no. 4, pp. 1852-1877
[109] Optimal model selection in density estimation, Ann. Inst. Henri Poincaré Probab. Stat., Volume 48 (2012) no. 3, pp. 884-908
[110] Modèle de Cox: estimation par sélection de modèle et modèle de chocs bivarié, Université Paris-Sud (2000) (Ph. D. Thesis, available at http://www-ljk.imag.fr/membres/Frederique.Letue/These3.pdf)
[111] From Stein’s unbiased risk estimates to the method of generalized cross validation, Ann. Statist., Volume 13 (1985) no. 4, pp. 1352-1377
[112] Asymptotic optimality of $C_L$ and generalized cross-validation in ridge regression with application to spline smoothing, Ann. Statist., Volume 14 (1986) no. 3, pp. 1101-1112
[113] Asymptotic optimality for $C_p$, $C_L$, cross-validation and generalized cross-validation: discrete index set, Ann. Statist., Volume 15 (1987) no. 3, pp. 958-975
[114] Minimal penalty for Goldenshluger-Lepski method, Stochastic Processes and their Applications, Volume 126 (2016) no. 12, pp. 3774-3789 (In Memoriam: Evarist Giné)
[115] Estimator Selection: a New Method with Applications to Kernel Density Estimation, Sankhya A, Volume 79 (2017) no. 2, pp. 298-335
[116] Optimal kernel selection for density estimation, High Dimensional Probability VII: The Cargese Volume (Progress in Probability), Volume 71, Springer, 2016, pp. 425-460 (preliminary version available at arXiv:1511.02112)
[117] Model selection using Rademacher penalization, Proceedings of the 2nd ICSC Symp. on Neural Computation (NC2000). Berlin, Germany, ICSC Academic Press (2000)
[118] An Oracle Approach for Interaction Neighborhood Estimation in Random Fields, Electron. J. Stat., Volume 5 (2011), pp. 534-571 (electronic)
[119] Sharp oracle inequalities and slope heuristic for specification probabilities estimation in discrete random fields, Bernoulli, Volume 22 (2016) no. 1, pp. 325-344
[120] Residual variance estimation in machine learning, Neurocomputing, Volume 72 (2009) no. 16, pp. 3692-3703 (Financial Engineering Computational and Ambient Intelligence (IWANN 2007))
[121] Homogeneity and change-point detection tests for multivariate data using rank statistics, Journal de la SFdS, Volume 156 (2015) no. 4, pp. 133-162
[122] Cross-Validation and Penalization for Density Estimation, Université Paris Sud - Paris XI, May (2015) (Ph. D. Thesis, available at https://tel.archives-ouvertes.fr/tel-01164581v1)
[123] Some comments on $C_p$, Technometrics, Volume 15 (1973), pp. 661-675
[124] A non-asymptotic theory for model selection, European Congress of Mathematics, Eur. Math. Soc., Zürich, 2005, pp. 309-323
[125] Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, 1896, Springer, Berlin, 2007, xiv+337 pages (Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003, with a foreword by Jean Picard)
[126] Sélection de modèles: de la théorie à la pratique, Journal de la SFdS, Volume 149 (2008) no. 4, pp. 5-28
[127] Learning without concentration for general loss functions, Probab. Theory Related Fields, Volume 171 (2018) no. 1-2, pp. 459-502
[128] Concentration behavior of the penalized least squares estimator, Statistica Neerlandica, Volume 72 (2018) no. 2, pp. 109-125
[129] Modélisation de la production d’hydrocarbures dans un bassin pétrolier, Université Paris-Sud, December (2008) (Ph. D. Thesis, available at http://tel.archives-ouvertes.fr/tel-00345753v1)
[130] Least squares methods for ill-posed problems with a prescribed bound, SIAM J. Math. Anal., Volume 1 (1970), pp. 52-74
[131] A non asymptotic penalized criterion for gaussian mixture model selection, ESAIM Probab. Stat., Volume 15 (2011), pp. 41-68
[132] Data-driven penalty calibration: A case study for Gaussian model selection, ESAIM Probab. Stat., Volume 15 (2011), pp. 320-339
[133] Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 79 (2017) no. 4, pp. 1119-1141
[134] A sparse variable selection procedure in model-based clustering, 2012 (available at https://hal.inria.fr/hal-00734316v1)
[135] Risk bounds for statistical learning, Ann. Statist., Volume 34 (2006) no. 5, pp. 2326-2366
[136] Estimating the error variance in nonparametric regression by a covariate-matched U-statistic, Statistics, Volume 37 (2003) no. 3, pp. 179-188
[137] Smooth discrimination analysis, Ann. Statist., Volume 27 (1999) no. 6, pp. 1808-1829
[139] A proportional hazards regression model with change-points in the baseline function, Lifetime Data Analysis, Volume 19 (2013) no. 1, pp. 59-78
[140] Near optimal thresholding estimation of a Poisson intensity on the real line, Electron. J. Stat., Volume 4 (2010), pp. 172-238 (electronic)
[141] Adaptive density estimation: a curse of support?, J. Statist. Plann. Inference, Volume 141 (2011) no. 1, pp. 115-139
[142] Adaptive estimation for Hawkes processes; application to genome analysis, Ann. Statist., Volume 38 (2010) no. 5, pp. 2781-2822
[143] A regularization parameter in discrete ill-posed problems, SIAM J. Sci. Comput., Volume 17 (1996) no. 3, pp. 740-749
[144] Bandwidth choice for nonparametric regression, Ann. Statist., Volume 12 (1984) no. 4, pp. 1215-1230
[145] Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models, Bioinformatics, Volume 31 (2015) no. 9, pp. 1420-1427
[146] Statistical modeling for functional data: non-asymptotic approaches and adaptive methods, Université Montpellier II - Sciences et Techniques du Languedoc, July (2014) (Ph. D. Thesis, available at https://tel.archives-ouvertes.fr/tel-01023919v1)
[147] Statistical Base Jumping: A simple and fully data-driven answer to penalized model selection (2012) (Séminaire de Statistique du MAP5, February 3rd)
[148] Consistent estimation of residual variance with random forest Out-Of-Bag errors, Statistics & Probability Letters, Volume 151 (2019), pp. 49-57
[149] Old and new parameter choice rules for discrete ill-posed problems, Numerical Algorithms, Volume 63 (2013) no. 1, pp. 65-87
[150] A study of error variance estimation in Lasso regression, Statist. Sinica, Volume 26 (2016) no. 1, pp. 35-67
[151] Multi-task Regression using Minimal Penalties, J. Mach. Learn. Res., Volume 13 (2012), pp. 2773-2812 (electronic), http://jmlr.csail.mit.edu/papers/v13/solnon12a.html
[152] Convergence in sup-norm of least-squares estimators in regression with random design and nonparametric heteroscedastic noise, 2010 (available at http://hal.archives-ouvertes.fr/hal-00528539v2)
[153] Estimation par Minimum de Contraste Régulier et Heuristique de Pente en Sélection de Modèles, Université de Rennes 1, October (2010) (Ph. D. Thesis, available at http://tel.archives-ouvertes.fr/tel-00569372v1)
[154] Nonasymptotic quasi-optimality of AIC and the slope heuristics in maximum likelihood estimation of density using histogram models, 2010 (available at https://hal.archives-ouvertes.fr/hal-00512310v1)
[155] Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression, Electron. J. Stat., Volume 6 (2012), pp. 579-655
[156] Optimal model selection in heteroscedastic regression using piecewise polynomial functions, Electron. J. Stat., Volume 7 (2013), pp. 1184-1223
[157] A concentration inequality for the excess risk in least-squares regression with random design and heteroscedastic noise, 2017 (arXiv:1702.05063v2)
[158] Estimating the dimension of a model, Ann. Statist., Volume 6 (1978) no. 2, pp. 461-464
[159] Finding the number of clusters in a dataset: an information-theoretic approach, J. Amer. Statist. Assoc., Volume 98 (2003) no. 463, pp. 750-763
[161] Apprentissage statistique multi-tâches, Université Pierre et Marie Curie - Paris VI, November (2013) (Ph. D. Thesis, available at https://hal.inria.fr/tel-00911498v1)
[162] Minimal penalties for model selection, Université Paris-Saclay, February (2017) (Ph. D. Thesis, available at https://tel.archives-ouvertes.fr/tel-01515957v1)
[163] Variance estimation for high-dimensional regression models, J. Multivariate Anal., Volume 82 (2002) no. 1, pp. 111-133
[164] Parametric estimation. Finite sample theory, Ann. Statist., Volume 40 (2012) no. 6, pp. 2877-2909
[165] Penalized maximum likelihood estimation and effective dimension, Ann. Inst. Henri Poincaré Probab. Stat., Volume 53 (2017) no. 1, pp. 389-429
[166] Estimation of the mean of a multivariate normal distribution, Ann. Statist., Volume 9 (1981) no. 6, pp. 1135-1151
[167] Cross-validatory choice and assessment of statistical predictions, J. Roy. Statist. Soc. Ser. B, Volume 36 (1974), pp. 111-147 (with discussion by G. A. Barnard, A. C. Atkinson, L. K. Chan, A. P. Dawid, F. Downton, J. Dickey, A. G. Baker, O. Barndorff-Nielsen, D. R. Cox, S. Giesser, D. Hinkley, R. R. Hocking, and A. S. Young, and with a reply by the authors)
[168] Optimal variance estimation without estimating the mean function, Bernoulli, Volume 19 (2013) no. 5A, pp. 1839-1854
[169] Degrees of freedom in lasso problems, Ann. Statist., Volume 40 (2012) no. 2, pp. 1198-1232
[170] Estimating the number of clusters in a data set via the gap statistic, J. R. Stat. Soc. Ser. B Stat. Methodol., Volume 63 (2001) no. 2, pp. 411-423
[171] On the minimal penalty for Markov order estimation, Probability Theory and Related Fields, Volume 150 (2011) no. 3, pp. 709-738
[172] On concentration for (regularized) empirical risk minimization, Sankhya A, Volume 79 (2017) no. 2, pp. 159-200
[173] The Degrees of Freedom of the Group Lasso, International Conference on Machine Learning Workshop (ICML), Edinburgh, United Kingdom (2012) (available at https://hal.archives-ouvertes.fr/hal-00695292)
[174] Data-driven neighborhood selection of a Gaussian field, Comput. Statist. Data Anal., Volume 54 (2010) no. 5, pp. 1355-1371
[175] Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC-BIC dilemma, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 74 (2012) no. 3, pp. 361-417
[176] Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation, 2019 (Technical report, arXiv:1902.01075v1)
[177] Non-convergence of the L-curve regularization parameter selection method, Inverse Problems, Volume 12 (1996) no. 4, pp. 535-547
[178] A survey of some smoothing problems and the method of generalized cross-validation for solving them, Applications of statistics (Proc. Sympos., Wright State Univ., Dayton, Ohio, 1976), North-Holland, Amsterdam, 1977, pp. 507-523
[179] The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Statistics, Volume 9 (1938), pp. 60-62
[180] Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation, Biometrika, Volume 92 (2005) no. 4, pp. 937-950
[181] Statistical performances of learning algorithm: Kernel Projection Machine and Kernel Principal Component Analysis, Université Paris Sud, November (2005) (Ph. D. Thesis, available at http://tel.archives-ouvertes.fr/tel-00012011v1)