Inference robust to outliers with ℓ1-norm penalization
ESAIM: Probability and Statistics, Tome 24 (2020), pp. 688-702.

This paper considers inference in a linear regression model with outliers, in which the number of outliers can grow with the sample size while their proportion goes to 0. We propose a square-root lasso ℓ1-norm penalized estimator. We derive rates of convergence and establish asymptotic normality. Our estimator has the same asymptotic variance as the OLS estimator in the standard linear model. This enables us to build tests and confidence sets in the usual and simple manner. The proposed procedure is also computationally attractive: it amounts to solving a convex optimization program. Overall, the suggested approach offers a practical robust alternative to the ordinary least squares estimator.
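The estimator augments the regression with one outlier parameter per observation and penalizes these parameters in ℓ1-norm. The paper's square-root variant requires a conic solver; as a minimal illustrative sketch, the snippet below implements the simpler squared-loss analogue by alternating minimization (OLS step in the coefficients, soft-thresholding step in the outlier parameters). The function name `robust_lasso_outliers` and the hand-picked penalty `lam` are this sketch's own choices, not the paper's exact procedure.

```python
import numpy as np

def robust_lasso_outliers(X, y, lam, n_iter=200):
    """Alternating minimization for the convex program
        min_{beta, alpha}  0.5 * ||y - X beta - alpha||_2^2 + lam * ||alpha||_1,
    a squared-loss analogue of the paper's square-root lasso estimator.
    Each alpha_i absorbs a potential outlier in observation i."""
    n, p = X.shape
    alpha = np.zeros(n)
    # Precompute the OLS map once; the beta-step reuses it each iteration.
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)
    beta = XtX_inv_Xt @ y
    for _ in range(n_iter):
        # beta-step: OLS on the outlier-corrected responses y - alpha.
        beta = XtX_inv_Xt @ (y - alpha)
        # alpha-step: exact minimizer is soft-thresholding of the residuals.
        r = y - X @ beta
        alpha = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return beta, alpha
```

Because each block update minimizes the convex objective exactly in that block, the objective decreases monotonically; observations flagged by a nonzero `alpha_i` are effectively downweighted in the fit, mimicking the robustness described in the abstract.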

DOI : 10.1051/ps/2020014
Classification : 62F35, 62J05, 62J07
Keywords: Robust regression, ℓ1-norm penalization, unknown variance
@article{PS_2020__24_1_688_0,
     author = {Beyhum, Jad},
     title = {Inference robust to outliers with $\ell_1$-norm penalization},
     journal = {ESAIM: Probability and Statistics},
     pages = {688--702},
     publisher = {EDP-Sciences},
     volume = {24},
     year = {2020},
     doi = {10.1051/ps/2020014},
     mrnumber = {4170179},
     zbl = {1455.62065},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2020014/}
}
TY  - JOUR
AU  - Beyhum, Jad
TI  - Inference robust to outliers with ℓ1-norm penalization
JO  - ESAIM: Probability and Statistics
PY  - 2020
SP  - 688
EP  - 702
VL  - 24
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/ps/2020014/
DO  - 10.1051/ps/2020014
LA  - en
ID  - PS_2020__24_1_688_0
ER  - 
%0 Journal Article
%A Beyhum, Jad
%T Inference robust to outliers with ℓ1-norm penalization
%J ESAIM: Probability and Statistics
%D 2020
%P 688-702
%V 24
%I EDP-Sciences
%U http://www.numdam.org/articles/10.1051/ps/2020014/
%R 10.1051/ps/2020014
%G en
%F PS_2020__24_1_688_0
Beyhum, Jad. Inference robust to outliers with ℓ1-norm penalization. ESAIM: Probability and Statistics, Tome 24 (2020), pp. 688-702. doi: 10.1051/ps/2020014. http://www.numdam.org/articles/10.1051/ps/2020014/

[1] A. Alfons, C. Croux and S. Gelper. Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Stat. 7 (2013) 226–248. | DOI | MR | Zbl

[2] A. Belloni, V. Chernozhukov, et al. Least squares after model selection in high-dimensional sparse models. Bernoulli 19 (2013) 521–547. | DOI | MR | Zbl

[3] A. Belloni, V. Chernozhukov and L. Wang, Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98 (2011) 791–806. | DOI | MR | Zbl

[4] M. Chen, C. Gao, Z. Ren, et al. Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Stat. 46 (2018) 1932–1960. | DOI | MR | Zbl

[5] O. Collier and A.S. Dalalyan. Rate-optimal estimation of p-dimensional linear functionals in a sparse Gaussian model. Preprint (2017). | arXiv

[6] A.S. Dalalyan, SOCP based variance free Dantzig selector with application to robust estimation. C. R. Math. 350 (2012) 785–788. | DOI | MR | Zbl

[7] J. Fan, Q. Li and Y. Wang, Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. 79 (2017) 247–265. | DOI | MR | Zbl

[8] I. Gannaz, Robust estimation and wavelet thresholding in partially linear models. Stat. Comput. 17 (2007) 293–310. | DOI | MR

[9] C. Giraud, Introduction to high-dimensional statistics. Chapman and Hall/CRC, Boca Raton (2014) | DOI

[10] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel. Robust statistics: the approach based on influence functions, Vol. 196. John Wiley & Sons, New Jersey (2011) | MR | Zbl

[11] P.J. Huber et al., Robust estimation of a location parameter. Ann. Math. Stat. 35 (1964) 73–101. | DOI | MR | Zbl

[12] S. Lambert-Lacroix, L. Zwald, et al. Robust regression through Huber’s criterion and adaptive lasso penalty. Electron. J. Stat. 5 (2011) 1015–1053. | DOI | MR | Zbl

[13] Y. Lee, S.N. Maceachern, Y. Jung, et al., Regularization of case-specific parameters for robustness and efficiency. Stat. Sci. 27 (2012) 350–372. | MR | Zbl

[14] W. Li. Simultaneous variable selection and outlier detection using LASSO with applications to aircraft landing data analysis. Ph.D. thesis, Rutgers University-Graduate School-New Brunswick (2012). | MR

[15] R.A. Maronna, R.D. Martin, V.J. Yohai and M. Salibián-Barrera, Robust statistics: theory and methods (with R). John Wiley & Sons, New Jersey (2018) | DOI | MR | Zbl

[16] A.B. Owen, A robust hybrid of lasso and ridge regression. Contemp. Math. 443 (2007) 59–72. | DOI | MR | Zbl

[17] P.J. Rousseeuw and A.M. Leroy. Robust regression and outlier detection, Vol. 589. John Wiley & Sons, New Jersey (2005) | MR | Zbl

[18] Y. She and A.B. Owen. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 106 (2011) 626–639. | DOI | MR | Zbl

[19] T. Sun and C.-H. Zhang. Scaled sparse linear regression. Biometrika 99 (2012) 879–898. | DOI | MR | Zbl

[20] R. Vershynin, High-dimensional probability: An introduction with applications in data science, Vol. 47. Cambridge University Press, Cambridge (2018) | MR


I thank my PhD supervisor Professor Eric Gautier for his availability and valuable help. I am also grateful to Anne Ruiz-Gazen, Jean-Pierre Florens, Thierry Magnac, Nour Meddahi, two anonymous referees and an associate editor of ESAIM: Probability & Statistics for their useful comments. I acknowledge financial support from the ERC POEMH 337665 grant.