This paper considers inference in a linear regression model with outliers, where the number of outliers can grow with the sample size while their proportion tends to 0. We propose a square-root lasso ℓ1-norm penalized estimator. We derive rates of convergence and establish asymptotic normality. Our estimator has the same asymptotic variance as the OLS estimator in the standard linear model, which enables us to build tests and confidence sets in the usual and simple manner. The proposed procedure is also computationally attractive: it amounts to solving a convex optimization program. Overall, the suggested approach offers a practical robust alternative to the ordinary least squares estimator.
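To illustrate the idea behind such case-specific ℓ1 penalization, here is a minimal NumPy sketch of a mean-shift formulation, y_i = x_i'β + α_i + ε_i, where the shift α_i absorbs a possible outlier in observation i and the square-root-lasso objective penalizes ‖α‖1. The alternating-minimization scheme and the penalty level `lam` are illustrative assumptions, not the paper's exact algorithm or tuning:

```python
import numpy as np

def soft_threshold(r, t):
    """Componentwise soft-thresholding operator."""
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

def sqrt_lasso_outliers(X, y, lam, n_iter=200, eps=1e-10):
    """Sketch of alternating minimization for
        min_{beta, alpha} ||y - X beta - alpha||_2 / sqrt(n) + lam * ||alpha||_1.
    The first-order condition of the square-root loss in alpha reduces to
    soft-thresholding the residuals at level lam * sqrt(n) * ||y - X beta - alpha||_2,
    so the threshold is proportional to the implied noise-scale estimate
    (the pivotal feature of the square-root lasso)."""
    n = X.shape[0]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from plain OLS
    alpha = np.zeros(n)
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.linalg.norm(r - alpha) + eps       # = sqrt(n) * sigma_hat
        alpha = soft_threshold(r, lam * np.sqrt(n) * scale)
        beta = np.linalg.lstsq(X, y - alpha, rcond=None)[0]  # OLS on shifted data
    return beta, alpha
```

On data with a few gross outliers, the fitted α is nonzero exactly at the contaminated observations, and β is then re-estimated by OLS on the corrected responses, which is consistent with the estimator behaving like OLS in the standard model.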
DOI: 10.1051/ps/2020014
Keywords: Robust regression, ℓ1-norm penalization, unknown variance
@article{PS_2020__24_1_688_0,
  author = {Beyhum, Jad},
  title = {Inference robust to outliers with \protect\emph{\ensuremath{\ell}}\protect\textsubscript{1}-norm penalization},
  journal = {ESAIM: Probability and Statistics},
  pages = {688--702},
  publisher = {EDP-Sciences},
  volume = {24},
  year = {2020},
  doi = {10.1051/ps/2020014},
  mrnumber = {4170179},
  zbl = {1455.62065},
  language = {en},
  url = {http://www.numdam.org/articles/10.1051/ps/2020014/}
}
Beyhum, Jad. Inference robust to outliers with ℓ1-norm penalization. ESAIM: Probability and Statistics, Tome 24 (2020), pp. 688-702. doi: 10.1051/ps/2020014. http://www.numdam.org/articles/10.1051/ps/2020014/
I thank my PhD supervisor Professor Eric Gautier for his availability and valuable help. I am also grateful to Anne Ruiz-Gazen, Jean-Pierre Florens, Thierry Magnac, Nour Meddahi, two anonymous referees and an associate editor of ESAIM: Probability & Statistics for their useful comments. I acknowledge financial support from the ERC POEMH 337665 grant.