We observe a $N \times M$ matrix of independent, identically distributed Gaussian random variables which are centered except for elements of some submatrix of size $n \times m$ where the mean is larger than some $a > 0$. The submatrix is sparse in the sense that $n/N$ and $m/M$ tend to 0, whereas $n$, $m$, $N$ and $M$ tend to infinity. We consider the problem of selecting the random variables with significantly large mean values, as was also considered by [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)]. We give sufficient conditions on $a$ as a function of $n$, $m$, $N$ and $M$ and construct a uniformly consistent procedure in order to do sharp variable selection. We also prove minimax lower bounds under necessary conditions which are complementary to the previous conditions. The critical values separating the necessary and sufficient conditions are sharp (we give exact constants), whereas [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)] only prove rate optimality and focus on suboptimal, computationally feasible selectors. Note that rate optimality in this problem leaves out a large set of possible parameters for which we do not know whether consistent selection is possible.
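The abstract describes the observation model and the selection task but not the procedure itself. The following minimal simulation sketch, assuming the $N \times M$ / $n \times m$ notation above, generates the model and applies a naive row/column-sum selector; this selector is only an illustration and is not the uniformly consistent procedure constructed in the article.

```python
# Sketch of the observation model from the abstract: an N x M Gaussian noise
# matrix whose entries have mean a > 0 on an unknown n x m submatrix.
# The row/column-sum selector below is a naive illustration (submatrix sizes
# n and m assumed known), not the procedure of the paper.
import numpy as np

rng = np.random.default_rng(0)
N, M, n, m, a = 500, 500, 20, 20, 1.5   # matrix size, submatrix size, signal level

# Plant the elevated-mean submatrix on the first n rows and m columns (w.l.o.g.).
Y = rng.standard_normal((N, M))
Y[:n, :m] += a

# Keep the n rows and m columns with the largest sums.
rows = np.argsort(Y.sum(axis=1))[-n:]
cols = np.argsort(Y.sum(axis=0))[-m:]

# Fraction of the planted rows/columns that was recovered.
hit_rows = np.isin(rows, np.arange(n)).mean()
hit_cols = np.isin(cols, np.arange(m)).mean()
print(f"recovered rows: {hit_rows:.0%}, columns: {hit_cols:.0%}")
```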
Keywords: Estimation, minimax testing, large matrices, selection of sparse signal, sharp selection bounds, variable selection
@article{PS_2015__19__115_0,
  author = {Butucea, Cristina and Ingster, Yuri I. and Suslina, Irina A.},
  title = {Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix},
  journal = {ESAIM: Probability and Statistics},
  pages = {115--134},
  publisher = {EDP-Sciences},
  volume = {19},
  year = {2015},
  doi = {10.1051/ps/2014017},
  mrnumber = {3374872},
  zbl = {1330.62169},
  language = {en},
  url = {http://www.numdam.org/articles/10.1051/ps/2014017/}
}
Butucea, Cristina; Ingster, Yuri I.; Suslina, Irina A. Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics, Volume 19 (2015), pp. 115-134. doi: 10.1051/ps/2014017. http://www.numdam.org/articles/10.1051/ps/2014017/
F. Abramovich, Y. Benjamini, D.L. Donoho and I.M. Johnstone, Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 (2006) 584–653. | MR | Zbl
E. Arias-Castro, E.J. Candès and A. Durand, Detection of an anomalous cluster in a network. Ann. Statist. 39 (2011) 278–304. | MR | Zbl
E. Arias-Castro, E.J. Candès and Y. Plan, Global Testing and Sparse Alternatives: ANOVA, Multiple Comparisons and the Higher Criticism. Preprint arXiv:1007.1434 (2010). | MR | Zbl
E. Arias-Castro, D.L. Donoho and X. Huo, Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 (2005) 2402–2425. | MR | Zbl
E. Arias-Castro and K. Lounici, Variable selection with exponential weights and $\ell_0$-penalization. Preprint arXiv:1208.2635 (2012).
Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300. | MR | Zbl
K. Bertin and G. Lecué, Selection of variables and dimension reduction in high-dimensional non-parametric regression. Electron. J. Stat. 2 (2008) 1224–1241. | MR | Zbl
P.J. Bickel, Y. Ritov and A.B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 (2009) 1705–1732. | MR | Zbl
C. Butucea and Yu.I. Ingster, Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 (2013) 2652–2688. | MR | Zbl
C. Butucea and G. Gayraud, Sharp detection of smooth signals in a high-dimensional sparse matrix with indirect observations. Preprint arXiv:1301.4660 (2013). | MR
T.T. Cai, J. Jin and M.G. Low, Estimation and confidence sets for sparse normal mixtures. Ann. Statist. 35 (2007) 2421–2449. | MR | Zbl
E.J. Candès and Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57 (2011) 2342–2359. | MR | Zbl
E.J. Candès and B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9 (2009) 717–772. | MR | Zbl
L. Comminges and A.S. Dalalyan, Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 (2012) 2359–2763. | MR | Zbl
D. Donoho and J. Jin, Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 (2004) 962–994. | MR | Zbl
D.L. Donoho, I.M. Johnstone, J.C. Hoch and A.S. Stern, Maximum entropy and the nearly black object. With discussion. J. Roy. Statist. Soc. Ser. B 54 (1992) 41–81. | MR | Zbl
P. Embrechts, C. Klüppelberg and T. Mikosch, Modelling extremal events: for insurance and finance. Springer (1997). | MR | Zbl
D. Gross, Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory 57 (2011) 1548–1566. | MR | Zbl
Yu.I. Ingster, Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Stat. 6 (1997) 47–69. | MR | Zbl
Yu.I. Ingster and N.A. Stepanova, Adaptive selection of sparse regression function components. Zapiski Nauchn. Sem. POMI (2012).
Yu.I. Ingster and I.A. Suslina, Nonparametric goodness-of-fit testing under Gaussian models. Vol. 169 of Lect. Notes in Statist. Springer-Verlag, New York (2003). | MR | Zbl
Yu.I. Ingster and I.A. Suslina, On a detection of a signal of known shape in multichannel system. Zapiski Nauchn. Sem. POMI 294 (2002) 88–112; Transl. J. Math. Sci. 127 (2002) 1723–1736. | MR | Zbl
R.H. Keshavan, A. Montanari and S. Oh, Matrix completion from noisy entries. J. Mach. Learn. Res. 11 (2010) 2057–2078. | MR | Zbl
M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, Minimax localization of structural information in large noisy matrices. NIPS (2011).
V. Koltchinskii, K. Lounici and A.B. Tsybakov, Nuclear norm penalization and optimal rates for noisy low rank matrix completion. Ann. Statist. 39 (2011) 2302–2329. | MR | Zbl
J. Lafferty and L. Wasserman, Rodeo: sparse, greedy nonparametric regression. Ann. Statist. 36 (2008) 28–63. | MR | Zbl
B. Recht, A simpler approach to matrix completion. J. Mach. Learn. Res. 12 (2011) 3413–3430. | MR | Zbl
A. Rohde and A.B. Tsybakov, Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 (2011) 887–930. | MR | Zbl
X. Sun and A.B. Nobel, On the maximal size of Large-Average and ANOVA-fit Submatrices in a Gaussian Random Matrix. Bernoulli 19 (2013) 275–294. | MR | Zbl
A.A. Shabalin, V.J. Weigman, C.M. Perou and A.B. Nobel, Finding Large Average Submatrices in High Dimensional Data. Ann. Appl. Statist. 3 (2009) 985–1012. | MR | Zbl
A.B. Tsybakov, Introduction to Nonparametric Estimation. Springer Ser. Stat. Springer, New York (2009). | MR | Zbl
N. Verzelen, Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron. J. Stat. 6 (2012) 38–90. | MR | Zbl