Randomized pick-freeze for sparse Sobol indices estimation in high dimension
ESAIM: Probability and Statistics, Tome 19 (2015), pp. 725-745.

This article investigates the selection of variables in high dimension from a non-parametric regression model. In many concrete situations, we are concerned with estimating a non-parametric regression function $f$ that may depend on a large number $p$ of input variables. Unlike standard procedures, we do not assume that $f$ belongs to a class of regular functions (Hölder, Sobolev, ...), yet we assume that $f$ is a square-integrable function with respect to a known product measure. Furthermore, observe that, in some situations, only a small number $s$ of the coordinates actually affects $f$ in an additive manner. In this context, we prove that, with only $\mathcal{O}(s\log p)$ random evaluations of $f$, one can identify the relevant input variables with overwhelming probability. Our proposed method is an unconstrained $\ell_1$-minimization procedure based on Sobol's method. One step of this procedure relies on support recovery using $\ell_1$-minimization and thresholding. More precisely, we use a thresholded LASSO to faithfully uncover the significant input variables. In this frame, we prove that one can relax the mutual incoherence property (known to require $\mathcal{O}(s^2\log p)$ observations) and still ensure faithful recovery from $\mathcal{O}(s^\alpha\log p)$ observations for any $1\leq\alpha\leq 2$.
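The idea behind a randomized pick-freeze design can be illustrated with a toy sketch built on our own assumptions. It is not the authors' exact estimator, tuning or theory.

Take an additive $f(X)=\sum_i f_i(X_i)$ with independent inputs. Let $X'$ copy $X$ on a "frozen" subset of coordinates and redraw the others independently. Then $\operatorname{Cov}(f(X), f(X'))=\sum_{i\ \text{frozen}}\operatorname{Var}(f_i(X_i))$. Each random frozen subset therefore yields a noisy linear measurement of the sparse vector of first-order variances. A LASSO followed by thresholding can then recover its support.

The following choices are hypothetical and made for illustration only:

- the test function `f`, with three active inputs of equal variance 0.75;
- uniform inputs on $[0,1]$;
- Bernoulli(1/2) freeze sets;
- a plain coordinate-descent LASSO with fixed `lam` and `threshold`;
- all function names.

```python
import math
import random


def f(x):
    # Hypothetical additive test function: only inputs 3, 17 and 29 matter,
    # each contributing a variance of 0.75 when inputs are uniform on [0, 1].
    return 3.0 * x[3] + 3.0 * x[17] + math.sqrt(1.5) * math.sin(2 * math.pi * x[29])


def pick_freeze_cov(f, frozen, n_pairs, rng):
    """Monte Carlo estimate of Cov(f(X), f(X')), where X' copies X on the
    `frozen` coordinates and redraws the others (uniform on [0, 1])."""
    p = len(frozen)
    sy = sy2 = syy2 = 0.0
    for _ in range(n_pairs):
        x = [rng.random() for _ in range(p)]
        x2 = [xi if keep else rng.random() for xi, keep in zip(x, frozen)]
        y, y2 = f(x), f(x2)
        sy += y
        sy2 += y2
        syy2 += y * y2
    return syy2 / n_pairs - (sy / n_pairs) * (sy2 / n_pairs)


def lasso_cd(A, c, lam, sweeps=300):
    """LASSO with an unpenalized intercept, solved by cyclic coordinate descent
    on (1/2n)||c - A v||^2 + lam * ||v||_1 after centering A and c."""
    n, p = len(A), len(A[0])
    col_means = [sum(row[j] for row in A) / n for j in range(p)]
    B = [[row[j] - col_means[j] for j in range(p)] for row in A]
    c_mean = sum(c) / n
    resid = [ck - c_mean for ck in c]  # residual c - B v for v = 0
    z = [sum(B[k][j] ** 2 for k in range(n)) / n for j in range(p)]
    v = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if z[j] == 0.0:  # constant column: leave coefficient at zero
                continue
            rho = sum(B[k][j] * (resid[k] + B[k][j] * v[j]) for k in range(n)) / n
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / z[j]  # soft-threshold
            if new != v[j]:
                delta = new - v[j]
                for k in range(n):
                    resid[k] -= B[k][j] * delta
                v[j] = new
    return v


def randomized_pick_freeze(f, p, n_rows, n_pairs, lam, threshold, seed=0):
    """Hypothetical end-to-end sketch: random freeze sets -> covariance
    measurements -> LASSO -> hard threshold on the estimated variances."""
    rng = random.Random(seed)
    # Each measurement row freezes a random Bernoulli(1/2) subset of inputs.
    frozen_sets = [[rng.random() < 0.5 for _ in range(p)] for _ in range(n_rows)]
    c = [pick_freeze_cov(f, row, n_pairs, rng) for row in frozen_sets]
    A = [[1.0 if a else 0.0 for a in row] for row in frozen_sets]
    v = lasso_cd(A, c, lam)
    support = sorted(j for j, vj in enumerate(v) if abs(vj) > threshold)
    return support, v
```

A call such as `randomized_pick_freeze(f, p=50, n_rows=40, n_pairs=1000, lam=0.03, threshold=0.25)` uses fewer covariance measurements (40) than inputs (50). It returns the thresholded support together with the LASSO estimates of the first-order variances. In this sketch, `lam`, `threshold` and the number of Monte Carlo pairs per row are ad hoc choices that would need tuning for any other function.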

DOI: 10.1051/ps/2015013
Classification: 62G08, 62G35, 65H10, 93A30, 93B35
Keywords: Sensitivity analysis, Sobol indices, high-dimensional statistics, LASSO, Monte-Carlo method
Castro, Yohann de 1; Janon, Alexandre 2

1 Université Paris-Sud, Laboratoire de Mathématiques d’Orsay, Bâtiment 425, Université Paris-Sud, 91405 Orsay, France
2 Université Paris-Sud, Laboratoire de Mathématiques d’Orsay, Bâtiment 425, Université Paris-Sud, 91405 Orsay, France
@article{PS_2015__19__725_0,
     author = {Castro, Yohann de and Janon, Alexandre},
     title = {Randomized pick-freeze for sparse {Sobol} indices estimation in high dimension},
     journal = {ESAIM: Probability and Statistics},
     pages = {725--745},
     publisher = {EDP-Sciences},
     volume = {19},
     year = {2015},
     doi = {10.1051/ps/2015013},
     zbl = {1392.62111},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2015013/}
}

