[Comparison of SIR-type (sliced inverse regression) approaches for underdetermined cases]
Among methods for analyzing high-dimensional data, sliced inverse regression (SIR) is of particular interest when non-linear relations exist between the dependent variable and linear combinations of the predictors, called indices. When the dimension of the covariate is greater than the number of observations, classical versions of SIR can no longer be applied. Various extensions, such as regularized SIR (RSIR) and sparse ridge SIR (SR-SIR), have been proposed to tackle this issue, to estimate the parameters of the underlying model and to select the most relevant predictors. In this paper, we introduce two new estimation methods based respectively on the QZ algorithm and on the Moore-Penrose pseudo-inverse. We also describe a new procedure for selecting the most relevant components of the covariate, which relies on a proximity criterion between submodels and the initial model. These approaches are compared with RSIR and SR-SIR in a simulation study. Finally, we apply the proposed SIR-QZ approach and the associated selection procedure to a genetic dataset in order to find markers linked to the expression of a gene. Such markers are called expression quantitative trait loci (eQTL).
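The abstract refers to the classical SIR formulation, in which the estimated directions solve a generalized eigenproblem between the covariance of the slice means and the covariance of the covariate, and which breaks down when the number of predictors p exceeds the sample size n. The sketch below, assuming numpy and a toy single-index model, illustrates one simple pseudo-inverse workaround of the kind discussed in the paper; it is not the authors' SIR-QZ, RSIR or SR-SIR estimator, and the choice of slices and components is purely illustrative.

import numpy as np

def sir_pinv(X, y, n_slices=5, n_components=1):
    # Minimal sliced-inverse-regression sketch for the underdetermined
    # case (p > n): the usual Sigma^{-1} is replaced by the Moore-Penrose
    # pseudo-inverse.  Illustrative only; not the paper's SIR-QZ estimator.
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # center the predictors
    Sigma = Xc.T @ Xc / n                # p x p covariance, singular if p > n

    # Slice observations by the ranks of y and accumulate the
    # between-slice covariance Gamma = sum_h p_h * m_h m_h'.
    Gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m_h = Xc[idx].mean(axis=0)
        Gamma += (len(idx) / n) * np.outer(m_h, m_h)

    # Leading eigenvectors of pinv(Sigma) @ Gamma estimate a basis of
    # the EDR space.  (A QZ-based route would instead solve the
    # generalized eigenproblem Gamma b = lambda Sigma b directly, e.g.
    # with scipy.linalg.eig(Gamma, Sigma).)
    eigval, eigvec = np.linalg.eig(np.linalg.pinv(Sigma) @ Gamma)
    order = np.argsort(eigval.real)[::-1]
    return eigvec[:, order[:n_components]].real

# Toy underdetermined example (n = 50 < p = 200) with a single index b'X.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
b = np.zeros(p)
b[:3] = 1.0
y = np.sin(X @ b) + 0.1 * rng.standard_normal(n)
b_hat = sir_pinv(X, y)
print(abs(np.corrcoef(X @ b, X @ b_hat[:, 0])[0, 1]))  # correlation between true and estimated index

As the abstract indicates, the paper compares regularized and QZ-based variants by simulation; the naive pseudo-inverse above is only meant to make the generalized-eigenproblem formulation concrete.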
Keywords: high dimension, semiparametric regression, dimension reduction, sparsity
Coudret, Raphaël; Liquet, Benoit; Saracco, Jérôme. Comparison of sliced inverse regression approaches for underdetermined cases. Journal de la société française de statistique, Volume 155 (2014) no. 2, pp. 72-96. http://www.numdam.org/item/JSFS_2014__155_2_72_0/