Décrire, prendre en compte, imputer et évaluer les valeurs manquantes dans les études statistiques : une revue des approches existantes

Imbert, Alyssa; Vialaneix, Nathalie

Journal de la société française de statistique, Tome 159 (2018) no. 2, pp. 1-55.

Abstract

Missing data is an issue closely tied to statistical analysis, and in particular to the collection and preparation of data for that analysis. In this article, we review the approaches that can be used to diagnose and impute missing data, as well as those that assess the impact of imputation on subsequent statistical analyses. We also describe the implementations of these approaches available in R packages.

Keywords: missing data, imputation

@article{JSFS_2018__159_2_1_0,
     author = {Imbert, Alyssa and Vialaneix, Nathalie},
     title = {D\'ecrire, prendre en compte, imputer et \'evaluer les valeurs manquantes dans les \'etudes statistiques~: une revue des approches existantes},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {1--55},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {159},
     number = {2},
     year = {2018},
     zbl = {1406.62104},
     language = {fr},
     url = {http://www.numdam.org/item/JSFS_2018__159_2_1_0/}
}

TY  - JOUR
AU  - Imbert, Alyssa
AU  - Vialaneix, Nathalie
TI  - Décrire, prendre en compte, imputer et évaluer les valeurs manquantes dans les études statistiques : une revue des approches existantes
JO  - Journal de la société française de statistique
PY  - 2018
SP  - 1
EP  - 55
VL  - 159
IS  - 2
PB  - Société française de statistique
UR  - http://www.numdam.org/item/JSFS_2018__159_2_1_0/
LA  - fr
ID  - JSFS_2018__159_2_1_0
ER  -

%0 Journal Article
%A Imbert, Alyssa
%A Vialaneix, Nathalie
%T Décrire, prendre en compte, imputer et évaluer les valeurs manquantes dans les études statistiques : une revue des approches existantes
%J Journal de la société française de statistique
%D 2018
%P 1-55
%V 159
%N 2
%I Société française de statistique
%U http://www.numdam.org/item/JSFS_2018__159_2_1_0/
%G fr
%F JSFS_2018__159_2_1_0

Imbert, Alyssa; Vialaneix, Nathalie. Décrire, prendre en compte, imputer et évaluer les valeurs manquantes dans les études statistiques : une revue des approches existantes. Journal de la société française de statistique, Tome 159 (2018) no. 2, pp. 1-55. http://www.numdam.org/item/JSFS_2018__159_2_1_0/

Bibliography

[Abayomi et al., 2008] Abayomi, K., Gelman, A. and Levy, M. (2008). Diagnostics for multivariate imputations. Journal of the Royal Statistical Society, Series C (Applied Statistics), 57(3):273–291.

[Albert and Follmann, 2000] Albert, P. and Follmann, D. (2000). Modeling repeated count data subject to informative dropout. Biometrics, 56(3):667–677.

[Allison, 2001] Allison, P. (2001). Missing Data. Quantitative Applications in the Social Sciences. Sage Publications, Thousand Oaks, CA, USA.

[Andridge and Little, 2010] Andridge, R. and Little, R. (2010). A review of hot deck imputation for survey non-response. International Statistical Review, 78(1):40–64.

[Audigier et al., 2015] Audigier, V., Husson, F. and Josse, J. (2015). Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of Statistical Computation and Simulation, 86(11):2140–2156.

[Audigier et al., 2016a] Audigier, V., Husson, F. and Josse, J. (2016a). MIMCA: multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing, 27(2):1–18.

[Audigier et al., 2016b] Audigier, V., Husson, F. and Josse, J. (2016b). A principal component method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1):5–26.

[Baraldi and Enders, 2010] Baraldi, A. and Enders, C. (2010). An introduction to modern missing data analysis. Journal of School Psychology, 48(1):5–37.

[Beretta and Santaniello, 2016] Beretta, L. and Santaniello, A. (2016). Nearest neighbor imputation algorithms: a critical evaluation. BMC Medical Informatics and Decision Making, 16(Supp. 3):74.

[Breiman, 2001] Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

[Breiman et al., 1984] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Chapman and Hall, Boca Raton, Florida, USA.

[Burns, 1990] Burns, R. (1990). Multiple and replicate item imputation in a complex sample survey. In Bureau of the Census, editor: Proceedings of the 6th Annual Research Conference, pages 655–665, Washington DC, USA.

[Candès et al., 2013] Candès, E., Sing-Long, C. and Trzasko, J. (2013). Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing, 61(19):4643–4657.

[Carpenter and Kenward, 2013] Carpenter, J. and Kenward, M. (2013). Multiple Imputation and its Application. Wiley.

[Caussinus, 1986] Caussinus, H. (1986). Models and uses of principal component analysis (with discussion). In de Leeuw, J., Heiser, W., Meulman, J. and Critchley, F., editors: Multidimensional Data Analysis. Proceedings of a Workshop, Pembroke College, Cambridge University, England, pages 149–178, Leiden, The Netherlands. DSWO Press.

[Chen and Shao, 2000] Chen, J. and Shao, J. (2000). Nearest neighbor imputation for survey data. Journal of Official Statistics, 16(2):113–131.

[Chessel et al., 2004] Chessel, D., Dufour, A. and Thioulouse, J. (2004). The ade4 package – I: one-table methods. R News, 4(1):5–10.

[Cleveland and Devlin, 1988] Cleveland, W. and Devlin, S. (1988). Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association, 83(403):596–610.

[Cohen et al., 1985] Cohen, J., Cohen, P., West, S. and Aiken, L. (1985). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2nd edition.

[Collins et al., 2007] Collins, L. M., Schafer, J. L. and Kam, C.-M. (2007). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4):330–351.

[Cook and Swayne, 2007] Cook, D. and Swayne, D. (2007). Interactive and Dynamic Graphics for Data Analysis. Use R! Springer-Verlag, New York, NY, USA.

[Cranmer and Gill, 2012] Cranmer, S. and Gill, J. (2012). We have to be discrete about this: a non-parametric imputation technique for missing categorical data. British Journal of Political Science, 43:425–449.

[Crookston and Finley, 2008] Crookston, N. and Finley, A. (2008). yaImpute: an R package for kNN imputation. Journal of Statistical Software, 23(10).

[Dempster et al., 1977] Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38.

[Diggle and Kenward, 1994] Diggle, P. and Kenward, M. (1994). Informative drop-out in longitudinal data analysis. Journal of the Royal Statistical Society, Series C (Applied Statistics), 43(1):49–93.

[Ding and Simonoff, 2010] Ding, Y. and Simonoff, J. (2010). An investigation of missing data methods for classification trees applied to binary response data. Journal of Machine Learning Research, 11:131–170.

[Dong and Peng, 2013] Dong, Y. and Peng, C.-Y. J. (2013). Principled missing data methods for researchers. SpringerPlus, 2:222.

[Enders, 2001] Enders, C. (2001). A primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8(1):128–141.

[Enders, 2010] Enders, C. (2010). Applied Missing Data Analysis. Guilford Press.

[Escofier and Pagès, 1994] Escofier, B. and Pagès, J. (1994). Multiple factor analysis (AFMULT package). Computational Statistics and Data Analysis, 18(1):121–140.

[Escoufier, 1973] Escoufier, Y. (1973). Le traitement des variables vectorielles. Biometrics, 29(4):751–760.

[Fay, 1996] Fay, R. (1996). Alternative paradigms for the analysis of imputed survey data. Journal of the American Statistical Association, 91(434):490–498.

[Fellegi and Holt, 1976] Fellegi, I. and Holt, D. (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association, 71(353):17–35.

[Ferrari et al., 2011] Ferrari, P. A., Annoni, P., Barbiero, A. and Manzi, G. (2011). An imputation method for categorical variables with application to nonlinear principal component analysis. Computational Statistics & Data Analysis, 55(7):2410–2420.

[Finkbeiner, 1979] Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing. Psychometrika, 44(4):409–420.

[Follmann and Wu, 1995] Follmann, D. and Wu, M. (1995). An approximate generalized linear model with random effects for informative missing data. Biometrics, 51(1):151–168.

[Friedman, 1977] Friedman, J. (1977). A recursive partitioning decision rule for nonparametric classification. IEEE Transactions on Computers, C-26(4):404–408.

[Gad and Darwish, 2013] Gad, A. and Darwish, N. (2013). A shared parameter model for longitudinal data with missing values. American Journal of Applied Mathematics and Statistics, 1(2):30–35.

[Gelman et al., 2013] Gelman, A., Carlin, J., Stern, H. and Rubin, D. (2013). Bayesian Data Analysis. Chapman and Hall/CRC, Boca Raton, FL, USA, 3rd edition.

[Gelman and Hill, 2007] Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York, NY, USA.

[Gower, 1971] Gower, J. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4):857–874.

[Graham, 2009] Graham, J. (2009). Missing data analysis: making it work in the real world. Annual Review of Psychology, 60:549–576.

[Graham et al., 2007] Graham, J. W., Olchowski, A. E. and Gilreath, T. E. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8(3):206–213.

[Heckman, 1976] Heckman, J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5(4):475–492.

[Heckman, 1979] Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47(1):153–161.

[Hocking, 1976] Hocking, R. (1976). The analysis and selection of variables in linear regression. Biometrics, 32(1):1–49.

[Hoerl and Kennard, 1970] Hoerl, A. and Kennard, R. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.

[Hogan and Laird, 1997] Hogan, J. and Laird, N. (1997). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16(1-3):239–257.

[Honaker et al., 2011] Honaker, J., King, G. and Blackwell, M. (2011). Amelia II: a program for missing data. Journal of Statistical Software, 45(7).

[Huber and Ronchetti, 2009] Huber, P. and Ronchetti, E. (2009). Robust Statistics. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ, USA.

[Huisman, 2000] Huisman, M. (2000). Imputation of missing item responses: some simple techniques. Quality & Quantity, 34(4):331–351.

[Ilin and Raiko, 2010] Ilin, A. and Raiko, T. (2010). Practical approaches to Principal Component Analysis in the presence of missing values. Journal of Machine Learning Research, 11:1957–2000.

[Imbert et al., 2018] Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G., Gourraud, P., Viguerie, N. and Villa-Vialaneix, N. (2018). Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics, 34(10):1726–1732.

[Jamshidian and Jalal, 2010] Jamshidian, M. and Jalal, S. (2010). Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika, 75(4):649–674.

[Jamshidian et al., 2014] Jamshidian, M., Jalal, S. and Jansen, C. (2014). MissMech: an R package for testing homoscedasticity, multivariate normality, and missing completely at random (MCAR). Journal of Statistical Software, 56(6):1–31.

[Joenssen and Bankhofer, 2012] Joenssen, D. and Bankhofer, U. (2012). Donor limited hot deck imputation: effect on parameter estimation. Journal of Theoretical and Applied Computer Science, 6(3):58–70.

[Jönsson and Wohlin, 2004] Jönsson, P. and Wohlin, C. (2004). An evaluation of k-nearest neighbour imputation using Likert data. In Proceedings of the 10th International Symposium on Software Metrics, pages 1530–1435, Chicago, IL, USA. IEEE.

[Josse et al., 2012] Josse, J., Chavent, M., Liquet, B. and Husson, F. (2012). Handling missing values with regularized iterative multiple correspondence analysis. Journal of Classification, 29(1):91–116.

[Josse and Husson, 2012] Josse, J. and Husson, F. (2012). Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique, 153(2):79–99.

[Josse et al., 2009] Josse, J., Husson, F. and Pagès, J. (2009). Gestion des données manquantes en Analyse en Composantes Principales. Journal de la Société Française de Statistique, 150(2):28–51.

[Josse et al., 2011] Josse, J., Pagès, J. and Husson, F. (2011). Multiple imputation in principal component analysis. Advances in Data Analysis and Classification, 5(3):231–246.

[Kaiser, 2014] Kaiser, J. (2014). Dealing with missing values in data. Journal of Systems Integration, 5(1):42–51.

[Kalton and Kasprzyk, 1986] Kalton, G. and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, 12(1):1–16.

[Kiers, 1997] Kiers, H. (1997). Weighted least squares fitting using ordinary least squares algorithms. Psychometrika, 62(2):251–266.

[Kohn and Ansley, 1986] Kohn, R. and Ansley, C. F. (1986). Estimation, prediction, and interpolation for ARIMA models with missing data. Journal of the American Statistical Association, 81(395):751–761.

[Kowarik and Templ, 2016] Kowarik, A. and Templ, M. (2016). Imputation with the R package VIM. Journal of Statistical Software, 74(7):1–16.

[Lavit et al., 1994] Lavit, C., Escoufier, Y., Sabatier, R. and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119.

[Lê Cao et al., 2009] Lê Cao, K., González, I. and Déjean, S. (2009). integrOmics: an R package to unravel relationships between two omics data sets. Bioinformatics, 25(21):2855–2856.

[Little, 1988] Little, R. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83(404):1198–1202.

[Little, 1993] Little, R. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88(421):125–134.

[Little, 1995] Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90(431):1112–1121.

[Little and Rubin, 2002] Little, R. and Rubin, D. (2002). Statistical Analysis with Missing Data. Wiley.

[Little, 1992] Little, R. J. (1992). Regression with missing X’s: a review. Journal of the American Statistical Association, 87(420):1227–1237.

[Meng and Rubin, 1993] Meng, X. and Rubin, D. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80(2):267–278.

[Meng and Rubin, 1991] Meng, X. and Rubin, D. (1991). Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. Journal of the American Statistical Association, 86(416):899–909.

[Moeur and Stage, 1995] Moeur, M. and Stage, A. (1995). Most similar neighbor: an improved sampling inference procedure for natural resources planning. Forest Science, 42(1):337–359.

[Molenberghs et al., 1998] Molenberghs, G., Michiels, B., Kenward, M. and Diggle, P. (1998). Monotone missing data and pattern-mixture models. Statistica Neerlandica, 52(2):153–161.

[Molnar et al., 2008] Molnar, F., Hutton, B. and Fergusson, D. (2008). Does analysis using “last observation carried forward” introduce bias in dementia research? Canadian Medical Association Journal, 179(8):751–753.

[Moritz and Bartz-Beielstein, 2017] Moritz, S. and Bartz-Beielstein, T. (2017). imputeTS: time series missing value imputation in R. The R Journal, 9(1):207–218.

[Moritz et al., 2015] Moritz, S., Sardá, A., Bartz-Beielstein, T., Zaefferer, M. and Stork, J. (2015). Comparison of different methods for univariate time series imputation in R. Preprint arXiv 1510.03924.

[Pebesma, 2012] Pebesma, E. (2012). spacetime: spatio-temporal data in R. Journal of Statistical Software, 51(7):1–30.

[Pigott, 2001] Pigott, T. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4):353–383.

[Rao and Shao, 1992] Rao, J. and Shao, J. (1992). Jackknife variance estimation with survey data under hot deck imputation. Biometrika, 79(4):811–822.

[Reilly and Pepe, 1997] Reilly, M. and Pepe, M. (1997). The relationship between hot-deck multiple imputation and weighted likelihood. Statistics in Medicine, 16(1-3):5–19.

[Robins et al., 1995] Robins, J., Rotnitzky, A. and Zhao, L. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90(429):106–121.

[Robins and Wang, 2000] Robins, J. and Wang, N. (2000). Inference for imputation estimators. Biometrika, 87(1):113–124.

[Rosseel, 2012] Rosseel, Y. (2012). lavaan: an R package for structural equation modeling. Journal of Statistical Software, 48(2).

[Rotnitzky et al., 1998] Rotnitzky, A., Robins, J. and Scharfstein, D. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93(444):1321–1339.

[Rubin, 1976] Rubin, D. (1976). Inference and missing data. Biometrika, 63(3):581–592.

[Rubin, 1977] Rubin, D. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association, 72(359):538–543.

[Rubin, 1987] Rubin, D. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

[Rubin, 1996] Rubin, D. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434):473–489.

[Schafer, 1997] Schafer, J. (1997). Analysis of Incomplete Multivariate Data. CRC Monographs on Statistics & Applied Probability. Chapman and Hall/CRC, Boca Raton, FL, USA.

[Schafer, 1999] Schafer, J. (1999). Multiple imputation: a primer. Statistical Methods in Medical Research, 8(1):3–15.

[Schafer and Graham, 2002] Schafer, J. and Graham, J. (2002). Missing data: our view of the state of the art. Psychological Methods, 7(2):147–177.

[Schafer and Olsen, 1998] Schafer, J. and Olsen, M. (1998). Multiple imputation for multivariate missing-data problems: a data analyst’s perspective. Multivariate Behavioral Research, 33(4):545–571.

[Seaman and White, 2011] Seaman, S. and White, I. (2011). Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research, 22(3):278–295.

[Simon and Simonoff, 1986] Simon, G. and Simonoff, J. (1986). Diagnostic plots for missing data in least squares regression. Journal of the American Statistical Association, 81(394):501–509.

[Stacklies et al., 2007] Stacklies, W., Redestig, H., Scholz, M., Walther, D. and Selbig, J. (2007). pcaMethods – a Bioconductor package providing PCA methods for incomplete data. Bioinformatics, 23(9):1164–1167.

[Stage and Crookston, 2007] Stage, A. and Crookston, N. (2007). Partitioning error components for accuracy-assessment of near-neighbor methods of imputation. Forest Science, 53(1):62–72.

[Stekhoven and Bühlmann, 2012] Stekhoven, D. and Bühlmann, P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1):112–118.

[Stuart et al., 2009] Stuart, E., Azur, M., Frangakis, C. and Leaf, P. (2009). Multiple imputation with large data sets: a case study of the children’s mental health initiative. American Journal of Epidemiology, 169(9):1133–1139.

[Su et al., 2011] Su, Y., Gelman, A., Hill, J. and Yajima, M. (2011). Multiple imputation with diagnostics (mi) in R: opening windows into the black box. Journal of Statistical Software, 45(2).

[Tanner and Wong, 1987] Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398):528–540.

[Templ et al., 2012] Templ, M., Alfons, A. and Filzmoser, P. (2012). Exploring incomplete data using visualization techniques. Advances in Data Analysis and Classification, 6(1):29–47.

[Tenenhaus, 1998] Tenenhaus, M. (1998). La Régression PLS : Théorie et Pratique. TECHNIP.

[Thijs et al., 2002] Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G. and Curran, D. (2002). Strategies to fit pattern-mixture models. Biostatistics, 3(2):245–265.

[Tibshirani, 1996] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288.

[Tierney et al., 2015] Tierney, N., Harden, F., Harden, M. and Mengersen, K. (2015). Using decision trees to understand structure in missing data. BMJ Open, 5(6):e007450.

[Tipping and Bishop, 1999] Tipping, M. and Bishop, C. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 61:611–622.

[Torgo, 2010] Torgo, L. (2010). Data Mining with R: Learning with Case Studies. CRC Data Mining and Knowledge Discovery Series. Chapman and Hall, Boca Raton, Florida, USA.

[Troyanskaya et al., 2001] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520–525.

[Unnebrink and Windeler, 2001] Unnebrink, K. and Windeler, J. (2001). Intention-to-treat: methods for dealing with missing values in clinical trials of progressively deteriorating diseases. Statistics in Medicine, 20(24):3931–3946.

[van Buuren, 2007] van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16:219–242.

[van Buuren, 2012] van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman and Hall/CRC, Leiden, The Netherlands.

[van Buuren and Groothuis-Oudshoorn, 2011] van Buuren, S. and Groothuis-Oudshoorn, K. (2011). MICE: multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3).

[van der Wal and Geskus, 2011] van der Wal, W. M. and Geskus, R. B. (2011). ipw: an R package for inverse probability weighting. Journal of Statistical Software, 43(13).

[Verbanck et al., 2015] Verbanck, M., Josse, J. and Husson, F. (2015). Regularised PCA to denoise and visualise data. Statistics and Computing, 25(2):471–486.

[Verbeke et al., 2001] Verbeke, G., Molenberghs, G., Thijs, H., Lesaffre, E. and Kenward, M. (2001). Sensitivity analysis for nonrandom dropout: a local influence approach. Biometrics, 57(1):7–14.

[Voillet et al., 2016] Voillet, V., Besse, P., Liaubet, L., San Cristobal, M. and González, I. (2016). Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework. BMC Bioinformatics, 17(402). Forthcoming.

[Wold, 1966] Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In Krishnaiah, editor: Multivariate Analysis, pages 1391–1420. Academic Press, New York, USA.

[Wu and Carroll, 1988] Wu, M. and Carroll, R. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44(1):175–188.

[Zeileis and Grothendieck, 2005] Zeileis, A. and Grothendieck, G. (2005). zoo: S3 infrastructure for regular and irregular time series. Journal of Statistical Software, 14(6):1–27.

[Zhang, 2012] Zhang, S. (2012). Nearest neighbor selection for iterative kNN imputation. Journal of Systems and Software, 85(11):2541–2552.

[Zou and Hastie, 2005] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2):301–320.