In a multiple testing context, we consider a semiparametric mixture model with two components: one component is known and corresponds to the distribution of p-values under the null hypothesis, while the other component f is nonparametric and represents the distribution under the alternative hypothesis. Motivated by the problem of local false discovery rate estimation, we focus on estimating the unknown nonparametric component f of the mixture, relying on a preliminary estimator of the unknown proportion θ of true null hypotheses. We propose two different estimators of this unknown component and study their asymptotic properties. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence, as well as the corresponding rate, for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.
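To make the setting concrete: p-values are uniform on [0, 1] under the null hypothesis, so the mixture density is g(x) = θ + (1 − θ) f(x) and the local false discovery rate is θ / g(x). The sketch below is only a naive plug-in illustration of these two identities and is not either of the estimators studied in the paper. It assumes that θ is known (the paper relies on a preliminary estimator instead), uses a plain Gaussian kernel with no boundary correction, and the names `kde` and `plug_in` and the simulated Beta alternative are hypothetical choices made for this example.

```python
import math
import random


def kde(x, sample, h):
    """Gaussian kernel density estimate of the mixture density g at point x."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def plug_in(x, sample, theta, h):
    """Naive plug-in estimates of f(x) and of the local FDR at x.

    Assumption: theta (proportion of true nulls) is treated as known here.
    Uses g = theta + (1 - theta) * f and lfdr = theta / g.
    """
    g_hat = kde(x, sample, h)
    # Invert the mixture identity; truncate at 0 so the estimate stays a valid density value.
    f_hat = max((g_hat - theta) / (1.0 - theta), 0.0)
    # Truncate at 1 because the local FDR is a posterior probability.
    lfdr_hat = min(theta / max(g_hat, 1e-12), 1.0)
    return f_hat, lfdr_hat


# Synthetic p-values: uniform under the null, Beta(0.2, 3) under the alternative
# (an arbitrary illustrative choice that concentrates mass near 0).
random.seed(0)
theta, n = 0.8, 2000
pvals = [
    random.random() if random.random() < theta else random.betavariate(0.2, 3.0)
    for _ in range(n)
]

f_lo, lfdr_lo = plug_in(0.01, pvals, theta, h=0.05)  # small p-value
f_hi, lfdr_hi = plug_in(0.5, pvals, theta, h=0.05)   # unremarkable p-value
print(f"x=0.01: f_hat={f_lo:.3f}, lfdr_hat={lfdr_lo:.3f}")
print(f"x=0.50: f_hat={f_hi:.3f}, lfdr_hat={lfdr_hi:.3f}")
```

In this simulated example the estimated local FDR is lower for the small p-value than for the one near 0.5, which is the qualitative behaviour a multiple testing procedure based on the local FDR relies on. The fixed bandwidth `h=0.05` is an arbitrary choice, and the uncorrected kernel underestimates g near the boundary at 0.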
Keywords: false discovery rate, kernel estimation, local false discovery rate, maximum smoothed likelihood, multiple testing, p-values, semiparametric mixture model
@article{PS_2014__18__584_0, author = {Nguyen, Van Hanh and Matias, Catherine}, title = {Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. {Application} to local false discovery rate estimation}, journal = {ESAIM: Probability and Statistics}, pages = {584--612}, publisher = {EDP-Sciences}, volume = {18}, year = {2014}, doi = {10.1051/ps/2013041}, language = {en}, url = {http://www.numdam.org/articles/10.1051/ps/2013041/} }
Nguyen, Van Hanh; Matias, Catherine. Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation. ESAIM: Probability and Statistics, Volume 18 (2014), pp. 584-612. doi: 10.1051/ps/2013041. http://www.numdam.org/articles/10.1051/ps/2013041/