[How to identify a Gaussian mixture in practice? A comparative study of tests]
After a general presentation of the mixture problem, with the aim of determining the number of components, we focus more specifically on univariate Gaussian mixtures. An abundant literature has been devoted to this field, but procedures for implementing the theoretical results, and comparative studies of the various procedures, are sorely lacking. We wish to contribute in this direction in order to facilitate applications. To test a hypothesis of homogeneity against a hypothesis of a two-component mixture, we have retained two main families of tests: likelihood ratio tests (LRT) and EM tests. In particular, we propose for the LRT a plug-in approach for certain parameters that are assumed known in the asymptotic theory, which makes these tests usable in practice. For the four mixture cases considered here, we provide critical values and compare the performance of these tests in terms of power. We illustrate their implementation on real data on the time between ovulation and lambing in ewes, in the context of a project in the Région Centre.
We consider the theory and applications of univariate Gaussian mixtures, and in particular the problem of testing the null hypothesis of homogeneity (one component) against a two-component alternative. Several approaches have been proposed in the literature over the last decades. We focus on two techniques: one based on the likelihood ratio test (LRT), and another, often called the EM-test, based on estimating the mixture parameters with a specific adaptation of the well-known EM algorithm. In particular, we propose a novel methodology that allows the LRT to be applied in practical situations by plugging in estimates of parameters that are assumed known in the asymptotic setup. We aim to provide useful comparisons between the different techniques, together with guidelines that enable practitioners to use theoretical advances when analyzing actual data of realistic sample sizes. Finally, we illustrate these methods on real data giving the number of days between two events related to ovarian response and lambing in ewes.
Keywords: Mixture model, Likelihood ratio test, EM test, Gaussian process
@article{JSFS_2019__160_1_86_0,
  author = {Chauveau, Didier and Garel, Bernard and Mercier, Sabine},
  title = {Testing for univariate two-component {Gaussian} mixture in practice},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {86--113},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {160},
  number = {1},
  year = {2019},
  mrnumber = {3928541},
  zbl = {1417.62033},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2019__160_1_86_0/}
}
Chauveau, Didier; Garel, Bernard; Mercier, Sabine. Testing for univariate two-component Gaussian mixture in practice. Journal de la société française de statistique, Special issue: mixture analysis, Volume 160 (2019) no. 1, pp. 86-113. http://www.numdam.org/item/JSFS_2019__160_1_86_0/