Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given independent observations drawn from an unknown multivariate probability density , we propose a new approach to estimate the number of connected components, or clusters, of the -level set . The basic idea is to form a rough skeleton of the set using any preliminary estimator of , and to count the number of connected components of the resulting graph. Under mild analytic conditions on , and using tools from differential geometry, we establish the consistency of our method.
Mots clés : cluster analysis, connected component, level set, graph, tubular neighborhood
@article{PS_2007__11__272_0, author = {Biau, G\'erard and Cadre, Beno{\^\i}t and Pelletier, Bruno}, title = {A graph-based estimator of the number of clusters}, journal = {ESAIM: Probability and Statistics}, pages = {272--280}, publisher = {EDP-Sciences}, volume = {11}, year = {2007}, doi = {10.1051/ps:2007019}, mrnumber = {2320821}, zbl = {1187.62114}, language = {en}, url = {http://www.numdam.org/articles/10.1051/ps:2007019/} }
TY - JOUR AU - Biau, Gérard AU - Cadre, Benoît AU - Pelletier, Bruno TI - A graph-based estimator of the number of clusters JO - ESAIM: Probability and Statistics PY - 2007 SP - 272 EP - 280 VL - 11 PB - EDP-Sciences UR - http://www.numdam.org/articles/10.1051/ps:2007019/ DO - 10.1051/ps:2007019 LA - en ID - PS_2007__11__272_0 ER -
%0 Journal Article %A Biau, Gérard %A Cadre, Benoît %A Pelletier, Bruno %T A graph-based estimator of the number of clusters %J ESAIM: Probability and Statistics %D 2007 %P 272-280 %V 11 %I EDP-Sciences %U http://www.numdam.org/articles/10.1051/ps:2007019/ %R 10.1051/ps:2007019 %G en %F PS_2007__11__272_0
Biau, Gérard; Cadre, Benoît; Pelletier, Bruno. A graph-based estimator of the number of clusters. ESAIM: Probability and Statistics, Tome 11 (2007), pp. 272-280. doi : 10.1051/ps:2007019. http://www.numdam.org/articles/10.1051/ps:2007019/
[1] Topology and Geometry, Springer-Verlag, New York, Graduate Texts in Mathematics 139 (1993). | MR | Zbl
,[2] Connectivity of the mutual -nearest neighbor graph in clustering and outlier detection. Statist. Probab. Lett. 35 (1997) 33-42. | Zbl
, , and ,[3] Kernel estimation of density level sets. J. Multivariate Anal. 97 (2006) 999-1023. | Zbl
,[4] Riemannian Geometry: A Modern Introduction. Cambridge University Press, Cambridge (1993). | MR | Zbl
,[5] Introduction to Algorithms. The MIT Press, Cambridge (1990). | MR | Zbl
, and ,[6] Estimating the number of clusters. Canad. J. Statist. 28 (2000) 367-382. | Zbl
, and ,[7] Cluster analysis: a further approach based on density estimation. Comput. Statist. Data Anal. 36 (2001) 441-459. | Zbl
, and ,[8] Detection of abnormal behavior via nonparametric estimation of the support. SIAM J. Appl. Math. 38 (1980) 480-488. | Zbl
and ,[9] Pattern Classification, 2nd edition. Wiley-Interscience, New York (2000). | MR | Zbl
, and ,[10] A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York (2002). | MR | Zbl
, , and ,[11] Clustering Algorithms. John Wiley, New York (1975). | MR | Zbl
,[12] The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2001). | MR | Zbl
, and ,[13] Foundations of Differential Geometry, Vol. I & II, 2nd edition. Wiley, New York (1996). | MR | Zbl
and ,[14] Towards a statistical theory of clustering. PASCAL Workshop on Statistics and Optimization of Clustering (2005).
and ,[15] A strong law for the longest edge of the minimal spanning tree. Ann. Probab. 27 (1999) 246-260. | Zbl
,[16] Measuring mass concentrations and estimating density contour clusters-an excess mass approach. Ann. Statist. 23 (1995) 855-881. | Zbl
,[17] Nonparametric Functional Estimation. Academic Press, Orlando (1983). | MR | Zbl
,[18] On nonparametric estimation of density level sets. Ann. Statist. 25 (1997) 948-969. | Zbl
,Cité par Sources :