A graph-based estimator of the number of clusters
ESAIM: Probability and Statistics, Tome 11 (2007), pp. 272-280.

Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given n independent observations X1,,Xn drawn from an unknown multivariate probability density f, we propose a new approach to estimate the number of connected components, or clusters, of the t-level set (t)={x:f(x)t}. The basic idea is to form a rough skeleton of the set (t) using any preliminary estimator of f, and to count the number of connected components of the resulting graph. Under mild analytic conditions on f, and using tools from differential geometry, we establish the consistency of our method.

DOI : 10.1051/ps:2007019
Classification : 62G05, 62G20
Mots-clés : cluster analysis, connected component, level set, graph, tubular neighborhood
@article{PS_2007__11__272_0,
     author = {Biau, G\'erard and Cadre, Beno{\^\i}t and Pelletier, Bruno},
     title = {A graph-based estimator of the number of clusters},
     journal = {ESAIM: Probability and Statistics},
     pages = {272--280},
     publisher = {EDP-Sciences},
     volume = {11},
     year = {2007},
     doi = {10.1051/ps:2007019},
     mrnumber = {2320821},
     zbl = {1187.62114},
     language = {en},
     url = {https://numdam.org/articles/10.1051/ps:2007019/}
}
TY  - JOUR
AU  - Biau, Gérard
AU  - Cadre, Benoît
AU  - Pelletier, Bruno
TI  - A graph-based estimator of the number of clusters
JO  - ESAIM: Probability and Statistics
PY  - 2007
SP  - 272
EP  - 280
VL  - 11
PB  - EDP-Sciences
UR  - https://numdam.org/articles/10.1051/ps:2007019/
DO  - 10.1051/ps:2007019
LA  - en
ID  - PS_2007__11__272_0
ER  - 
%0 Journal Article
%A Biau, Gérard
%A Cadre, Benoît
%A Pelletier, Bruno
%T A graph-based estimator of the number of clusters
%J ESAIM: Probability and Statistics
%D 2007
%P 272-280
%V 11
%I EDP-Sciences
%U https://numdam.org/articles/10.1051/ps:2007019/
%R 10.1051/ps:2007019
%G en
%F PS_2007__11__272_0
Biau, Gérard; Cadre, Benoît; Pelletier, Bruno. A graph-based estimator of the number of clusters. ESAIM: Probability and Statistics, Tome 11 (2007), pp. 272-280. doi : 10.1051/ps:2007019. https://numdam.org/articles/10.1051/ps:2007019/

[1] G.E. Bredon, Topology and Geometry, Springer-Verlag, New York, Graduate Texts in Mathematics 139 (1993). | MR | Zbl

[2] M.R. Brito, E.L. Chavez, A.J. Quiroz and J.E. Yukich, Connectivity of the mutual k-nearest neighbor graph in clustering and outlier detection. Statist. Probab. Lett. 35 (1997) 33-42. | Zbl

[3] B. Cadre, Kernel estimation of density level sets. J. Multivariate Anal. 97 (2006) 999-1023. | Zbl

[4] I. Chavel, Riemannian Geometry: A Modern Introduction. Cambridge University Press, Cambridge (1993). | MR | Zbl

[5] T.H. Cormen, C.E. Leiserson and R.L. Rivest, Introduction to Algorithms. The MIT Press, Cambridge (1990). | MR | Zbl

[6] A. Cuevas, M. Febrero and R. Fraiman, Estimating the number of clusters. Canad. J. Statist. 28 (2000) 367-382. | Zbl

[7] A. Cuevas, M. Febrero and R. Fraiman, Cluster analysis: a further approach based on density estimation. Comput. Statist. Data Anal. 36 (2001) 441-459. | Zbl

[8] L. Devroye and G. Wise, Detection of abnormal behavior via nonparametric estimation of the support. SIAM J. Appl. Math. 38 (1980) 480-488. | Zbl

[9] R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, 2nd edition. Wiley-Interscience, New York (2000). | MR | Zbl

[10] L. Györfi, M. Kohler, A. Krzyżak and H. Walk, A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York (2002). | MR | Zbl

[11] J.A. Hartigan, Clustering Algorithms. John Wiley, New York (1975). | MR | Zbl

[12] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2001). | MR | Zbl

[13] S. Kobayashi and K. Nomizu, Foundations of Differential Geometry, Vol. I & II, 2nd edition. Wiley, New York (1996). | MR | Zbl

[14] U. Von Luxburg and S. Ben-David, Towards a statistical theory of clustering. PASCAL Workshop on Statistics and Optimization of Clustering (2005).

[15] M.D. Penrose, A strong law for the longest edge of the minimal spanning tree. Ann. Probab. 27 (1999) 246-260. | Zbl

[16] A. Polonik, Measuring mass concentrations and estimating density contour clusters-an excess mass approach. Ann. Statist. 23 (1995) 855-881. | Zbl

[17] B.L.S. Prakasa Rao, Nonparametric Functional Estimation. Academic Press, Orlando (1983). | MR | Zbl

[18] A.B. Tsybakov, On nonparametric estimation of density level sets. Ann. Statist. 25 (1997) 948-969. | Zbl

  • Saavedra-Nieves, Paula; Fernández-Pérez, Martín Directional density-based clustering, Advances in Data Analysis and Classification (2024) | DOI:10.1007/s11634-024-00616-3
  • Saavedra-Nieves, Paula; Crujeiras, Rosa M. Nonparametric estimation of directional highest density regions, Advances in Data Analysis and Classification, Volume 16 (2022) no. 3, p. 761 | DOI:10.1007/s11634-021-00457-4
  • Klutchnikoff, Nicolas; Poterie, Audrey; Rouvière, Laurent Statistical analysis of a hierarchical clustering algorithm with outliers, Journal of Multivariate Analysis, Volume 192 (2022), p. 105075 | DOI:10.1016/j.jmva.2022.105075
  • Qiao, Wanli Asymptotics and optimal bandwidth for nonparametric estimation of density level sets, Electronic Journal of Statistics, Volume 14 (2020) no. 1 | DOI:10.1214/19-ejs1668
  • Qiao, Wanli; Polonik, Wolfgang Nonparametric confidence regions for level sets: Statistical properties and geometry, Electronic Journal of Statistics, Volume 13 (2019) no. 1 | DOI:10.1214/19-ejs1543
  • Kumar, Dheeraj; Ghafoori, Zahra; Bezdek, James C.; Leckie, Christopher; Ramamohanarao, Kotagiri; Palaniswami, Marimuthu Dealing with Inliers in Feature Vector Data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Volume 26 (2018) no. Suppl. 2, p. 25 | DOI:10.1142/s021848851840010x
  • Binder, Patricia; Muma, Michael; Zoubir, Abdelhak M. Gravitational Clustering: A simple, robust and adaptive approach for distributed networks, Signal Processing, Volume 149 (2018), p. 36 | DOI:10.1016/j.sigpro.2018.02.034
  • Klemelä, Jussi Level set tree methods, WIREs Computational Statistics, Volume 10 (2018) no. 5 | DOI:10.1002/wics.1436
  • Arias-Castro, Ery; Rodríguez-Casal, Alberto On estimating the perimeter using the alpha-shape, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, Volume 53 (2017) no. 3 | DOI:10.1214/16-aihp747
  • Holmström, Lasse; Karttunen, Kyösti; Klemelä, Jussi Estimation of level set trees using adaptive partitions, Computational Statistics, Volume 32 (2017) no. 3, p. 1139 | DOI:10.1007/s00180-016-0702-2
  • Auray, Stéphane; Klutchnikoff, Nicolas; Rouvière, Laurent On clustering procedures and nonparametric mixture estimation, Electronic Journal of Statistics, Volume 9 (2015) no. 1 | DOI:10.1214/15-ejs995
  • Laloë, T.; Servien, R. Nonparametric estimation of regression level sets using kernel plug-in estimator, Journal of the Korean Statistical Society, Volume 42 (2013) no. 3, p. 301 | DOI:10.1016/j.jkss.2012.10.001
  • Arias-Castro, Ery; Pelletier, Bruno; Pudlo, Pierre The Normalized Graph Cut and Cheeger Constant: From Discrete to Continuous, Advances in Applied Probability, Volume 44 (2012) no. 4, p. 907 | DOI:10.1239/aap/1354716583
  • Arias-Castro, Ery Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions, IEEE Transactions on Information Theory, Volume 57 (2011) no. 3, p. 1692 | DOI:10.1109/tit.2011.2104630
  • Koepke, Hoyt A.; Clarke, Bertrand S. On the limits of clustering in high dimensions via cost functions, Statistical Analysis and Data Mining: The ASA Data Science Journal, Volume 4 (2011) no. 1, p. 30 | DOI:10.1002/sam.10095
  • Carlsson, Gunnar; Mémoli, Facundo Multiparameter Hierarchical Clustering Methods, Classification as a Tool for Research (2010), p. 63 | DOI:10.1007/978-3-642-10745-0_6
  • Rinaldo, Alessandro; Wasserman, Larry Generalized density clustering, The Annals of Statistics, Volume 38 (2010) no. 5 | DOI:10.1214/10-aos797
  • Maier, Markus; Hein, Matthias; von Luxburg, Ulrike Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters, Theoretical Computer Science, Volume 410 (2009) no. 19, p. 1749 | DOI:10.1016/j.tcs.2009.01.009

Cité par 18 documents. Sources : Crossref