An adaptive multiclass nearest neighbor classifier
ESAIM: Probability and Statistics, Volume 24 (2020), pp. 69-99.

We consider a problem of multiclass classification, where the training sample S_n = {(X_i, Y_i)}_{i=1}^n is generated from the model ℙ(Y = m | X = x) = η_m(x), 1 ≤ m ≤ M, and η_1(x), …, η_M(x) are unknown α-Hölder continuous functions. Given a test point X, our goal is to predict its label. A widely used k-nearest-neighbors classifier constructs estimates of η_1(X), …, η_M(X) and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter k, which may become tricky in some situations. We fix several integers n_1, …, n_K, compute the corresponding n_k-nearest-neighbor estimates for each m and each n_k, and apply an aggregation procedure. We study an algorithm that constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter α and to the margin, and establish rates of convergence under mild assumptions.
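To make the plug-in-and-aggregate idea concrete, below is a minimal sketch in Python. It is not the authors' procedure: the neighborhood grid ks, the exponential-weighting scheme, the surrogate scores losses and the temperature beta are illustrative assumptions, whereas the paper constructs its convex weights through its own aggregation algorithm with oracle guarantees.

# A minimal sketch of the plug-in idea, NOT the paper's exact procedure:
# the grid `ks`, the exponential weighting and the surrogate `losses`
# are illustrative assumptions.
import numpy as np

def knn_probs(X_train, y_train, x, ks, n_classes):
    # For each k in ks, the k-NN estimate of eta_m(x) = P(Y = m | X = x):
    # the fraction of the k nearest neighbors of x carrying label m.
    dist = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to x
    order = np.argsort(dist)                        # indices, nearest first
    probs = np.zeros((len(ks), n_classes))
    for i, k in enumerate(ks):
        labels = y_train[order[:k]]
        for m in range(n_classes):
            probs[i, m] = np.mean(labels == m)
    return probs

def aggregate(probs, losses, beta=1.0):
    # Convex combination of the per-k estimates via exponential weights;
    # `losses` is any data-driven score per k (e.g. a cross-validated error).
    w = np.exp(-beta * (losses - losses.min()))
    w /= w.sum()
    return w @ probs                                # aggregated eta estimates

# Usage: predict the label of a test point with the plug-in rule argmax_m eta_m(x).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
ks = np.array([3, 5, 9, 17, 33])                      # grid n_1, ..., n_K
x_test = np.array([0.3, -0.1])
probs = knn_probs(X_train, y_train, x_test, ks, n_classes=2)
eta_hat = aggregate(probs, losses=np.zeros(len(ks)))  # toy: uniform weights
print("predicted label:", int(np.argmax(eta_hat)))

With uniform surrogate losses the aggregate reduces to simple averaging over the grid; the point of the paper's procedure is to choose the convex weights in a data-driven way so that the aggregated estimate adapts to the unknown smoothness α and to the margin.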

DOI: 10.1051/ps/2019021
Classification: 62H30, 62G08
Keywords: Multiclass classification, k nearest neighbors, adaptive procedures
@article{PS_2020__24_1_69_0,
     author = {Puchkin, Nikita and Spokoiny, Vladimir},
     title = {An adaptive multiclass nearest neighbor classifier},
     journal = {ESAIM: Probability and Statistics},
     pages = {69--99},
     publisher = {EDP-Sciences},
     volume = {24},
     year = {2020},
     doi = {10.1051/ps/2019021},
     mrnumber = {4069295},
     zbl = {1440.62250},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2019021/}
}
TY  - JOUR
AU  - Puchkin, Nikita
AU  - Spokoiny, Vladimir
TI  - An adaptive multiclass nearest neighbor classifier
JO  - ESAIM: Probability and Statistics
PY  - 2020
SP  - 69
EP  - 99
VL  - 24
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/ps/2019021/
DO  - 10.1051/ps/2019021
LA  - en
ID  - PS_2020__24_1_69_0
ER  - 
%0 Journal Article
%A Puchkin, Nikita
%A Spokoiny, Vladimir
%T An adaptive multiclass nearest neighbor classifier
%J ESAIM: Probability and Statistics
%D 2020
%P 69-99
%V 24
%I EDP-Sciences
%U http://www.numdam.org/articles/10.1051/ps/2019021/
%R 10.1051/ps/2019021
%G en
%F PS_2020__24_1_69_0
Puchkin, Nikita; Spokoiny, Vladimir. An adaptive multiclass nearest neighbor classifier. ESAIM: Probability and Statistics, Volume 24 (2020), pp. 69-99. doi: 10.1051/ps/2019021. http://www.numdam.org/articles/10.1051/ps/2019021/

[1] A. Agarwal, Selective sampling algorithms for cost-sensitive multiclass prediction, in Proceedings of the 30th International Conference on Machine Learning, edited by S. Dasgupta and D. McAllester, Vol. 28 of Proceedings of Machine Learning Research. Atlanta, Georgia, USA (2013) 1220–1228.

[2] H. Ahn and K.-J. Kim, Corporate credit rating using multiclass classification models with order information. World Acad. Sci. Eng. Technol. 60 (2011) 95–100.

[3] E.L. Allwein, R.E. Schapire and Y. Singer, Reducing multiclass to binary: a unifying approach for margin classifiers. J. Mach. Learn. Res. 1 (2001) 113–141. | MR | Zbl

[4] J.-Y. Audibert and A.B. Tsybakov, Fast learning rates for plug-in classifiers. Ann. Stat. 35 (2007) 608–633. | MR | Zbl

[5] D. Belomestny and V. Spokoiny, Spatial aggregation of local likelihood estimates with applications to classification. Ann. Stat. 35 (2007) 2287–2311. | DOI | MR | Zbl

[6] C. Butucea, J.-F. Delmas, A. Dutfoy and R. Fischer, Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss. Electron. J. Stat. 11 (2017) 2258–2294. | DOI | MR | Zbl

[7] T.I. Cannings, T.B. Berrett and R.J. Samworth, Local nearest neighbour classification with applications to semi-supervised learning (2017). | arXiv

[8] K. Chaudhuri and S. Dasgupta, Rates of convergence for nearest neighbor classification, in Vol. 2 of Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, Cambridge, MA, USA (2014) 3437–3445.

[9] K. Crammer and Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2 (2002) 265–292. | Zbl

[10] D. Dai, P. Rigollet and T. Zhang, Deviation optimal learning using greedy Q-aggregation. Ann. Stat. 40 (2012) 1878–1905. | MR | Zbl

[11] A. Daniely, S. Sabato and S. Shalev-Shwartz, Multiclass learning approaches: a theoretical comparison with implications, in Advances in Neural Information Processing Systems 25, edited by F. Pereira, C.J.C. Burges, L. Bottou and K.Q. Weinberger, Curran Associates Inc. (2012) 485–493.

[12] D. Dheeru and E. Karra Taniskidou, UCI Machine Learning Repository (2017).

[13] T.G. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Int. Res. 2 (1995) 263–286. | Zbl

[14] C.H.Q. Ding and I. Dubchak, Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358.

[15] V. Dinh, L.S.T. Ho, N.V. Cuong, D. Nguyen and B.T. Nguyen, Learning from non-iid data: fast rates for the one-vs-all multiclass plug-in classifiers, in Theory and Applications of Models of Computation, Vol. 9076 of Lecture Notes in Computer Science. Springer, Cham (2015) 375–387. | MR

[16] M. Döring, L. Györfi and H. Walk, Rate of convergence of k-nearest-neighbor classification rule. J. Mach. Learn. Res. 18 (2017) 16. | MR | Zbl

[17] S. Gadat, T. Klein and C. Marteau, Classification in general finite dimensional spaces with the k-nearest neighbor rule. Ann. Stat. 44 (2016) 982–1009. | MR | Zbl

[18] A. Ganapathiraju, J.E. Hamaker and J. Picone, Application of support vector machines to speech recognition. IEEE Trans. Signal Process. 52 (2004) 2348–2355.

[19] A. Juditsky, P. Rigollet and A.B. Tsybakov, Learning by mirror averaging. Ann. Stat. 36 (2008) 2183–2206. | MR | Zbl

[20] J. Kittler, R. Ghaderi, T. Windeatt and J. Matas, Face verification via error correcting output codes. Image Vis. Comput. 21 (2003) 1163–1169. | DOI

[21] G. Lecué, Optimal rates of aggregation in classification under low noise assumption. Bernoulli 13 (2007) 1000–1022. | DOI | MR | Zbl

[22] G. Lecué, Empirical risk minimization is optimal for the convex aggregation problem. Bernoulli 19 (2013) 2153–2166. | DOI | MR | Zbl

[23] G. Lecué and P. Rigollet, Optimal learning with Q-aggregation. Ann. Stat. 42 (2014) 211–224. | DOI | MR | Zbl

[24] E. Mammen and A.B. Tsybakov, Smooth discrimination analysis. Ann. Stat. 27 (1999) 1808–1829. | DOI | MR | Zbl

[25] V. Perchet and P. Rigollet, The multi-armed bandit problem with covariates. Ann. Stat. 41 (2013) 693–721. | DOI | MR | Zbl

[26] R. Rifkin and A. Klautau, In defense of one-vs-all classification. J. Mach. Learn. Res. 5 (2003/04) 101–141. | MR | Zbl

[27] P. Rigollet, Kullback-Leibler aggregation and misspecified generalized linear models. Ann. Stat. 40 (2012) 639–665. | DOI | MR | Zbl

[28] P. Rigollet and A.B. Tsybakov, Sparse estimation by exponential weighting. Stat. Sci. 27 (2012) 558–575. | DOI | MR | Zbl

[29] B.I. Rubinstein, P.L. Bartlett and J.H. Rubinstein, Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds, in Advances in Neural Information Processing Systems 19, edited by B. Schölkopf, J.C. Platt, and T. Hoffman, MIT Press, Cambridge (2007) 1193–1200.

[30] R.J. Samworth, Optimal weighted nearest neighbour classifiers. Ann. Stat. 40 (2012) 2733–2763. | DOI | MR | Zbl

[31] V. Spokoiny and C. Vial, Parameter tuning in pointwise adaptation using a propagation approach. Ann. Stat. 37 (2009) 2783–2807. | DOI | MR | Zbl

[32] R. Tibshirani, T. Hastie, B. Narasimhan and G. Chu, Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Stat. Sci. 18 (2003) 104–117. | DOI | MR | Zbl

[33] A.B. Tsybakov, Optimal rates of aggregation, in Learning Theory and Kernel Machines. Springer, Berlin (2003) 303–313. | Zbl

[34] A.B. Yuditskiĭ, A.V. Nazin, A.B. Tsybakov and N. Vayatis, Recursive aggregation of estimators by the mirror descent method with averaging. Problemy Peredachi Informatsii 41 (2005) 78–96. | Zbl


Financial support by the Russian Academic Excellence Project 5-100 and by the German Research Foundation (DFG) through the Collaborative Research Center 1294 is gratefully acknowledged.