For the $k$-nearest neighbors ($k$NN) classification algorithm, an explicit expression of the Leave-$p$-Out (L$p$O) cross-validation estimator of the classification error rate is proposed. This explicit expression is first used in the passive learning setting to study the impact of the choice of the L$p$O parameter $p$ on the choice of $k$ in the $k$NN algorithm. The active learning problem is then addressed. An example selection procedure based on the recommendation of the committee of L$p$O classifiers is considered. The influence of the parameter $p$ on the choice of new examples and on the choice of the parameter $k$ at each step of active learning is studied. In particular, it is shown that the evolution of the value of $k$ chosen by L$p$O in active learning differs from that observed in passive learning.
In the binary classification framework, a closed-form expression of the Leave-$p$-Out (L$p$O) cross-validation risk estimator for the $k$-Nearest Neighbor ($k$NN) algorithm is derived. It is first used to study the L$p$O risk minimization strategy for choosing $k$ in the passive learning setting. The impact of $p$ on the choice of $k$ and on the L$p$O estimation of the risk is assessed. In the active learning setting, a procedure is proposed that selects new examples using an L$p$O committee of $k$NN classifiers. The influence of $p$ on the choice of new examples and on the tuning of $k$ at each step is investigated. The behavior of the $k$ chosen by L$p$O is shown to be different from what is observed in passive learning.
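To make the quantity discussed in the abstract concrete, the following is a minimal brute-force sketch of an L$p$O risk estimate for $k$NN and of the choice of $k$ by L$p$O risk minimization. It is not the closed-form expression derived in the paper: it simply enumerates every test subset of size $p$, which is only feasible for very small samples. The function names (`knn_predict`, `lpo_risk`, `choose_k`), the one-dimensional features, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points.

    Assumes 1-D features and binary labels in {0, 1} (illustrative choice).
    """
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    # Ties are broken toward label 0 (arbitrary convention for this sketch).
    return 1 if votes[1] > votes[0] else 0


def lpo_risk(data, k, p):
    """Brute-force Leave-p-Out error rate, averaged over all test sets of size p.

    Requires k <= len(data) - p so that k neighbors are always available.
    """
    n = len(data)
    total_errors = 0
    for test_idx in combinations(range(n), p):
        held_out = set(test_idx)
        train = [data[i] for i in range(n) if i not in held_out]
        total_errors += sum(
            knn_predict(train, data[i][0], k) != data[i][1] for i in test_idx
        )
    return total_errors / (p * math.comb(n, p))


def choose_k(data, p, candidates):
    """Return the candidate k with the smallest LpO risk (first one on ties)."""
    return min(candidates, key=lambda k: lpo_risk(data, k, p))


# Toy sample: two well-separated classes of three points each.
data = [(0.0, 0), (0.1, 0), (0.2, 0), (1.0, 1), (1.1, 1), (1.2, 1)]
for p in (1, 2):
    print(p, [lpo_risk(data, k, p) for k in (1, 3)], choose_k(data, p, [1, 3]))
```

On this toy sample the estimated risk of $k = 3$ is 0 for $p = 1$ but 0.4 for $p = 2$, which illustrates how the choice of $p$ can change the risk estimate attached to a given $k$. The cost of the enumeration grows as $\binom{n}{p}$, which is the motivation for a closed-form expression.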
Keywords: Classification, Cross-validation, $k$NN, Active learning
@article{JSFS_2011__152_3_83_0,
  author = {Celisse, Alain and Mary-Huard, Tristan},
  title = {Exact {Cross-Validation} for $k${NN} : application to passive and active learning in classification},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {83--97},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {152},
  number = {3},
  year = {2011},
  mrnumber = {2871178},
  zbl = {1316.62084},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2011__152_3_83_0/}
}
Celisse, Alain; Mary-Huard, Tristan. Exact Cross-Validation for $k$NN : application to passive and active learning in classification. Journal de la société française de statistique, Volume 152 (2011) no. 3, pp. 83-97. http://www.numdam.org/item/JSFS_2011__152_3_83_0/