New estimates of the conditional Shannon entropy are introduced in the framework of a model describing a discrete response variable depending on a vector of d factors having a density w.r.t. the Lebesgue measure in ℝ^d. Namely, the mixed-pair model (X, Y) is considered, where X and Y take values in ℝ^d and an arbitrary finite set, respectively. Such models include, for instance, the famous logistic regression. In contrast to the well-known Kozachenko–Leonenko estimates of unconditional entropy, the proposed estimates are constructed by means of certain spatial order statistics (or k-nearest neighbor statistics, where k = k_n depends on the number of observations n) and a random number of i.i.d. observations contained in balls of specified random radii. The asymptotic unbiasedness and L2-consistency of the new estimates are established under simple conditions. The obtained results can be applied to the feature selection problem, which is important, e.g., for medical and biological investigations.
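For readers unfamiliar with the mixed-pair setting, the quantity being estimated can be written, assuming the standard definition of conditional entropy for a discrete Y with finite range 𝒴 and an X having density f_X on ℝ^d, as

H(Y \mid X) = -\int_{\mathbb{R}^d} \sum_{y \in \mathcal{Y}} P(Y = y \mid X = x)\,\log P(Y = y \mid X = x)\, f_X(x)\, \mathrm{d}x.

As a point of reference only (not the construction proposed in the paper), a minimal sketch of the classical Kozachenko–Leonenko k-nearest neighbor estimate of unconditional differential entropy mentioned in the abstract could look as follows in Python; the function name kl_entropy and the use of scikit-learn are illustrative assumptions.

import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors

def kl_entropy(x, k=1):
    # Kozachenko-Leonenko k-NN estimate (in nats) of the differential
    # entropy of an i.i.d. sample x of shape (n, d); illustrative sketch,
    # not the conditional-entropy estimator proposed in the paper.
    n, d = x.shape
    # distance from each point to its k-th nearest neighbour, excluding itself
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(x).kneighbors(x)
    rho = dist[:, k]
    # log-volume of the unit ball in R^d: pi^{d/2} / Gamma(d/2 + 1)
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    # assumes all rho > 0, i.e. no duplicated sample points
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(rho))

A naive plug-in route to the conditional entropy would combine such estimates via the chain rule H(Y | X) = H(Y) + H(X | Y) - H(X), estimating H(X | Y) by class-conditional entropies weighted by empirical class frequencies; the estimates proposed in the paper are constructed differently, via spatial order statistics and random numbers of observations in balls of random radii, as described in the abstract.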
DOI: 10.1051/ps/2018026
Keywords: Shannon entropy, conditional entropy estimates, asymptotic unbiasedness, L2-consistency, logistic regression, Gaussian model
Bulinski, Alexander; Kozhevin, Alexey. Statistical estimation of conditional Shannon entropy. ESAIM: Probability and Statistics, Vol. 23 (2019), pp. 350-386. doi: 10.1051/ps/2018026. http://www.numdam.org/articles/10.1051/ps/2018026/