In this paper, we consider a possible representation of a DNA sequence in a quaternary tree, in which one can visualize repetitions of subwords (seen as suffixes of subsequences). The CGR-tree turns a sequence of letters into a Digital Search Tree (DST), obtained from the suffixes of the reversed sequence. Several results are known concerning the height, the insertion depth for DST built from independent successive random sequences having the same distribution. Here the successive inserted words are strongly dependent. We give the asymptotic behaviour of the insertion depth and the length of branches for the CGR-tree obtained from the suffixes of a reversed i.i.d. or markovian sequence. This behaviour turns out to be at first order the same one as in the case of independent words. As a by-product, asymptotic results on the length of longest runs in a markovian sequence are obtained.
Mots clés : random tree, digital search tree, CGR, lengths of the paths, height, insertion depth, asymptotic growth, strong convergence
@article{PS_2009__13__15_0, author = {C\'enac, Peggy and Chauvin, Brigitte and Ginouillac, St\'ephane and Pouyanne, Nicolas}, title = {Digital search trees and chaos game representation}, journal = {ESAIM: Probability and Statistics}, pages = {15--37}, publisher = {EDP-Sciences}, volume = {13}, year = {2009}, doi = {10.1051/ps:2007043}, mrnumber = {2493853}, language = {en}, url = {http://www.numdam.org/articles/10.1051/ps:2007043/} }
TY - JOUR AU - Cénac, Peggy AU - Chauvin, Brigitte AU - Ginouillac, Stéphane AU - Pouyanne, Nicolas TI - Digital search trees and chaos game representation JO - ESAIM: Probability and Statistics PY - 2009 SP - 15 EP - 37 VL - 13 PB - EDP-Sciences UR - http://www.numdam.org/articles/10.1051/ps:2007043/ DO - 10.1051/ps:2007043 LA - en ID - PS_2009__13__15_0 ER -
%0 Journal Article %A Cénac, Peggy %A Chauvin, Brigitte %A Ginouillac, Stéphane %A Pouyanne, Nicolas %T Digital search trees and chaos game representation %J ESAIM: Probability and Statistics %D 2009 %P 15-37 %V 13 %I EDP-Sciences %U http://www.numdam.org/articles/10.1051/ps:2007043/ %R 10.1051/ps:2007043 %G en %F PS_2009__13__15_0
Cénac, Peggy; Chauvin, Brigitte; Ginouillac, Stéphane; Pouyanne, Nicolas. Digital search trees and chaos game representation. ESAIM: Probability and Statistics, Tome 13 (2009), pp. 15-37. doi : 10.1051/ps:2007043. http://www.numdam.org/articles/10.1051/ps:2007043/
[1] Handbook of mathematical functions with formulas, graphs, and mathematical tables, National Bureau of Standards Applied Mathematics Series 55. For sale by the superintendent of Documents, U.S. Government Printing Office, Washington, D.C. (1964). | MR | Zbl
and ,[2] A diffusion limit for a class of randomly-growing binary search trees. Probab. Theory Related Fields 79 (1998) 509-542. | MR | Zbl
and ,[3] Analysis of genomic sequences by Chaos Game Representation. Bioinformatics 17 (2001) 429-437.
, , , and ,[4] How many random digits are required until given sequences are obtained? J. Appl. Probab. 19 (1982) 518-531. | MR | Zbl
and ,[5] Test on the structure of biological sequences via chaos game representation. Stat. Appl. Genet. Mol. Biol. 4 (2005) 36 (electronic). | MR | Zbl
,[6] Dynamical systems in the analysis of biological sequences. Technical Report 5351, INRIA (2004).
, and ,[7] The variance of the height of digital search trees. Acta Informatica 38 (2002) 261-276. | MR | Zbl
,[8] Random Iterative Models. Springer (1997). | MR | Zbl
,[9] On the length of the longest head run, in Topics in Information Theory 16 (1975) 219-228, I. Csizàr and P. Elias, Eds. North-Holland, Amsterdam Colloq. Math. Soc. Jànos Bolyai. | MR | Zbl
and ,[10] On the length of the longest head-run. In Topics in information theory (Second Colloq., Keszthely, 1975). Colloq. Math. Soc. János Bolyai 16 (1977) 219-228. | MR | Zbl
and ,[11] Compression de données sans perte et combinatoire analytique. Thèse de l'université Paris VI (2006), available at http://www.lri.fr/ fayolle/these.pdf.
,[12] Bounds for reliability of large consecutive-k-out-of-n:f system. IEEE Trans. Reliability 35 (1986) 316-319. | Zbl
,[13] Distribution theory of runs: a markov chain approach. J. Amer. Statist. Soc. 89 (1994) 1050-1058. | MR | Zbl
and ,[14] The occurence of sequence patterns in repeated experiments and hitting times in a markov chain. Stochastic Process. Appl. 11 (1981) 101-108. | MR | Zbl
and ,[15] Nucleotide, dinucleotide and trinucleotide frequencies explain patterns observed in chaos game representations of DNA sequences. Nucleic Acids Res. 21 (1993) 2487-2491.
,[16] An extreme value theory for long head runs. Probab. Theory Related Fields 72 (1986) 279-287. | MR | Zbl
, and ,[17] Chaos Game Representation of gene structure. Nucleic Acid. Res. 18 (1990) 2163-2170.
,[18] Waiting times and number of appearances of events in a sequence of discrete random variables, in Advances in combinatorial methods and applications to probability and statistics, Stat. Ind. Technol., Birkhäuser Boston, Boston, MA (1997) 363-384. | MR | Zbl
,[19] A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Ann. Probab. 8 (1980) 1171-1176. | MR | Zbl
,[20] Evolution of random search trees. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons Inc., New York (1992). | MR | Zbl
,[21] Problem: Penney-ante. J. Recreational Math. 2 (1969) 241.
,[22] On the probabilities of large deviations for sums of independent random variables. Theory Prob. Appl. (1965) 287-298. | MR | Zbl
,[23] Asymptotic growth of a class of random trees. Annals Probab. 13 (1985) 414-427. | MR | Zbl
,[24] A martingale approach to scan statistics. Ann. Inst. Statist. Math. 57 (2005) 21-37. | MR | Zbl
, , and ,[25] A unified approach to word occurence probabilities. Discrete Appl. Math. 104 (2000) 259-280. | MR | Zbl
,[26] Probabilistic and statistical properties of words: An overview. J. Comput. Biology 7 (2000) 1-46.
, and ,[27] Exact distribution of word occurences in a random sequence of letters. J. Appl. Prob. 36 (1999) 179-193. | MR | Zbl
and ,[28] Novel techniques of graphical representation and analysis of DNA sequences - A review. J. Biosci. 23 (1998) 55-71.
, and ,[29] On the length of the longest head-run for a markov chain with two states. Theory Probab. Appl. 26 (1981) 498-509. | MR | Zbl
,[30] Explicit distributional results in pattern formation. Ann. Appl. Probab. 7 (1997) 666-678. | MR | Zbl
and ,[31] Average Case Analysis of Algorithms on Sequences. John Wiley & Sons, New York (2001). | MR | Zbl
,[32] Probability with martingales. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge (1991). | MR | Zbl
,Cité par Sources :