In this paper methods and results related to the notion of minimal forbidden words are applied to the fragment assembly problem. The fragment assembly problem can be formulated, in its simplest form, as follows: reconstruct a word from a given set of substrings (fragments) of a word . We introduce an hypothesis involving the set of fragments and the maximal length of the minimal forbidden factors of . Such hypothesis allows us to reconstruct uniquely the word from the set in linear time. We prove also that, if is a word randomly generated by a memoryless source with identical symbol probabilities, is logarithmic with respect to the size of . This result shows that our reconstruction algorithm is suited to approach the above problem in several practical applications e.g. in the case of DNA sequences.
Mots clés : factor automaton, minimal forbidden factor, fragment assembly
@article{ITA_2001__35_6_565_0, author = {Mignosi, F. and Restivo, A. and Sciortino, M.}, title = {Forbidden factors and fragment assembly}, journal = {RAIRO - Theoretical Informatics and Applications - Informatique Th\'eorique et Applications}, pages = {565--577}, publisher = {EDP-Sciences}, volume = {35}, number = {6}, year = {2001}, mrnumber = {1922296}, zbl = {1005.68122}, language = {en}, url = {http://www.numdam.org/item/ITA_2001__35_6_565_0/} }
TY - JOUR AU - Mignosi, F. AU - Restivo, A. AU - Sciortino, M. TI - Forbidden factors and fragment assembly JO - RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications PY - 2001 SP - 565 EP - 577 VL - 35 IS - 6 PB - EDP-Sciences UR - http://www.numdam.org/item/ITA_2001__35_6_565_0/ LA - en ID - ITA_2001__35_6_565_0 ER -
%0 Journal Article %A Mignosi, F. %A Restivo, A. %A Sciortino, M. %T Forbidden factors and fragment assembly %J RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications %D 2001 %P 565-577 %V 35 %N 6 %I EDP-Sciences %U http://www.numdam.org/item/ITA_2001__35_6_565_0/ %G en %F ITA_2001__35_6_565_0
Mignosi, F.; Restivo, A.; Sciortino, M. Forbidden factors and fragment assembly. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications, Tome 35 (2001) no. 6, pp. 565-577. http://www.numdam.org/item/ITA_2001__35_6_565_0/
[1] Data Structures and Algorithms. Addison Wesley, Reading, Mass (1983). | MR | Zbl
, and ,[2] Forbidden Words in Symbolic Dynamics. Adv. in Appl. Math. 25 (2000) 163-193. | MR | Zbl
, , and ,[3] The smallest automaton recognizing the subwords of a text. Theoret. Comput. Sci. 40 (1985) 31-55. | MR | Zbl
, , , , and ,[4] Words, univalent factors, and boxes. Acta Inform. (to appear). | Zbl
, and ,[5] Automata for matching patterns, in Handbook of Formal Languages, Vol. 2, Chap. 9, edited by G. Rosenberg and A. Salomaan. Springer (1997) 399-462. | MR
and ,[6] Automata and forbidden words. Inform. Process. Lett. 67 (1998) 111-117. | MR
, and ,[7] Data compression using antidictionaries, in Proc. of the IEEE, Special Issue on Lossless Data Compression, Vol. 88, edited by J.A. Storer (2000) 1756-1768.
, , and ,[8] Optimal Sequencing by Hybridization in Rounds, in Proc. of RECOMB 2001, edited by T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner and M. Waterman. ACM Press (2001) 141-148
and ,[9] Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge University Press (1997). | MR | Zbl
,[10] A new algorithm for DNA sequence assembly. J. Comput. Biol. 2 (1995) 291-306.
and ,[11] Periodicity, in M. Lothaire, Algebraic Combinatorics on Words, Chap. 8. Cambridge University Press (to appear) 237-274. Also available at url: http://www-igm.univ-mlv.fr/~berstel/Lothaire/index.html | MR
and ,[12] Forbidden Factors and Fragment Assembly. Lecture Notes in Comput. Sci. (2001). Proceedings of DLT'01. | Zbl
, and ,[13] Words and Forbidden Factors. Theoret. Comput. Sci. 273 (2002) 99-117. | MR | Zbl
, and ,[14] On Sequence Assembly, Technical Report cs-00-210. Brandeis University (2000).
, , and ,[15] Approximate nearest neighbors and sequence comparison with block operations. ACM Press (2000). Proceedings of STOC 2000. | MR | Zbl
and ,[16] Whole-Genome DNA Sequencing, IEEE Comput. Engrg. Sci. 3 (1999) 33-43.
,[17] SEQAID: A DNA Sequence Assembly Program Based on a Mathematical Model. Nucl. Acids Res. 12 (1984) 307-321.
, and ,[18] Algorithms for some string matching problems arising in molecular genetics, in Proc. of the 9th IFIP World Computer Congress (1983) 59-64.
, , and ,[19] A New Approach Fragment Assembly in DNA Sequencing, in Proc. of RECOMB 2001, edited by T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner and M. Waterman. ACM Press (2001) 141-148.
, and ,[20] Large Scale Sequencing by Hybridization, in Proc. of RECOMB 2001, edited by T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner and M. Waterman. ACM Press (2001) 269-278.
and ,[21] Introduction to computational biology: Maps, sequences and genomes. Chapman & Hall (1995). | Zbl
,