Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Date(s) |
01/12/2012
|
---|---|
Abstract |
Genome sequences contain many patterns of biomedical significance, and repetitive sequences of various kinds are a primary component of most of them. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence. |
Format |
application/pdf |
Identifier |
http://eprints.iisc.ernet.in/45363/1/jl_bio_com_bio_10-6_mad_2012.pdf Ganapathiraju, Madhavi K and Mitchell, Asia D and Thahir, Mohamed and Motwani, Kamiya and Ananthasubramanian, Seshan (2012) Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences. In: Journal of Bioinformatics and Computational Biology, 10 (6). p. 1250016. |
Publisher |
World Scientific Publishing Company |
Relation |
http://dx.doi.org/10.1142/S0219720012500163 http://eprints.iisc.ernet.in/45363/ |
Keywords | #Supercomputer Education & Research Centre |
Type |
Journal Article PeerReviewed |
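
The abstract describes computing n-gram frequencies and n-gram language-model perplexity in windows over a genome sequence. The following is a minimal sketch of that idea, not the toolkit's implementation: it uses plain dictionary counting instead of a suffix array, assumes add-one smoothing over a four-letter DNA alphabet, and the function names (`ngram_counts`, `window_perplexity`, `scan`) and default parameters are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(window, n, grams, contexts, alphabet="ACGT"):
    """Perplexity of one window under an add-one-smoothed n-gram model.

    grams holds n-gram counts and contexts holds (n-1)-gram counts,
    both taken from the reference sequence. Smoothing scheme and
    alphabet are assumptions of this sketch.
    """
    steps = len(window) - n + 1
    log_prob = 0.0
    for i in range(steps):
        gram = window[i:i + n]
        # P(last symbol | preceding n-1 symbols), add-one smoothed
        p = (grams[gram] + 1) / (contexts[gram[:-1]] + len(alphabet))
        log_prob += math.log2(p)
    return 2 ** (-log_prob / steps)


def scan(seq, n=3, window=12, step=6):
    """Return (start, perplexity) for each sliding window over seq."""
    grams = ngram_counts(seq, n)
    contexts = ngram_counts(seq, n - 1)
    return [
        (start, window_perplexity(seq[start:start + window], n, grams, contexts))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: a tandem repeat followed by a less regular stretch.
    toy = "ACGT" * 10 + "GATTACAGGCTTAACGGATC"
    for start, ppl in scan(toy):
        print(start, round(ppl, 3))
```

Windows dominated by repeated n-grams tend to receive lower perplexity than windows with less regular composition, which is the kind of signal a windowed scan can use to flag repetitive regions. At whole-genome scale, a suffix-array index, as named in the abstract, would stand in for the in-memory counters used here.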