Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences


Author(s): Ganapathiraju, Madhavi K; Mitchell, Asia D; Thahir, Mohamed; Motwani, Kamiya; Ananthasubramanian, Seshan
Date(s)

01/12/2012

Abstract

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
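To make the windowed perplexity idea in the abstract concrete, here is a minimal sketch in Python. It is not the toolkit's implementation: the toolkit is suffix-array based, whereas this sketch counts n-grams with a plain in-memory `Counter`. The add-one smoothing, the four-letter alphabet, and the names `ngram_counts`, `window_perplexity`, and `scan_windows` are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq (hypothetical helper)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(window, n, counts_n, counts_ctx, alphabet_size=4):
    """Perplexity of one window under an n-gram model.

    Assumption: add-one (Laplace) smoothing over a 4-letter alphabet;
    the smoothing actually used by the toolkit is not stated in the abstract.
    """
    log_prob = 0.0
    num = 0
    for i in range(len(window) - n + 1):
        gram = window[i:i + n]
        ctx = gram[:-1]
        p = (counts_n[gram] + 1) / (counts_ctx[ctx] + alphabet_size)
        log_prob += math.log2(p)
        num += 1
    if num == 0:
        return float("nan")
    return 2 ** (-log_prob / num)


def scan_windows(seq, n=2, window=8, step=4):
    """Return (start, perplexity) for each window slid over seq."""
    counts_n = ngram_counts(seq, n)
    # Context counts are the n-gram counts summed over the last symbol.
    counts_ctx = Counter()
    for gram, c in counts_n.items():
        counts_ctx[gram[:-1]] += c
    return [
        (start, window_perplexity(seq[start:start + window], n,
                                  counts_n, counts_ctx))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ACGT" * 8  # toy sequence, not real genomic data
    for start, ppl in scan_windows(toy):
        print(start, round(ppl, 3))
```

In this sketch the model is trained on the same sequence it scores, so windows whose n-grams are common elsewhere in the sequence get low perplexity and windows with unusual composition get high perplexity. A whole-genome run would need an index such as a suffix array rather than an in-memory dictionary.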

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/45363/1/jl_bio_com_bio_10-6_mad_2012.pdf

Ganapathiraju, Madhavi K and Mitchell, Asia D and Thahir, Mohamed and Motwani, Kamiya and Ananthasubramanian, Seshan (2012) Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences. In: Journal of Bioinformatics and Computational Biology, 10 (6). p. 1250016.

Publisher

World Scientific Publishing Company

Relation

http://dx.doi.org/10.1142/S0219720012500163

http://eprints.iisc.ernet.in/45363/

Keywords

Supercomputer Education & Research Centre

Type

Journal Article

PeerReviewed