
Generalized Methods for Discovering Frequent Poly-Regions in DNA

**Author(s):** Papapetrou, Panagiotis; Benson, Gary; Kollios, George
Date(s)	20/10/2011 20/10/2011 21/10/2008
Abstract	The problem of discovering frequent poly-regions (i.e., regions with a high occurrence of a set of items or patterns from a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments, each of which may lead to a poly-region. The key idea of the second approach is to use a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics, chiefly the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After the poly-regions are identified, the sequence is converted into a sequence of labeled intervals, each corresponding to a poly-region. An efficient algorithm for mining frequent arrangements of intervals is then applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences and produce results of significant biological meaning.
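The sliding-window idea in the abstract can be illustrated with a short sketch. This is not the authors' algorithm: it assumes, for illustration only, that a poly-region is a maximal stretch of the sequence covered by length-`w` windows in which symbols from a chosen set make up at least a fraction `theta` of the window. The function `poly_regions` and its parameters are hypothetical.

```python
from typing import List, Set, Tuple


def poly_regions(seq: str, items: Set[str], w: int, theta: float) -> List[Tuple[int, int]]:
    """Return merged half-open [start, end) intervals covered by length-w windows
    in which symbols from `items` make up at least `theta` of the window.

    Hypothetical definition of a poly-region, used only for illustration."""
    if w <= 0 or w > len(seq):
        return []
    # Running count of target symbols in the current window, initialised on the first window.
    count = sum(1 for c in seq[:w] if c in items)
    regions: List[Tuple[int, int]] = []
    for start in range(len(seq) - w + 1):
        if start > 0:
            # Slide the window one position: drop seq[start - 1], add seq[start + w - 1].
            if seq[start - 1] in items:
                count -= 1
            if seq[start + w - 1] in items:
                count += 1
        if count >= theta * w:
            end = start + w
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent qualifying windows are merged into one region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Example: a poly-A stretch in a short DNA string.
print(poly_regions("ACGTAAAAAAGCGT", {"A"}, 4, 1.0))
```

Keeping a single running count means each window update costs O(1), so the scan is linear in the sequence length; the statistics described in the abstract (counts for every item or pattern) would generalise this to one counter per item.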
Identifier	Papapetrou, Panagiotis; Benson, Gary; Kollios, George. "Generalized Methods for Discovering Frequent Poly-Regions in DNA", Technical Report BUCS-TR-2008-027, Computer Science Department, Boston University, October 17, 2008. [Available from: http://hdl.handle.net/2144/1719]; http://hdl.handle.net/2144/1719
Language(s)	en_US
Publisher	Boston University Computer Science Department
Relation	BUCS Technical Reports;BUCS-TR-2008-027
Keywords	#Poly-regions #Bioinformatics databases #Data mining methods #Mining frequent poly-regions
Type	Technical Report
