A combined statistical model for multiple motifs search


Author(s): Gao LF (高丽锋); Liu X (刘鑫); Guan S (官山)
Date(s)

2008

Abstract

Transcription factor binding sites (TFBS) play key roles in gene expression and regulation. They are short sequence segments with a definite structure that the corresponding transcription factors recognize correctly. From a statistical viewpoint, candidate TFBS should differ markedly from segments formed by randomly combining nucleotides. This paper proposes a combined statistical model for finding over-represented short sequence segments in different kinds of data sets. When an over-represented short sequence segment is described by a position weight matrix, the nucleotide distribution at most sites of the segment should be far from the background nucleotide distribution. The central idea of this approach is to search for such signals. The algorithm is tested on three data sets. The first contains binding sites of the cyclic AMP receptor protein in E. coli. The second is PlantProm DB, a non-redundant collection of proximal promoter sequences from different species. The third is the collection of intergenic sequences from the whole E. coli genome. Although the three data sets differ considerably in complexity, the results show that the model is rather general and sensible.
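The abstract's central criterion is that a motif's position weight matrix should have most columns far from the background nucleotide distribution. This record does not say how "far" is measured or how the combined model scores candidates. The Python sketch below is therefore an illustration under stated assumptions, not the authors' method:

- Distance from background is taken to be the per-column relative entropy (Kullback-Leibler divergence) in bits.
- The 0.5-bit cutoff and the pseudocounts are arbitrary.
- All function names are hypothetical.
- The aligned sites in the usage example are toy strings, not data from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_weight_matrix(sites, pseudocount=0.5):
    """Column-wise nucleotide frequencies of equal-length aligned sites.

    The pseudocount (an assumed value) keeps every probability above zero.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def background_distribution(sequences, pseudocount=1):
    """Nucleotide frequencies over all sequences, ignoring non-ACGT symbols."""
    counts = Counter(ch for seq in sequences for ch in seq.upper() if ch in BASES)
    total = sum(counts.values()) + 4 * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def relative_entropy(column, background):
    """KL divergence D(column || background) in bits for one PWM column."""
    return sum(p * math.log2(p / background[b]) for b, p in column.items() if p > 0)


def informative_fraction(pwm, background, threshold=0.5):
    """Fraction of motif positions whose distribution diverges from the
    background by more than `threshold` bits (hypothetical cutoff).

    Returns (fraction, per-position scores).
    """
    scores = [relative_entropy(col, background) for col in pwm]
    return sum(s > threshold for s in scores) / len(scores), scores


if __name__ == "__main__":
    # Toy aligned sites (illustrative only) and a uniform background.
    toy_sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]
    pwm = position_weight_matrix(toy_sites)
    uniform = {b: 0.25 for b in BASES}
    fraction, scores = informative_fraction(pwm, uniform)
    print([round(s, 3) for s in scores], fraction)
```

In a real search, the background would more likely be estimated from the data set itself, for example with `background_distribution` over the intergenic sequences. A candidate motif would then be favoured when most of its columns clear the cutoff, which matches the abstract's "most sites" criterion.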

Identifier

http://dspace.imech.ac.cn/handle/311007/33079

http://www.irgrid.ac.cn/handle/1471x/8859

Language(s)

English

Source

Gao LF, Liu X, Guan S. A combined statistical model for multiple motifs search[J]. Chinese Physics B, 2008, 17(12): 4396.

Keywords #Interdisciplinary and Frontier Mechanics::Physical Mechanics, Interdisciplinary and Frontier Mechanics::Biomechanics #transcription factor binding sites #motif #position weight matrix

Type

Journal article