High-throughput SELEX SAGE method for quantitative modeling of transcription-factor binding sites.


Author(s): Roulet E.; Busso S.; Camargo A.A.; Simpson A.J.; Mermod N.; Bucher P.
Date(s)

2002

Abstract

The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.

Identifier

http://serval.unil.ch/?id=serval:BIB_AD8B977B18DC

issn:1087-0156[print], 1087-0156[linking]

pmid:12101405

doi:10.1038/nbt718

isiid:000177182500036

Language(s)

en

Source

Nature Biotechnology, vol. 20, no. 8, pp. 831-835

Keywords

Base Sequence; Binding Sites; CCAAT-Enhancer-Binding Proteins/metabolism; Computational Biology/methods; Computer Simulation; Consensus Sequence/genetics; DNA/genetics; DNA/metabolism; DNA-Binding Proteins/metabolism; Gene Expression Regulation; Genome; Genomics/methods; Ligands; Models, Biological; NFI Transcription Factors; Protein Binding; Response Elements/genetics; Substrate Specificity; Transcription Factors/metabolism

Type

info:eu-repo/semantics/article

article