952 results for ChIP-Seq


Relevance:

100.00% 100.00%

Publisher:

Abstract:

ChIP-seq is a technology that combines chromatin immunoprecipitation with high-throughput sequencing, enabling large-scale in vivo analysis of transcription factors. Processing the large volumes of data thus generated requires powerful computational resources, and many tools have emerged recently. However, this proliferation of software, each performing a single step of the analysis, creates compatibility problems and complicates analyses. There is therefore a strong need for a powerful and flexible software suite for motif identification. We propose here a complete ChIP-seq data analysis pipeline, freely available in R and composed of three modules: PICS, rGADEM, and MotIV. Through the analysis of four datasets for the transcription factors CTCF, STAT1, FOXA1, and ER, we demonstrated the effectiveness of our pipeline and highlighted its novel features, notably the processing of results by MotIV, which led to the discovery of motifs not detected by other algorithms.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

To capture genomic profiles of histone modifications, chromatin immunoprecipitation (ChIP) is combined with next-generation sequencing, an approach called ChIP-seq. However, enriched regions generated from ChIP-seq data are evaluated only against the limited knowledge acquired by manually examining the relevant biological literature. This paper proposes a novel framework that integrates multiple knowledge sources, such as biological literature, Gene Ontology, and microarray data. To analyze ChIP-seq data for histone modifications precisely, knowledge integration is based on a unified probabilistic model. The model is employed to re-rank the enriched regions generated by peak-finding algorithms. By filtering the re-ranked enriched regions with a predefined threshold, more reliable and precise results can be generated. The combination of multiple knowledge sources with the peak-finding algorithm produces a new paradigm for ChIP-seq data analysis. © (2012) Trans Tech Publications, Switzerland.
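As a rough illustration of the re-ranking and filtering step described in this abstract, the following Python sketch combines a peak caller's confidence with support values from other knowledge sources and keeps only the regions above a threshold. The abstract does not specify the form of the unified probabilistic model, so the independence assumption, the `Region` fields, and the threshold value are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Region:
    name: str
    peak_prob: float   # hypothetical: peak-caller confidence in (0, 1)
    evidence: tuple    # hypothetical: per-source support values in (0, 1)


def combined_score(region):
    """Posterior-like score, assuming the sources are independent (an assumption)."""
    probs = (region.peak_prob, *region.evidence)
    support = prod(probs)
    against = prod(1.0 - p for p in probs)
    return support / (support + against)


def rerank(regions, threshold=0.5):
    """Sort regions by combined score (best first) and drop those below threshold."""
    ranked = sorted(regions, key=combined_score, reverse=True)
    return [r for r in ranked if combined_score(r) >= threshold]


regions = [
    Region("chr1:100-600", 0.9, (0.2, 0.3)),
    Region("chr2:50-450", 0.6, (0.8, 0.9)),
    Region("chr3:10-300", 0.4, (0.5, 0.5)),
]
kept = rerank(regions)
```

In this toy input, a region with a strong peak score but weak external support can fall below a region with a moderate peak score and strong support, which is the behavior the re-ranking is meant to capture.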

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Chromatin immunoprecipitation (ChIP) allows enrichment of genomic regions which are associated with specific transcription factors, histone modifications, and indeed any other epitopes which are present on chromatin. The original ChIP methods used site-specific PCR and Southern blotting to confirm which regions of the genome were enriched, on a candidate basis. The combination of ChIP with genomic tiling arrays (ChIP-chip) allowed a more unbiased approach to map ChIP-enriched sites. However, limitations of microarray probe design and probe number have a detrimental impact on the coverage, resolution, sensitivity, and cost of whole-genome tiling microarray sets for higher eukaryotes with large genomes. The combination of ChIP with high-throughput sequencing technology has allowed more comprehensive surveys of genome occupancy, greater resolution, and lower cost for whole genome coverage. Herein, we provide a comparison of high-throughput sequencing platforms and a survey of ChIP-seq analysis tools, discuss experimental design, and describe a detailed ChIP-seq method.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Chromatin immunoprecipitation (ChIP) provides a means of enriching DNA associated with transcription factors, histone modifications, and indeed any other proteins for which suitably characterized antibodies are available. Over the years, sequence detection has progressed from quantitative real-time PCR and Southern blotting to microarrays (ChIP-chip) and now high-throughput sequencing (ChIP-seq). This progression has vastly increased the sequence coverage and data volumes generated. This in turn has enabled informaticians to predict the identity of multi-protein complexes on DNA based on the overrepresentation of sequence motifs in DNA enriched by ChIP with a single antibody against a single protein. In the course of the development of high-throughput sequencing, little has changed in the ChIP methodology until recently. In the last three years, a number of modifications have been made to the ChIP protocol with the goal of enhancing the sensitivity of the method and further reducing the levels of nonspecific background sequences in ChIPped samples. In this chapter, we provide a brief commentary on these methodological changes and describe a detailed ChIP-exo method able to generate narrower peaks and greater peak coverage from ChIPped material.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

To gain insight into the mechanisms by which the Myb transcription factor controls normal hematopoiesis and, in particular, how it contributes to leukemogenesis, we mapped the genome-wide occupancy of Myb by chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) in ERMYB myeloid progenitor cells. By integrating the genome occupancy data with whole genome expression profiling data, we identified a Myb-regulated transcriptional program. Gene signatures for leukemia stem cells, normal hematopoietic stem/progenitor cells and myeloid development were overrepresented in 2368 Myb-regulated genes. Of these, Myb bound directly near or within 793 genes. Myb directly activates some genes known to be critical in maintaining hematopoietic stem cells, such as Gfi1 and Cited2. Importantly, we also show that, despite usually being considered a transactivator, Myb also functions to repress approximately half of its direct targets, including several key regulators of myeloid differentiation, such as Sfpi1 (also known as Pu.1), Runx1, Junb and Cebpb. Furthermore, our results demonstrate that interaction with p300, an established coactivator for Myb, is unexpectedly required for Myb-mediated transcriptional repression. We propose that the repression of the above-mentioned key pro-differentiation factors may contribute essentially to Myb's ability to suppress differentiation and promote self-renewal, thus maintaining progenitor cells in an undifferentiated state and promoting leukemic transformation. © 2011 The Author(s).

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.

The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing and allelic expression bias on the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also raises a number of methodological considerations critical to the practice of single-cell RNA-seq measurements.

The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.

The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq (in which these metrics also played a major role) is presented.

The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).

The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of the technical developments expected in the near future and their impact on the field of functional genomics (Chapter 14), and a discussion of some research areas that have so far been insufficiently explored, the future study of which will, in the opinion of the author, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Mitochondria contain a 16.6 kb circular genome encoding 13 proteins as well as mitochondrial tRNAs and rRNAs. Copies of the genome are organized into nucleoids containing both DNA and proteins, including the machinery required for mtDNA replication and transcription. Although mtDNA integrity is essential for cellular and organismal viability, regulation of proliferation of the mitochondrial genome is poorly understood. To elucidate the mechanisms behind this, we chose to study the interplay between mtDNA copy number and the proteins involved in mitochondrial fusion, another required function in cells. Strikingly, we found that mouse embryonic fibroblasts lacking fusion also had an mtDNA copy number deficit. To understand this phenomenon further, we analyzed the binding of mitochondrial transcription factor A (TFAM), whose role in transcription, replication, and packaging of the genome is well-established and crucial for cellular maintenance. Using ChIP-seq, we were able to detect largely uniform, non-specific binding across the genome, with no occupancy in the known specific binding sites in the regulatory region. We did detect a single binding site directly upstream of a known origin of replication, suggesting that TFAM may play a direct role in replication. Finally, although TFAM has been previously shown to localize to the nuclear genome, we found no evidence for such binding sites in our system.

To further understand the regulation of mtDNA by other proteins, we analyzed publicly available ChIP-seq datasets from ENCODE, modENCODE, and mouseENCODE for evidence of nuclear transcription factor binding to the mitochondrial genome. We identified eight human transcription factors and three mouse transcription factors that demonstrated binding events with the strand-asymmetric morphology of classical binding sites. ChIP-seq is a powerful tool for understanding the interactions between proteins and the mitochondrial genome, and future studies promise to further the understanding of how mtDNA is regulated within the nucleoid.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A long-standing, yet-to-be-accomplished task in understanding behavior is to dissect the function of each gene involved in the development and function of a neuron. The C. elegans ALA neuron was chosen in this study for its known function in sleep, an ancient but less understood animal behavior. Single-cell transcriptome profiling identified 8,133 protein-coding genes in the ALA neuron, of which 57 are neuropeptide-coding genes. The most enriched genes are also neuropeptides. In combination with gain-of-function and loss-of-function assays, I showed that the ALA-enriched FMRFamide neuropeptides FLP-7, FLP-13, and FLP-24 are sufficient and necessary for inducing C. elegans sleep. These neuropeptides act as neuromodulators through the GPCRs NPR-7 and NPR-22. Further investigation in zebrafish indicates that FMRFamide neuropeptides are sleep-promoting molecules in animals. To correlate the behavioral outputs with genomic context, I constructed a gene regulatory network of the relevant genes controlling C. elegans sleep behavior through EGFR signaling in the ALA neuron. First, I identified an ALA cell-specific motif to conduct a genome-wide search for possible ALA-expressed genes. I then filtered out non-ALA-expressed genes by comparing the motif-search genes with ALA transcriptomes from single-cell profiling. By corroborating these results with ChIP-seq data from modENCODE, I identified direct interactions between ALA-expressed transcription factors and differentiation genes in the EGFR sleep regulation pathway. This approach provides a network reference for the molecular regulation of C. elegans sleep behavior, and serves as an entry point for the understanding of functional genomics in animal behaviors.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Hairpin pyrrole-imidazole polyamides are cell-permeable, sequence-programmable oligomers that bind in the minor groove of DNA. This thesis describes studies of Py-Im polyamides targeted to biologically important DNA repeat sequences for the purpose of modulating disease states. Design of a hairpin polyamide that binds the CG dyad, a site of DNA methylation that can become dysregulated in cancer, is described. We report the synthesis of a DNA methylation antagonist, its sequence specificity and affinity informed by Bind-n-Seq and iteratively designed, which improves inhibitory activity in a cell-free assay by 1000-fold to low nanomolar IC50. Additionally, a hairpin polyamide targeted to the telomeric sequence is found to trigger a slow necrotic-type cell death with the release of inflammatory molecules in a model of B cell lymphoma. The effects of the polyamide are unique in this class of oligomers; its effects are characterized and a functional assay of phagocytosis by macrophages is described. Additionally, hairpin polyamides targeted to pathologically expanded CTG•CAG triplet repeat DNA sequences, the molecular cause of myotonic dystrophy type 1, are synthesized and assessed for toxicity. Lastly, ChIP-seq of Hypoxia-Inducible Factor is performed under hypoxia-induced conditions. The study results show that ChIP-seq can be employed to understand the genome-wide perturbation of Hypoxia-Inducible Factor occupancy by a Py-Im polyamide.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence. Instead of pre-selecting promising candidate sequences, it utilizes information across all sequence regions to search for high-scoring motifs. We apply cERMIT on a range of direct binding and overexpression datasets; it substantially outperforms state-of-the-art approaches on curated ChIP-chip datasets, and easily scales to current mammalian ChIP-seq experiments with data on thousands of non-coding regions.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
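One simple way to picture the estimation of sequence cleavage bias from deproteinized DNA is as a ratio of k-mer frequencies: how often a k-mer surrounds an observed cut, relative to how often it occurs in the sequence overall. The Python sketch below shows that idea on a toy string. The k-mer ratio, the function names, and the tiny `k` are assumptions for illustration only; the abstract does not describe the actual bias model used.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cleavage_bias(seq, cut_positions, k=2):
    """Frequency of each k-mer around cut sites divided by its background frequency."""
    half = k // 2
    background = kmer_counts(seq, k)
    at_cuts = Counter(
        seq[p - half:p - half + k]
        for p in cut_positions
        if p - half >= 0 and p - half + k <= len(seq)  # skip cuts too near the ends
    )
    n_background = sum(background.values())
    n_cuts = sum(at_cuts.values())
    return {
        kmer: (count / n_cuts) / (background[kmer] / n_background)
        for kmer, count in at_cuts.items()
    }


# Toy data: every cut falls inside an "AC" dinucleotide.
bias = cleavage_bias("ACGTACGTAC", [1, 5, 9], k=2)
```

A ratio above 1 indicates a k-mer that is cut more often than its abundance alone would predict; such ratios could then be used to rescale expected cut counts before calling footprints.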

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
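The pairing of a correlation-based objective with Markov chain Monte Carlo can be sketched on a toy problem. In the Python example below, a single parameter (the center of a bump-shaped occupancy profile) is adjusted by a Metropolis-style random walk so that the Pearson correlation between the predicted profile and an observed profile increases. The Gaussian profile, the fixed width, the temperature, and all function names are hypothetical stand-ins, not the thermodynamic model of the dissertation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def predict(center, width, n_pos):
    """Toy occupancy profile: a Gaussian bump (stand-in for a real binding model)."""
    return [math.exp(-((i - center) ** 2) / (2 * width ** 2)) for i in range(n_pos)]


def metropolis_maximize(observed, steps=2000, temperature=0.05, seed=0):
    """Random-walk Metropolis search over the bump center; returns the best state seen."""
    rng = random.Random(seed)
    n_pos = len(observed)
    center = n_pos / 4
    score = pearson(predict(center, 2.0, n_pos), observed)
    best_center, best_score = center, score
    for _ in range(steps):
        proposal = center + rng.gauss(0.0, 1.0)
        new_score = pearson(predict(proposal, 2.0, n_pos), observed)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            center, score = proposal, new_score
            if score > best_score:
                best_center, best_score = center, score
    return best_center, best_score


observed = predict(12.0, 2.0, 20)
best_center, best_score = metropolis_maximize(observed)
```

Occasionally accepting worse proposals is what distinguishes this from plain hill climbing: it lets the chain leave flat or locally optimal regions of the objective.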

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
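To make the idea of a Bayes-factor-based score concrete, the Python sketch below compares how well two fixed models explain DNase I cut counts in a window: a "nucleosome" model with position-specific expected cut rates and a flat background model. With fixed rates on both sides, the Bayes factor reduces to a likelihood ratio. The Poisson likelihood, the rate values, and the function names are assumptions for this example; the dissertation's method additionally models the quadratic and oscillatory digestion pattern, which is not reproduced here.

```python
import math


def poisson_logpmf(k, rate):
    """Log probability of observing k events under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_bayes_factor(counts, nuc_rates, bg_rate):
    """Log likelihood ratio: nucleosome model (per-position rates) vs flat background."""
    log_nuc = sum(poisson_logpmf(k, r) for k, r in zip(counts, nuc_rates))
    log_bg = sum(poisson_logpmf(k, bg_rate) for k in counts)
    return log_nuc - log_bg


# Hypothetical rates: few cuts over the nucleosome body, more at its edges.
nuc_rates = [4.0, 1.0, 1.0, 4.0]
protected = log_bayes_factor([5, 0, 1, 4], nuc_rates, bg_rate=2.5)
flat = log_bayes_factor([2, 3, 2, 3], nuc_rates, bg_rate=2.5)
```

A positive score favors the nucleosome model for that window and a negative score favors background, so sliding the window along the genome yields a positioning score track.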

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.