911 results for ChIP-seq
Abstract:
SUMMARY: Eukaryotic DNA interacts with nuclear proteins through non-covalent ionic interactions. Proteins can recognize specific nucleotide sequences through steric interactions with the DNA, and these specific protein-DNA interactions are the basis for many nuclear processes, e.g. gene transcription, chromosomal replication, and recombination. A new technology termed ChIP-Seq has recently been developed for the analysis of protein-DNA interactions on a whole-genome scale; it is based on chromatin immunoprecipitation followed by high-throughput DNA sequencing. ChIP-Seq is a novel technique with great potential to replace older techniques for mapping protein-DNA interactions. In this thesis, we bring some new insights into ChIP-Seq data analysis. First, we point out some common and so far unknown artifacts of the method. Sequence tags are not uniformly distributed in the genome, and we have found extreme hot-spots of tag accumulation over specific loci in the human and mouse genomes. These artifactual sequence tag accumulations will create false peaks in every ChIP-Seq dataset, and we propose different filtering methods to reduce the number of false positives. Next, we propose random sampling as a powerful analytical tool in ChIP-Seq data analysis that could be used to infer biological knowledge from massive ChIP-Seq datasets. We created an unbiased random sampling algorithm and used this methodology to reveal some of the important biological properties of Nuclear Factor I (NFI) DNA binding proteins. Finally, by analyzing the ChIP-Seq data in detail, we revealed that NFI transcription factors mainly act as activators of transcription and that they are associated with specific chromatin modifications that are markers of open chromatin. We speculate that NFI factors only interact with DNA wrapped around the nucleosome. We also found multiple loci that indicate possible chromatin barrier activity of NFI proteins, which could suggest the use of NFI binding sequences as chromatin insulators in biotechnology applications.
Abstract:
Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) experiments are widely used to determine, within entire genomes, the occupancy sites of any protein of interest, including, for example, transcription factors, RNA polymerases, or histones with or without various modifications. In addition to allowing the determination of occupancy sites within one cell type and under one condition, this method allows, in principle, the establishment and comparison of occupancy maps in various cell types, tissues, and conditions. Such comparisons require, however, that samples be normalized. Widely used normalization methods that include a quantile normalization step perform well when factor occupancy varies at a subset of sites, but may miss uniform genome-wide increases or decreases in site occupancy. We describe a spike adjustment procedure (SAP) that, unlike commonly used normalization methods intervening at the analysis stage, entails an experimental step prior to immunoprecipitation. A constant, low amount from a single batch of chromatin of a foreign genome is added to the experimental chromatin. This "spike" chromatin then serves as an internal control to which the experimental signals can be adjusted. We show that the method improves similarity between replicates and reveals biological differences including global and largely uniform changes.
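The idea of adjusting experimental signals to a constant spike can be sketched in a few lines. This is only an illustration under a strong assumption, namely that the adjustment is a simple linear rescaling by the number of reads aligned to the foreign genome; the abstract does not specify the actual computation, and the function names `spike_scale_factors` and `adjust_signal` are hypothetical.

```python
def spike_scale_factors(spike_counts):
    """Return per-sample factors that equalize spike-in read counts.

    spike_counts maps a sample name to the number of reads aligned to the
    foreign (spike) genome. Assumption: linear rescaling to the sample with
    the fewest spike reads; the published procedure may differ.
    """
    if not spike_counts:
        raise ValueError("no samples given")
    if any(count <= 0 for count in spike_counts.values()):
        raise ValueError("every sample needs a positive spike-in count")
    reference = min(spike_counts.values())
    return {name: reference / count for name, count in spike_counts.items()}


def adjust_signal(signal, factor):
    """Scale a list of per-site signal values by a sample's factor."""
    return [value * factor for value in signal]
```

Because the same amount of spike chromatin is added to every sample, a sample that yields twice as many spike reads gets half the weight, so a uniform genome-wide change in the experimental signal is preserved instead of being normalized away.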
Abstract:
MOTIVATION: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. RESULTS: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
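A filter of the kind described, rejecting sites with low read complexity or an excess of clonal reads, might look roughly like the following. It is a sketch under stated assumptions: clonal reads are taken to be reads sharing the same (start, strand) on one allele, and fixed thresholds stand in for the significance test mentioned in the abstract. The function names and both threshold values are hypothetical.

```python
def clonal_fraction(reads):
    """Fraction of reads that duplicate another read's (start, strand)."""
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)


def site_complexity(ref_reads, alt_reads):
    """Distinct reads (counted per allele) divided by total reads."""
    total = len(ref_reads) + len(alt_reads)
    if total == 0:
        return 0.0
    return (len(set(ref_reads)) + len(set(alt_reads))) / total


def passes_filter(ref_reads, alt_reads,
                  min_complexity=0.5, max_clonal_fraction=0.5):
    """Keep a site only if it is complex enough and neither allele is
    dominated by clonal reads. Thresholds are illustrative assumptions,
    not values taken from the paper."""
    if site_complexity(ref_reads, alt_reads) < min_complexity:
        return False
    for allele_reads in (ref_reads, alt_reads):
        if clonal_fraction(allele_reads) > max_clonal_fraction:
            return False
    return True
```

A site where one allele is supported mostly by copies of a single fragment is rejected even when the overall complexity looks acceptable, which is the situation the abstract attributes to amplification bias.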
Abstract:
Membranous nephropathy (MN), characterized by diffuse thickening of the glomerular basement membrane and subepithelial in situ immune complex deposition, is the most common cause of idiopathic nephrotic syndrome in adults, with an incidence of 5-10 per million per year. A number of studies have confirmed the relevance of several experimental insights to the pathogenesis of human MN, but the specific biomarkers of MN have not been fully elucidated. As a result, our knowledge of the alterations in histone methylation in MN remains limited. We used chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) to analyze the variations in a methylated histone (H3K9me3) in peripheral blood mononuclear cells from 10 MN patients and 10 healthy subjects. There were 108 genes with significantly different expression in the MN patients compared with the normal controls. In MN patients, significantly increased activity was seen in 75 H3K9me3 genes, and decreased activity was seen in 33, compared with healthy subjects. Five positive genes, DiGeorge syndrome critical region gene 6 (DGCR6), sorting nexin 16 (SNX16), contactin 4 (CNTN4), baculoviral IAP repeat containing 3 (BIRC3), and baculoviral IAP repeat containing 2 (BIRC2), were selected and quantified. There were alterations of H3K9me3 in MN patients. These genes may be candidates to help explain pathogenesis in MN patients. These novel findings show that H3K9me3 may be a potential biomarker or promising target for epigenetic-based MN therapies.
Abstract:
The ChIP-seq method is a technology that combines chromatin immunoprecipitation with high-throughput sequencing and enables large-scale in vivo analysis of transcription factors. Processing the large amounts of data thus generated requires powerful computational resources, and numerous tools have recently emerged. However, this proliferation of software, each performing a single step of the analysis, creates compatibility problems and complicates analyses. There is therefore a strong need for a powerful and flexible software suite for motif identification. We propose here a complete ChIP-seq data analysis suite, freely available in R and composed of three modules: PICS, rGADEM and MotIV. Through the analysis of four datasets for the transcription factors CTCF, STAT1, FOXA1 and ER, we demonstrated the effectiveness of our analysis suite and highlighted its novel features, notably the processing of results by MotIV, which led to the discovery of motifs not detected by other algorithms.
Abstract:
To capture genomic profiles of histone modifications, chromatin immunoprecipitation (ChIP) is combined with next-generation sequencing, an approach called ChIP-seq. However, enriched regions generated from ChIP-seq data are evaluated only against the limited knowledge acquired by manually examining the relevant biological literature. This paper proposes a novel framework that integrates multiple knowledge sources such as biological literature, Gene Ontology, and microarray data. In order to precisely analyze ChIP-seq data for histone modification, knowledge integration is based on a unified probabilistic model. The model is employed to re-rank the enriched regions generated from peak finding algorithms. By filtering the re-ranked enriched regions using a predefined threshold, more reliable and precise results can be generated. The combination of multiple knowledge sources with the peak finding algorithm produces a new paradigm for ChIP-seq data analysis. © (2012) Trans Tech Publications, Switzerland.
Abstract:
Chromatin immunoprecipitation (ChIP) provides a means of enriching DNA associated with transcription factors, histone modifications, and indeed any other proteins for which suitably characterized antibodies are available. Over the years, sequence detection has progressed from quantitative real-time PCR and Southern blotting to microarrays (ChIP-chip) and now high-throughput sequencing (ChIP-seq). This progression has vastly increased the sequence coverage and data volumes generated. This in turn has enabled informaticians to predict the identity of multi-protein complexes on DNA based on the overrepresentation of sequence motifs in DNA enriched by ChIP with a single antibody against a single protein. In the course of the development of high-throughput sequencing, little has changed in the ChIP methodology until recently. In the last three years, a number of modifications have been made to the ChIP protocol with the goal of enhancing the sensitivity of the method and further reducing the levels of nonspecific background sequences in ChIPped samples. In this chapter, we provide a brief commentary on these methodological changes and describe a detailed ChIP-exo method able to generate narrower peaks and greater peak coverage from ChIPped material.
Abstract:
In mammalian circadian clockwork, the CLOCK-BMAL1 complex binds to DNA enhancers of target genes and drives circadian oscillation of transcription. Here we identified 7,978 CLOCK-binding sites in mouse liver by chromatin immunoprecipitation-sequencing (ChIP-Seq), and a newly developed bioinformatics method, motif centrality analysis of ChIP-Seq (MOCCS), revealed a genome-wide distribution of previously unappreciated noncanonical E-boxes targeted by CLOCK. In vitro promoter assays showed that CACGNG, CACGTT, and CATG(T/C)G are functional CLOCK-binding motifs. Furthermore, we extensively revealed rhythmically expressed genes by poly(A)-tailed RNA-Seq and identified 1,629 CLOCK target genes within 11,926 genes expressed in the liver. Our analysis also revealed rhythmically expressed genes that have no apparent CLOCK-binding site, indicating the importance of indirect transcriptional and posttranscriptional regulations. Indirect transcriptional regulation is represented by rhythmic expression of CLOCK-regulated transcription factors, such as Krüppel-like factors (KLFs). Indirect posttranscriptional regulation involves rhythmic microRNAs that were identified by small-RNA-Seq. Collectively, CLOCK-dependent direct transactivation through multiple E-boxes and indirect regulations polyphonically orchestrate dynamic circadian outputs.
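The three motifs named above can be located in a sequence with a simple pattern scan. The sketch below is not the MOCCS method, which the abstract does not describe in detail; it only shows how the degenerate positions (N for any base, T/C for either base) translate into regular expressions. It scans the forward strand only, and the names `MOTIFS` and `find_motifs` are hypothetical.

```python
import re

# Motifs from the abstract, with degenerate positions as character classes.
MOTIFS = {
    "CACGNG": "CACG[ACGT]G",
    "CACGTT": "CACGTT",
    "CATG(T/C)G": "CATG[TC]G",
}


def find_motifs(sequence):
    """Return sorted (start, motif_name, matched_text) tuples.

    A lookahead is used so that overlapping occurrences are all reported.
    Only the given strand is scanned; reverse-complement matches would
    need a second pass.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)
```

For example, `find_motifs("ttCACGTGaaCATGCG")` reports the canonical E-box CACGTG as an instance of CACGNG at offset 2 and CATGCG at offset 10.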
Abstract:
RNA polymerase III (Pol III) occurs in two versions, one containing the POLR3G subunit and the other the closely related POLR3GL subunit. It is not clear whether these two Pol III forms have the same function, in particular whether they recognize the same target genes. We show that the POLR3G and POLR3GL genes arose from a DNA-based gene duplication, probably in a common ancestor of vertebrates. POLR3G- as well as POLR3GL-containing Pol III are present in cultured cell lines and in normal mouse liver, although the relative amounts of the two forms vary, with the POLR3G-containing Pol III relatively more abundant in dividing cells. Genome-wide chromatin immunoprecipitations followed by high-throughput sequencing (ChIP-seq) reveal that both forms of Pol III occupy the same target genes, in very constant proportions within one cell line, suggesting that the two forms of Pol III have a similar function with regard to specificity for target genes. In contrast, the POLR3G promoter, but not the POLR3GL promoter, binds the transcription factor MYC, as do all other promoters of genes encoding Pol III subunits. Thus, the POLR3G/POLR3GL duplication did not lead to neo-functionalization of the gene product (at least with regard to target gene specificity) but rather to neo-functionalization of the transcription units, which acquired different mechanisms of regulation, thus likely affording greater regulation potential to the cell.
Abstract:
BACKGROUND: The Nuclear Factor I (NFI) family of DNA binding proteins (also called CCAAT box transcription factors or CTF) is involved in both DNA replication and gene expression regulation. Using chromatin immunoprecipitation and high-throughput sequencing (ChIP-Seq), we performed a genome-wide mapping of NFI DNA binding sites in primary mouse embryonic fibroblasts. RESULTS: We found that in vivo and in vitro NFI DNA binding specificities are indistinguishable, as in vivo ChIP-Seq NFI binding sites matched predictions based on previously established position weight matrix models of its in vitro binding specificity. Combining ChIP-Seq with mRNA profiling data, we found that NFI preferentially associates with highly expressed genes that it up-regulates, while binding sites were under-represented at expressed but unregulated genes. Genomic binding also correlated with markers of transcribed genes such as histone modifications H3K4me3 and H3K36me3, even outside of annotated transcribed loci, implicating NFI in the control of the deposition of these modifications. Positional correlation between + and - strand ChIP-Seq tags revealed that, in contrast to other transcription factors, NFI associates with a nucleosomal length of cleavage-resistant DNA, suggesting an interaction with positioned nucleosomes. In addition, NFI binding prominently occurred at boundaries displaying discontinuities in histone modifications specific to expressed and silent chromatin, such as loci subject to parental allele-specific imprinted expression. CONCLUSIONS: Our data thus suggest that NFI nucleosomal interaction may contribute to the partitioning of distinct chromatin domains and to epigenetic gene expression regulation. NFI ChIP-Seq and input control DNA data were deposited at the Gene Expression Omnibus (GEO) repository under accession number GSE15844. Gene expression microarray data for mouse embryonic fibroblasts are available under GEO accession number GSE15871.
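The positional correlation between + and - strand tags mentioned in the results can be illustrated with a small strand cross-correlation. This is a generic sketch rather than the authors' procedure: it assumes each tag is reduced to its 5' start coordinate and counts, for every shift, how many + strand tags have a - strand tag exactly that far downstream. The function names and the `max_shift` default are hypothetical.

```python
from collections import Counter


def strand_cross_correlation(plus_starts, minus_starts, max_shift=300):
    """Map each shift (0..max_shift) to the number of (+, -) tag pairs
    whose start coordinates are separated by exactly that shift."""
    plus_counts = Counter(plus_starts)
    minus_counts = Counter(minus_starts)
    scores = {}
    for shift in range(max_shift + 1):
        scores[shift] = sum(
            n * minus_counts.get(pos + shift, 0)
            for pos, n in plus_counts.items()
        )
    return scores


def best_shift(scores):
    """Shift with the highest pair count (first one wins on ties)."""
    return max(scores, key=scores.get)
```

The shift with the highest score estimates the length of the DNA fragment protected at binding sites, which is the quantity the abstract compares with nucleosomal length.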
Abstract:
BACKGROUND: In mammals, ChIP-seq studies of RNA polymerase II (PolII) occupancy have been performed to reveal how recruitment, initiation and pausing of PolII may control transcription rates, but the focus is rarely on obtaining finely resolved profiles that can portray the progression of PolII through sequential promoter states. RESULTS: Here, we analyze PolII binding profiles from high-coverage ChIP-seq on promoters of actively transcribed genes in mouse and humans. We show that the enrichment of PolII near transcription start sites exhibits a stereotypical bimodal structure, with one peak near active transcription start sites and a second peak 110 base pairs downstream from the first. Using an empirical model that reliably quantifies the spatial PolII signal, gene by gene, we show that the first PolII peak allows for refined positioning of transcription start sites, which is corroborated by mRNA sequencing. This bimodal signature is found both in mouse and humans. Analysis of the pausing-related factors NELF and DSIF suggests that the downstream peak reflects widespread pausing at the +1 nucleosome barrier. Several features of the bimodal pattern are correlated with sequence features such as CpG content and TATA boxes, as well as the histone mark H3K4me3. CONCLUSIONS: We thus show how high coverage DNA sequencing experiments can reveal as-yet unnoticed bimodal spatial features of PolII accumulation that are frequent at individual mammalian genes and reminiscent of transcription initiation and pausing. The initiation-pausing hypothesis is corroborated by evidence from run-on sequencing and immunoprecipitation in other cell types and species.
Abstract:
The functional consequences of structural variation in the human genome range from adaptation, to phenotypic variation, to predisposition to diseases. Copy number variation (CNV) was shown to influence the phenotype by modifying, in a somewhat dose-dependent manner, the expression of genes that map within them, as well as that of genes located on their flanks. To assess the possible mechanism(s) behind this neighboring effect, we compared the histone modification status of cell lines from patients affected by Williams-Beuren, Williams-Beuren region duplication, Smith-Magenis or DiGeorge Syndrome and from control individuals using a high-throughput version of the chromatin immunoprecipitation method (ChIP), called ChIP-seq. We monitored monomethylation of lysine K20 on histone H4 and trimethylation of lysine K27 on histone H3, as proxies for open and condensed chromatin, respectively. Consistent with the changes in expression levels observed for multiple genes mapping on the entire length of chromosomes affected by structural variants, we also detected regions with modified histone status between samples, up- and downstream from the critical regions, up to the end of the rearranged chromosome. We also gauged the intrachromosomal interactions of these cell lines using the chromosome conformation capture (4C-seq) technique. We observed that a set of genes flanking the Williams-Beuren Syndrome critical region (WBSCR) were often looping together, possibly forming an interacting cluster with each other and the WBSCR. Deletion of the WBSCR disrupts the expression of this group of flanking genes, as well as long-range interactions between them and the rearranged interval. We conclude that large genomic rearrangements can lead to changes in the state of the chromatin spreading far away from the critical region, thus possibly affecting expression globally and as a result modifying the phenotype of the patients.