5 results for genome-wide

in CaltechTHESIS


Relevance:

60.00%

Publisher:

Abstract:

The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.

The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing and allelic expression bias on the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also raises a number of methodological considerations critical to the practice of single-cell RNA-seq measurements.

The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.

The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq (in which these metrics also played a major role) is presented.

The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).

The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of the technical developments expected in the near future and their impact on the field of functional genomics (Chapter 14), and a discussion of some research areas that remain insufficiently explored, the future study of which will, in the opinion of the author, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.

Relevance:

60.00%

Publisher:

Abstract:

The ability to regulate gene expression is of central importance for the adaptability of living organisms to changes in their internal and external environment. At the transcriptional level, binding of transcription factors (TFs) in the vicinity of promoters can modulate the rate at which transcripts are produced, and as such plays an important role in gene regulation. TFs with regulatory action at multiple promoters are the rule rather than the exception, with examples ranging from TFs like the cAMP receptor protein (CRP) in E. coli, which regulates hundreds of different genes, to situations involving multiple copies of the same gene, such as on plasmids or viral DNA. When the number of TFs heavily exceeds the number of binding sites, TF binding to each promoter can be regarded as independent. However, when the number of TF molecules is comparable to the number of binding sites, TF titration will result in coupling ("entanglement") between transcription of different genes. The last few decades have seen rapid advances in our ability to quantitatively measure such effects, which calls for biophysical models to explain these data. Here we develop a statistical mechanical model which takes the TF titration effect into account and use it to predict both the level of gene expression and the resulting correlation in transcription rates for a general set of promoters. To test these predictions experimentally, we create genetic constructs with known TF copy number, binding site affinities, and gene copy number, thus avoiding the need for free fit parameters. Our results clearly demonstrate the TF titration effect and show that the statistical mechanical model can accurately predict the fold change in gene expression for the studied cases. We also generalize these experimental efforts to cover systems with multiple different genes, using the method of mRNA fluorescence in situ hybridization (FISH).
Interestingly, we can use the TF titration effect as a tool to measure the plasmid copy number at different points in the cell cycle, as well as the plasmid copy number variance. Finally, we investigate the strategies of transcriptional regulation used in a real organism by analyzing the thousands of known regulatory interactions in E. coli. We introduce a "random promoter architecture model" to identify overrepresented regulatory strategies, such as TF pairs which coregulate the same genes more frequently than would be expected by chance, indicating a related biological function. Furthermore, we investigate whether promoter architecture has a systematic effect on gene expression by linking the regulatory data of E. coli to genome-wide expression censuses.
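The titration effect described in this abstract can be illustrated with a minimal mass-action sketch. This is not the thesis's statistical mechanical model. It assumes a single repressor species, `n_sites` identical binding sites with dissociation constant `k_d` (expressed in copies per cell), and the weak-promoter form of fold change, 1 / (1 + free / k_d). The function names and all parameter values are hypothetical.

```python
import math


def free_tf(total_tf: float, n_sites: float, k_d: float) -> float:
    """Free TF copy number when n_sites identical sites compete for total_tf.

    Assumption: simple equilibrium binding, so mass balance reads
        total_tf = free + n_sites * free / (k_d + free).
    Rearranged, this is the quadratic
        free^2 + (k_d + n_sites - total_tf) * free - k_d * total_tf = 0,
    whose non-negative root is returned.
    """
    b = k_d + n_sites - total_tf
    return (-b + math.sqrt(b * b + 4.0 * k_d * total_tf)) / 2.0


def fold_change(total_tf: float, n_sites: float, k_d: float) -> float:
    """Fold change for simple repression, using the titrated free TF level."""
    return 1.0 / (1.0 + free_tf(total_tf, n_sites, k_d) / k_d)


# Hypothetical numbers: 10 repressors, k_d of 1 copy per cell.
few_sites = fold_change(total_tf=10, n_sites=1, k_d=1.0)
many_sites = fold_change(total_tf=10, n_sites=50, k_d=1.0)
```

In this sketch, raising the number of competing sites from 1 to 50 lowers the free repressor level, so `many_sites` is larger than `few_sites`: each promoter is repressed less when the repressor pool is shared among more sites, which is the coupling the abstract refers to.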

Relevance:

60.00%

Publisher:

Abstract:

A long-standing, yet-to-be-accomplished task in understanding behavior is to dissect the function of each gene involved in the development and function of a neuron. The C. elegans ALA neuron was chosen in this study for its known function in sleep, an ancient but less well understood animal behavior. Single-cell transcriptome profiling identified 8,133 protein-coding genes in the ALA neuron, of which 57 are neuropeptide-coding genes. The most enriched genes also encode neuropeptides. Using a combination of gain-of-function and loss-of-function assays, I showed that the ALA-enriched FMRFamide neuropeptides FLP-7, FLP-13, and FLP-24 are sufficient and necessary for inducing C. elegans sleep. These neuropeptides act as neuromodulators through the GPCRs NPR-7 and NPR-22. Further investigation in zebrafish indicates that FMRFamide neuropeptides are sleep-promoting molecules in animals. To correlate the behavioral outputs with genomic context, I constructed a gene regulatory network of the relevant genes controlling C. elegans sleep behavior through EGFR signaling in the ALA neuron. First, I identified an ALA cell-specific motif to conduct a genome-wide search for possible ALA-expressed genes. I then filtered out non-ALA-expressed genes by comparing the genes found by the motif search with ALA transcriptomes from single-cell profiling. By corroborating these results with ChIP-seq data from modENCODE, I identified direct interactions between ALA-expressed transcription factors and differentiation genes in the EGFR sleep regulation pathway. This approach provides a network reference for the molecular regulation of C. elegans sleep behavior, and serves as an entry point for the understanding of functional genomics in animal behaviors.
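The two-step procedure in this abstract (a genome-wide motif search followed by filtering against the single-cell transcriptome) can be sketched roughly as below. This is only an illustration: the motif pattern, gene names, and sequences are invented, the helper names are hypothetical, and only the given strand is scanned. The thesis's actual motif and search tool are not stated in the abstract.

```python
import re


def genes_with_motif(upstream_seqs: dict[str, str], motif_regex: str) -> set[str]:
    """Return genes whose upstream sequence contains the motif.

    Assumption: the motif is written as a regular expression over A/C/G/T
    and only the forward strand is searched.
    """
    pattern = re.compile(motif_regex)
    return {gene for gene, seq in upstream_seqs.items() if pattern.search(seq.upper())}


def filter_by_expression(candidates: set[str], expressed: set[str]) -> set[str]:
    """Keep only motif-search candidates that are also detected in the transcriptome."""
    return candidates & expressed


# Hypothetical toy inputs, not data from the thesis.
upstream = {
    "geneA": "ttgacATTCGGAtt",
    "geneB": "CCCCCCCC",
    "geneC": "AATTAGGAAT",
}
motif = "ATT[CA]GGA"            # invented pattern for illustration
expressed_in_cell = {"geneA", "geneB"}

candidates = genes_with_motif(upstream, motif)
supported = filter_by_expression(candidates, expressed_in_cell)
```

Here `candidates` holds the genes matched by the motif scan, and `supported` keeps only those that the expression data also support, mirroring the filtering step described above.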

Relevance:

60.00%

Publisher:

Abstract:

With recent advances in high-throughput sequencing, mapping of genome-wide transcription factor occupancy has become feasible. To advance the understanding of skeletal muscle differentiation specifically and transcriptional regulation in general, I determined the genome-wide occupancy map for myogenin in differentiating C2C12 myocyte cells. I then analyzed the myogenin map for underlying sequence content and the association between occupied elements and expression trajectories of adjacent genes. Having determined that myogenin primarily associates with expressed genes, I performed a similar analysis on occupancy maps of other transcription factors active during skeletal muscle differentiation, including an extensive analysis of co-occupancy. This analysis provided strong motif evidence for protein-protein interactions as the primary driving force in the formation of Myogenin / Mef2 and MyoD / AP-1 complexes at jointly-occupied sites. Finally, factor occupancy analysis was extended to include bHLH transcription factors in tissues other than skeletal muscle. The cross-tissue analysis led to the emergence of a motif structure used by bHLH TFs to encode either tissue-specific or "general" (public) access in a variety of lineages.

Relevance:

60.00%

Publisher:

Abstract:

Hairpin pyrrole-imidazole polyamides are cell-permeable, sequence-programmable oligomers that bind in the minor groove of DNA. This thesis describes studies of Py-Im polyamides targeted to biologically important DNA repeat sequences for the purpose of modulating disease states. Design of a hairpin polyamide that binds the CG dyad, a site of DNA methylation that can become dysregulated in cancer, is described. We report the synthesis of a DNA methylation antagonist, its sequence specificity and affinity informed by Bind-n-Seq and iteratively designed, which improves inhibitory activity in a cell-free assay by 1000-fold to low nanomolar IC50. Additionally, a hairpin polyamide targeted to the telomeric sequence is found to trigger a slow necrotic-type cell death with the release of inflammatory molecules in a model of B cell lymphoma. The effects of the polyamide are unique in this class of oligomers; its effects are characterized and a functional assay of phagocytosis by macrophages is described. Additionally, hairpin polyamides targeted to pathologically expanded CTG•CAG triplet repeat DNA sequences, the molecular cause of myotonic dystrophy type 1, are synthesized and assessed for toxicity. Lastly, ChIP-seq of Hypoxia-Inducible Factor is performed under hypoxia-induced conditions. The study results show that ChIP-seq can be employed to understand the genome-wide perturbation of Hypoxia-Inducible Factor occupancy by a Py-Im polyamide.