981 results for Regulatory Elements
Abstract:
The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
Abstract:
Alignments of homologous genomic sequences are widely used to identify functional genetic elements and study their evolution. Most studies tacitly equate homology of functional elements with sequence homology. This assumption is violated by the phenomenon of turnover, in which functionally equivalent elements reside at locations that are nonorthologous at the sequence level. Turnover has been demonstrated previously for transcription-factor-binding sites. Here, we show that transcription start sites of equivalent genes do not always reside at equivalent locations in the human and mouse genomes. We also identify two types of partial turnover, illustrating evolutionary pathways that could lead to complete turnover. These findings suggest that the signals encoding transcription start sites are highly flexible and evolvable, and have cautionary implications for the use of sequence-level conservation to detect gene regulatory elements.
Abstract:
Neural crest cells (NCC) constitute a unique embryonic cell population that arises between the prospective epidermis and the dorsal aspect of the neural tube of vertebrates. NCC migrate ventromedially and dorsolaterally throughout the developing embryo, giving rise, respectively, to the constituents of the peripheral nervous system and to the melanocytes that ultimately reside in the skin and hair follicles. Mice and humans with mutations in the Endothelin receptor b (Ednrb) gene manifest strikingly similar phenotypes characterized by hypopigmentation, hearing loss and megacolon; these are due to the absence of melanocytes in the skin and inner ear and the lack of enteric ganglia in the distal part of the gut, respectively. Piebald lethal mice and humans with Hirschsprung's disease or Waardenburg syndrome carry different mutations in the Ednrb gene. The major goals of this project were to determine whether the action of Ednrb in NCC is required prior to commitment of these cells to the melanocytic lineage and to investigate its potential participation in the actual process of commitment. To achieve these goals, transgenic mice that express Ednrb under two different regulatory elements were created. The first, Dct-Ednrb, expresses Ednrb under the control of the DOPAchrome tautomerase (Dct) promoter to direct expression to already committed melanocyte precursors. The second, Nes-Ednrb, expresses Ednrb under the regulation of the human nestin gene second enhancer to direct expression to pre-migratory NCC. Crosses of the Dct-Ednrb mouse with piebald lethal showed that the transgene was capable of rescuing the hypopigmentation phenotype of the latter. This result indicates that the action of Ednrb after NCC commit to the melanocytic lineage is sufficient for normal melanocyte development. The Dct-Ednrb mouse was further crossed with two other hypopigmentation mutants that carry mutations in the transcription factors Sox10 and Pax3. The transgene rescued the phenotype of the Sox10 mutant only.
This suggests that Ednrb interacts with Sox10 but not with Pax3 during melanocyte development. The Nes-Ednrb mice developed a hypopigmentation phenotype that was augmented when crossed with piebald lethal or lethal spotting (mutation in Edn3, the ligand for Ednrb) mice but was rescued by overexpression of Edn3. These results suggest that alterations in Ednrb expression early in development affect melanocyte development. This study provides novel information necessary to better understand the early embryonic development of NCC, clarifies specific interactions between different melanogenic genes and could eventually help in the implementation of therapies for human pigmentary genetic disorders.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from a single source alone. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and much more. With advances in science and technology, high-throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein interactions and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, the technology itself has its limitations and cannot detect all kinds of relationships between genes and their products. Thus there is a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, which is a central challenge in computational biology and systems biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions.
While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, there is another type of relationship, called a semantic relationship, that cannot be detected by a single technique but can be inferred using multiple sources of biological data. The contributions of this thesis involve the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These include (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to functional annotation of genes, (iii) a novel method for inferring a functional network of genes, and (iv) techniques for clustering genes using multi-source data.
Abstract:
A large proportion of the variation in traits between individuals can be attributed to variation in the nucleotide sequence of the genome. The most commonly studied traits in human genetics are related to disease and disease susceptibility. Although scientists have identified genetic causes for over 4,000 monogenic diseases, the underlying mechanisms of many highly prevalent multifactorial inheritance disorders such as diabetes, obesity, and cardiovascular disease remain largely unknown. Identifying genetic mechanisms for complex traits has been challenging because most of the variants are located outside of protein-coding regions, and determining the effects of such non-coding variants remains difficult. In this dissertation, I evaluate the hypothesis that such non-coding variants contribute to human traits and diseases by altering the regulation of genes rather than the sequence of those genes. I will specifically focus on studies to determine the functional impacts of genetic variation associated with two related complex traits: gestational hyperglycemia and fetal adiposity. At the genomic locus associated with maternal hyperglycemia, we found that genetic variation in regulatory elements altered the expression of the HKDC1 gene. Furthermore, we demonstrated that HKDC1 phosphorylates glucose in vitro and in vivo, thus demonstrating that HKDC1 is a fifth human hexokinase gene. At the fetal-adiposity associated locus, we identified variants that likely alter VEPH1 expression in preadipocytes during differentiation. To make such studies of regulatory variation high-throughput and routine, we developed POP-STARR, a novel high throughput reporter assay that can empirically measure the effects of regulatory variants directly from patient DNA. By combining targeted genome capture technologies with STARR-seq, we assayed thousands of haplotypes from 760 individuals in a single experiment. 
We subsequently used POP-STARR to identify three key features of regulatory variants: that regulatory variants typically have weak effects on gene expression; that the effects of regulatory variants are often coordinated with respect to disease risk, suggesting a general mechanism by which the weak effects can together have phenotypic impact; and that nucleotide transversions have larger impacts on enhancer activity than transitions. Together, the findings presented here demonstrate successful strategies for determining the regulatory mechanisms underlying genetic associations with human traits and diseases, and the value of doing so for driving novel biological discovery.
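The distinction between transitions and transversions drawn in the abstract above can be made concrete with a minimal sketch. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); a transversion crosses the two classes. The function name `classify_substitution` is hypothetical and is not taken from the POP-STARR work; the sketch only illustrates the standard definition.

```python
# Hypothetical helper illustrating the standard transition/transversion
# definition; it is not part of the POP-STARR assay or its analysis code.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-nucleotide change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` is a transition, while `classify_substitution("A", "C")` is a transversion.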
Abstract:
Gene regulation is a complex and tightly controlled process that defines cell function in physiological and abnormal states. Programmable gene repression technologies enable loss-of-function studies for dissecting gene regulation mechanisms and represent an exciting avenue for gene therapy. Established and recently developed methods now exist to modulate gene sequence, epigenetic marks, transcriptional activity, and post-transcriptional processes, providing unprecedented genetic control over cell phenotype. Our objective was to apply and develop targeted repression technologies for regenerative medicine, genomics, and gene therapy applications. We used RNA interference to control cell cycle regulation in myogenic differentiation and enhance the proliferative capacity of tissue engineered cartilage constructs. These studies demonstrate how modulation of a single gene can be used to guide cell differentiation for regenerative medicine strategies. RNA-guided gene regulation with the CRISPR/Cas9 system has rapidly expanded the targeted repression repertoire from silencing single protein-coding genes to modulation of genes, promoters, and other distal regulatory elements. In order to facilitate its adaptation for basic research and translational applications, we demonstrated the high degree of specificity for gene targeting, gene silencing, and chromatin modification possible with Cas9 repressors. The specificity and effectiveness of RNA-guided transcriptional repressors for silencing endogenous genes are promising characteristics for mechanistic studies of gene regulation and cell phenotype. Furthermore, our results support the use of Cas9-based repressors as a platform for novel gene therapy strategies. We developed an in vivo AAV-based gene repression system for silencing endogenous genes in a mouse model. 
Together, these studies demonstrate the utility of gene repression tools for guiding cell phenotype and the potential of the RNA-guided CRISPR/Cas9 platform for applications such as causal studies of gene regulatory mechanisms and gene therapy.
Abstract:
© Medina et al. Although cell cycle control is an ancient, conserved, and essential process, some core animal and fungal cell cycle regulators share no more sequence identity than non-homologous proteins. Here, we show that evolution along the fungal lineage was punctuated by the early acquisition and entrainment of the SBF transcription factor through horizontal gene transfer. Cell cycle evolution in the fungal ancestor then proceeded through a hybrid network containing both SBF and its ancestral animal counterpart E2F, which is still maintained in many basal fungi. We hypothesize that a virally-derived SBF may have initially hijacked cell cycle control by activating transcription via the cis-regulatory elements targeted by the ancestral cell cycle regulator E2F, much like extant viral oncogenes. Consistent with this hypothesis, we show that SBF can regulate promoters with E2F binding sites in budding yeast.
Abstract:
To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating the promoter DNA-bound transcription factors (TFs) to the downstream gene's mean transcript level or transcript production rates over time. However, transcript production is dynamic in response to changes in TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA-binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at the same genomic location. Additionally, elements other than TFs also regulate transcription. Within the core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for the TSS, and RNAPII initiating transcription. Moreover, it has been proposed that nucleosomes downstream of the TSS resist RNAPII elongation.
Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework to the yeast S. cerevisiae in two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt at a model that can predict dynamic transcript production rates from DNA sequences alone: with the cell cycle data set, we obtained a Pearson correlation coefficient Cp = 0.751 and a coefficient of determination r² = 0.564 on the test set for predicting the dynamic transcript production rate over time. Also, for the DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participating teams, including the best team, as well as a model combining the best team's k-mer-based sequence features with another paper's biologically mechanistic features, on all scoring metrics.
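The two test-set metrics quoted above can be illustrated with a minimal sketch. The function `pearson` below is a hypothetical helper written for this illustration, not the authors' evaluation code. The reported figures are numerically consistent with r² being the square of the Pearson coefficient (0.751 squared is about 0.564), although the abstract does not state how r² was computed, so that reading is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Assumption: r^2 taken as the squared Pearson coefficient.
reported_cp = 0.751
implied_r2 = reported_cp ** 2
```

Here `implied_r2` rounds to 0.564, matching the value quoted in the abstract under the stated assumption.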
Moreover, our framework can identify generalizable features by interpreting the highly predictive models, and thereby provides support for associated hypothesized mechanisms of transcriptional regulation. With the learned sparse linear models, we obtained results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation, possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies transcript production, probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting the TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within the core promoter region comprise more than the TATA box and the nucleosome-free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in the yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing the regulatory roles of the +1 and -1 nucleosomes in transcription.
Abstract:
Cellular exposure to hypoxia results in altered gene expression in a range of physiologic and pathophysiologic states. Discrete cohorts of genes can be either up- or down-regulated in response to hypoxia. While the Hypoxia-Inducible Factor (HIF) is the primary driver of hypoxia-induced adaptive gene expression, less is known about the signalling mechanisms regulating hypoxia-dependent gene repression. Using RNA-seq, we demonstrate that equivalent numbers of genes are induced and repressed in human embryonic kidney (HEK293) cells. We demonstrate that nuclear localization of the Repressor Element 1-Silencing Transcription factor (REST) is induced in hypoxia and that REST is responsible for regulating approximately 20% of the hypoxia-repressed genes. Using chromatin immunoprecipitation assays we demonstrate that REST-dependent gene repression is at least in part mediated by direct binding to the promoters of target genes. Based on these data, we propose that REST is a key mediator of gene repression in hypoxia.
Abstract:
Sugarcane is important in Brazil because of sugar and biofuel production. For this reason, basic research is being carried out to understand its physiology and improve production. This research focuses on the Base Excision Repair pathway, in particular the enzyme MUTM DNA-glycosylase (formamidopyrimidine), which recognizes oxidized guanine in DNA. The sugarcane scMUTM genes were analyzed using four BACs (Bacterial Artificial Chromosomes) from a genomic library of the R570 cultivar. The results showed the presence of transposable elements in the region with homology to scMUTM. Sequence comparison showed the highest similarity to Sorghum bicolor, for both nucleotide and peptide sequences. Furthermore, promoter regions of MUTM genes in several grasses showed different cis-regulatory elements, most of which were related to oxidative stress, suggesting that the gene is regulated by oxidative stress.
Abstract:
The central role of translation regulation in the control of critical cellular processes has long been recognized. Yet the systematic exploration of quantitative changes in translation at a genome-wide scale in response to specific stimuli has only recently become technically feasible. Using a genetic approach, we have identified new Arabidopsis weak-ethylene insensitive mutants that also display defects in translation, which suggested the existence of a previously unknown molecular module involved in ethylene-mediated translation regulation of components of this signaling pathway. To explore this link in detail, we implemented for Arabidopsis the ribosome-footprinting technology, which enables the study of translation at a whole-genome level at single codon resolution[1]. Using ribosome-footprinting we examined the effects of short exposure to ethylene on the Arabidopsis translatome, looking for ethylene-triggered changes in translation rates that could not be explained by changes in transcript levels. The results of this research, in combination with the characterization of a subset of the aforementioned weak-ethylene insensitive mutants that are defective in the UPF genes (core components of the nonsense-mediated mRNA decay machinery), uncovered a translation-based branch of the ethylene signaling pathway[2]. In the presence of ethylene, translation of a negative regulator of ethylene signaling, EBF2, is repressed, despite induced transcription of this gene. These translational effects of ethylene require the long 3´UTR of EBF2 (3´EBF2), which is recognized by the C-terminal end of the key ethylene-signaling protein EIN2 (EIN2C) in the cytoplasm once EIN2C is released from the ER-membrane by proteolytic cleavage. EIN2C binds the 3´EBF2, recruits the UPF proteins and moves to P-bodies, where the translation of EBF2 is inhibited despite its mRNA accumulation.
Once the ethylene signal is withdrawn, the translation of the stored EBF2 mRNAs is resumed, thus rapidly dampening the ethylene response. These findings represent a mechanistic paradigm of gene-specific regulation of translation in response to a key growth regulator. Translation regulatory elements can be located in both 3′ and 5′ UTRs. We are now focusing on the ead1 and ead2 mutants, another set of ethylene-signaling mutants defective in translational regulation. Ribosome-footprinting on the ead1 mutant revealed an accumulation of translating ribosomes in the 5´UTRs of uORF-containing genes and reduction in the levels of ribosomes in the main ORF. The mutant is also impaired in the translation of GFP when this reporter is fused to WT 5´UTR of potential EAD1 targets but not when GFP is fused to the uORF-less versions of the same 5´UTRs. Our hypothesis is that EAD1/2 work as a complex that is required for the efficient translation of mRNAs that have common structural (complex 5´UTR with uORFs) and functional (regulation of key cellular processes) features. We are working towards the identification of the conditions where the EAD1 regulation of translation is required. [1] Ingolia, N. et al. (2009) Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling. Science, 324; 218-222 [2] Merchante, C. et al. (2015) Gene-Specific Translation Regulation Mediated by the Hormone-Signaling Molecule EIN2. Cell, 163(3): 684-697
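The uORF-containing 5´UTRs discussed in the abstract above can be illustrated with a minimal sketch that scans a 5´UTR sequence for upstream open reading frames, defined here as an ATG followed in frame by a stop codon within the UTR. The function name `find_uorfs` and this simplified definition are assumptions made for illustration; they are not the criteria used in the ead1/ead2 study.

```python
# Hypothetical, simplified uORF scan: an ATG followed in frame by a stop
# codon, both inside the supplied 5' UTR. Real annotation pipelines apply
# further criteria (minimum length, overlap with the main ORF, and so on).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str):
    """Return (start, end) index pairs of uORFs, end exclusive, 0-based."""
    seq = utr5.upper().replace("U", "T")  # accept RNA or DNA input
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

A start codon with no in-frame stop inside the UTR is ignored by this sketch, since such a reading frame would run into the downstream sequence rather than form a self-contained uORF.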
Abstract:
Eukaryotic chromatin, which contains DNA and numerous binding proteins, undergoes dynamic and functional compaction at multiple scales, which is necessary for the regulation of many biological processes such as gene expression. To define and maintain cellular functions, the proteins that regulate transcription and those that regulate chromatin structure act in concert to orchestrate the gene expression programs of cells. Transcription factors operate in a combinatorial and hierarchical manner at numerous regulatory elements, whose functioning is complex and integrated and which can generate large topological loops to specifically regulate a target promoter at a precise moment. The transcriptional co-activator Mediator serves as an interpretation hub, physically connecting transcriptional regulators to the transcriptional machinery to generate a calibrated response. The chromosome structure maintenance complex, Cohesin, is involved in forming and stabilizing genomic connections across many three-dimensional chromatin structures whose functional characterization is only beginning to be explored. Together, transcription factors, Mediator and Cohesin control the expression of the programs responsible for maintaining cell identity. Cancer cells display numerous deregulations at the transcriptional level, and therefore an aberrant expression program. We have demonstrated that the regulatory mechanisms that control cancer cells are conserved, and we propose a strategy to reveal the key factors in tumor progression. We applied this strategy to the problem of endocrine resistance in the progression of hormone-dependent breast cancer.
The results obtained suggest that the AP-1 transcriptional complex could be involved in the acquisition and/or maintenance of resistance, in response to the selection pressures induced by hormonal treatments. We propose a progressive and aggressive adaptation of cancer cells through re-hierarchization of the key factors that control their growth.
Abstract:
Ojoplano (opo) is a vertebrate-specific gene that was first identified in medaka fish as a recessive mutant showing both neural crest defects and a failure of optic cup folding. In humans, this gene is associated with genetic diseases including hereditary craniofacial malformations and schizophrenia. It is located in a 2 Mb gene desert flanked by insulator sequences, between the genes SLC35B and TFAp2a. This region, syntenic in all vertebrates, represents only 2% of chromosome 6 but includes 23% of all the conserved cis-regulatory elements on this chromosome. Using transgenesis assays in zebrafish, we screened this locus for enhancer activity and obtained a collection of nine enhancers. These regulatory elements were all conserved from human to teleosts and showed epigenetic marks of enhancer activity. We could associate multiple enhancers with orofacial clefting and, to explore the functionality of the enhancers, we performed a bioinformatic analysis to search for transcription factor binding sites in the enhancer sequences. In terms of gene regulation, we observed that the H6:10137 opo enhancer has two Vsx2 binding sites and that this transcription factor regulates the expression of opo during eye development. Our findings suggest that the regulation of opo by Vsx2 is essential for optic cup folding. Until now, there has been no clear connection between optic cup patterning and morphogenesis. Vsx2 provides this link by controlling the expression of opo.