990 results for Cis-acting regulatory variants
Abstract:
Cancer and cardiovascular diseases are the leading causes of death worldwide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of a profound disturbance of normal cellular homeostasis. People suffering from or at high risk for these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, the development of effective therapies is hindered by the difficulty of identifying the genetic and molecular determinants of disease onset; where therapies already exist, the main challenge is to identify the molecular determinants that drive resistance to them. Owing to progress in sequencing technologies, access to large genome-wide biological datasets now extends far beyond a few experimental labs to the global research community. This unprecedented availability of data has revolutionized the capabilities of computational researchers, enabling them to collaboratively address long-standing problems from many different perspectives. Likewise, this thesis tackles these two main public-health challenges using data-driven approaches. Numerous association studies have been proposed to identify genomic variants that determine disease. However, their clinical utility remains limited because they cannot distinguish causal variants from merely associated variants. In this thesis, we first propose a simple scheme that improves association studies in a supervised fashion and demonstrate its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory genomic variants that explain the gene expression variance.
On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples, but also predicts gene expression more accurately than other methods. We demonstrate by simulation that eQTeL accurately detects causal regulatory SNPs, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, potentially disrupt binding of core cardiac transcription factors, and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance, and we estimate that 58% of these SNPs are putatively causal. The challenge of identifying molecular determinants of cancer resistance has so far been addressed only through labor-intensive and costly experimental studies, and for experimental drugs such studies are infeasible. Here we take a fundamentally different, data-driven approach to understand the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions in cancer termed synthetic rescues (SR), which denote a functional interaction between two genes where a change in the activity of one vulnerable gene (which may be a target of a cancer drug) is lethal, but subsequently altered activity of its partner rescuer gene restores cell viability. Next, we describe a comprehensive computational framework, termed INCISOR, for identifying SRs underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes.
We show that SRs can be utilized to successfully predict patients' survival and response to the majority of current cancer drugs and, importantly, to predict the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever-increasing high-throughput genomic data to address challenges in determining the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We discover novel molecular mechanistic insights that will advance progress in early disease prevention and personalized therapeutics. Our analyses shed light on the fundamental biology of gene regulation and interaction, and open up exciting avenues for translational applications in risk prediction and therapeutics.
Abstract:
We show for the first time that, upon injection into the cytoplasm of the oocyte, fluorescein-labeled spliceosomal snRNAs, in the context of functional snRNPs, are targeted to elongating pre-mRNAs. This finding provides a novel assay with which to dissect the mechanism by which snRNPs are targeted to nascent pre-mRNA transcripts. Two critical advantages of this system are immediately evident. First, it allows us to investigate the mechanisms that recruit snRNPs as recruitment actually occurs within the cell nucleus. Second, it allows a genome-wide analysis of snRNP recruitment to nascent transcripts; hence, the conclusions drawn from these studies do not depend on the sequence of any particular promoter or pre-mRNA. Indeed, with this assay we made an unanticipated discovery: contrary to the current paradigm, the co-transcriptional recruitment of splicing snRNPs to nascent transcripts is not contingent on their role in splicing in vivo. Based on these and other data, we have constructed a two-step recruitment-loading model wherein snRNPs are first recruited to pre-mRNA transcripts and only then loaded directly onto cis-acting sequences on nascent pre-mRNA. While conducting studies on snRNP trafficking, we made a further discovery: lampbrush chromosomes can be visualized by light microscopy in vivo, and these chromosomes have an architecture identical to that of chromosomes in formaldehyde-treated nuclear spread preparations. Importantly, we now have the first system with which we can examine the dynamic interactions of macromolecules with specific RNA polymerase II transcriptional units in the live nucleus.
Abstract:
The HIV-1 transcript is alternatively spliced to over 30 different mRNAs. Whether RNA secondary structure can influence HIV-1 RNA alternative splicing has not previously been examined. Here we have determined the secondary structure of the HIV-1/BRU RNA segment, containing the alternative A3, A4a, A4b, A4c and A5 3′ splice sites. Site A3, required for tat mRNA production, is contained in the terminal loop of a stem–loop structure (SLS2), which is highly conserved in HIV-1 and related SIVcpz strains. The exon splicing silencer (ESS2) acting on site A3 is located in a long irregular stem–loop structure (SLS3). Two SLS3 domains were protected by nuclear components under splicing condition assays. One contains the A4c branch points and a putative SR protein binding site. The other one is adjacent to ESS2. Unexpectedly, only the 3′ A residue of ESS2 was protected. The suboptimal A3 polypyrimidine tract (PPT) is base paired. Using site-directed mutagenesis and transfection of a mini-HIV-1 cDNA into HeLa cells, we found that, in a wild-type PPT context, a mutation of the A3 downstream sequence that reinforced SLS2 stability decreased site A3 utilization. This was not the case with an optimized PPT. Hence, sequence and secondary structure of the PPT may cooperate in limiting site A3 utilization.
Abstract:
Significant differences in levels of copia [Drosophila long terminal repeat (LTR) retrotransposon] expression exist among six species representing the Drosophila melanogaster species complex (D. melanogaster, Drosophila mauritiana, Drosophila simulans, Drosophila sechellia, Drosophila yakuba, and Drosophila erecta) and a more distantly related species (Drosophila willistoni). These differences in expression are correlated with major size variation mapping to putative regulatory regions of the copia 5' LTR and adjacent untranslated leader region (ULR). Sequence analysis indicates that these size variants were derived from a series of regional duplication events. The ability of the copia LTR-ULR size variants to drive expression of a bacterial chloramphenicol acetyltransferase reporter gene was tested in each of the seven species. The results indicate that both element-encoded (cis) and host-genome-encoded (trans) genetic differences are responsible for the variability in copia expression within and between Drosophila species. This finding indicates that models purporting to explain the dynamics and distribution of retrotransposons in natural populations must consider the potential impact of both element-encoded and host-genome-encoded regulatory variation to be valid. We propose that interelement selection among retrotransposons may provide a molecular drive mechanism for the evolution of eukaryotic enhancers which can be subsequently distributed throughout the genome by retrotransposition.
Abstract:
Background: Transcription factors (TFs) co-ordinately regulate target genes that are dispersed throughout the genome. This co-ordinate regulation is achieved, in part, through the interaction of transcription factors with conserved cis-regulatory motifs that are in close proximity to the target genes. While much is known about the families of transcription factors that regulate gene expression in plants, there are few well characterised cis-regulatory motifs. In Arabidopsis, over-expression of the MYB transcription factor PAP1 (PRODUCTION OF ANTHOCYANIN PIGMENT 1) leads to transgenic plants with elevated anthocyanin levels due to the co-ordinated up-regulation of genes in the anthocyanin biosynthetic pathway. In addition to the anthocyanin biosynthetic genes, a number of unassociated genes also change in expression level. This may be a direct or indirect consequence of the over-expression of PAP1.
Results: Oligo array analysis of PAP1 over-expressing Arabidopsis plants identified genes co-ordinately up-regulated in response to the elevated expression of this transcription factor. Transient assays on the promoter regions of 33 of these up-regulated genes identified eight promoter fragments that were transactivated by PAP1. Bioinformatic analysis of these promoters revealed a common cis-regulatory motif that we showed is required for PAP1-dependent transactivation.
Conclusion: Co-ordinated gene regulation by individual transcription factors is a complex collection of both direct and indirect effects. Transient transactivation assays provide a rapid method to distinguish direct target genes from indirect target genes. Bioinformatic analysis of the promoters of these direct target genes can locate motifs that are common to this sub-set of promoters and that are impossible to identify in the larger set of direct and indirect target genes. While this type of analysis does not prove a direct interaction between protein and DNA, it does provide a tool to characterise cis-regulatory sequences that are necessary for transcription activation in a complex list of co-ordinately regulated genes.
Abstract:
Individual copies of tRNA1Gly from within the multigene family in Bombyx mori could be classified, based on in vitro transcription in homologous nuclear extracts, into three categories of highly, moderately, or weakly transcribed genes. Segregation of the poorly transcribed gene copies 6 and 7, which are clustered in tandem within 425 base pairs, resulted in enhancement of their individual transcription levels, but the linkage itself had little influence on the transcriptional status. When these gene copies were fused together to generate a single coding region, transcription was barely detectable, which suggested the presence of negatively regulating elements located in the far flanking sequences. These elements exerted a silencing effect on transcription, overriding the activity of positive regulatory elements. Systematic analysis of deletion, chimeric, and mutant constructs revealed a sequence element, TATATAA, located more than 800 nucleotides upstream of the coding region and acting as a negative modulator; mutating it resulted in high-level transcription. Conversely, a TATATAA motif reintroduced at either far upstream or far downstream flanking regions exerted a negative effect on transcription. The location of cis-regulatory sequences at such great distances from the coding region and the behavior of the TATATAA element as a negative regulator reported here are novel. These element(s) could play significant roles in activation or silencing of genes from within a multigene family, by recruitment or sequestration of transcription factors.
Abstract:
The rapid recent increase in microarray-based gene expression studies of the corpus luteum (CL) in macaque models has produced a growing volume of data in publicly accessible microarray expression databases. Examining gene pathways in different functional states of the CL may help to understand the factors that control luteal function and hence human fertility. Co-regulation of genes in microarray experiments may imply common transcriptional regulation by sequence-specific DNA-binding transcription factors. We have computationally analyzed the transcription factor binding sites (TFBS) in a previously reported macaque luteal microarray gene set (n = 15) whose members are common targets of luteotropin (luteinizing hormone (LH) and human chorionic gonadotropin (hCG)) and luteolysin (prostaglandin (PG) F2α). This in silico approach can reveal transcriptional networks that control these important genes, which are representative of the interplay between luteotropic and luteolytic factors in the control of luteal function. Our computational analyses revealed 6 matrix families whose binding sites are significantly over-represented in the promoters of these genes. We discuss the roles of these factors, which might help to understand the transcriptional regulatory network that controls luteal function. These factors might be promising experimental targets for investigation of human luteal insufficiency. © 2012 Elsevier B.V. All rights reserved.
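The abstract does not say which statistical test was used to call binding sites "significantly over-represented". One common choice is a one-sided hypergeometric test that compares the gene set with a background set of promoters. The Python sketch below illustrates that idea only; the function name and all counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: promoters in the background set
    K: background promoters containing the binding-site family
    n: promoters in the gene set of interest
    k: gene-set promoters containing the binding-site family
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("counts are inconsistent")
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 15-gene set, 8 promoters carry the site,
# against a background where 2,000 of 20,000 promoters carry it.
p_value = overrepresentation_p(k=8, n=15, K=2000, N=20000)
print(f"p = {p_value:.2e}")
```

In practice one p-value would be computed per matrix family and corrected for multiple testing before calling any family over-represented.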
Abstract:
Rapid and high wing-beat frequencies achieved during insect flight are powered by the indirect flight muscles, the largest group of muscles present in the thorax. Any anomaly during the assembly and/or structural impairment of the indirect flight muscles gives rise to a flightless phenotype. Multiple mutagenesis screens in Drosophila melanogaster for defective flight behavior have led to the isolation and characterization of mutations that have been instrumental in the identification of many proteins and residues that are important for muscle assembly, function, and disease. In this article, we present a molecular-genetic characterization of a flightless mutation, flightless-H (fliH), originally designated as heldup-a (hdp-a). We show that fliH is a cis-regulatory mutation of the wings up A (wupA) gene, which codes for troponin-I, one of the proteins of the troponin complex, which is involved in the regulation of muscle contraction. The mutation leads to reduced levels of troponin-I transcript and protein. There is also a coordinated reduction in transcript and protein levels of other structural protein isoforms that are part of the troponin complex. The altered transcript and protein stoichiometry ultimately culminates in unregulated acto-myosin interactions and a hypercontraction muscle phenotype. Our results provide new insights into the importance of maintaining the stoichiometry of structural proteins during muscle assembly for proper function, with implications for the identification of mutations and disease phenotypes in other species, including humans.
Abstract:
Colistin resistance is rare in Acinetobacter baumannii, and little is known about its mechanism. We investigated the role of PmrCAB in this trait, using (i) resistant and susceptible clinical strains, (ii) laboratory-selected mutants of the type strain ATCC 19606 and of the clinical isolate ABRIM, and (iii) a susceptible/resistant pair of isogenic clinical isolates, Ab15/133 and Ab15/132, isolated from the same patient. pmrAB sequences in all the colistin-susceptible isolates were identical to reference sequences, whereas resistant clinical isolates harbored one or two amino acid replacements variously located in PmrB. Single substitutions in PmrB were also found in resistant mutants of strains ATCC 19606 and ABRIM and in the resistant clinical isolate Ab15/132. No mutations in PmrA or PmrC were found. Reverse transcriptase (RT)-PCR identified increased expression of pmrA (4- to 13-fold), pmrB (2- to 7-fold), and pmrC (1- to 3-fold) in resistant versus susceptible organisms. Matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry showed the addition of phosphoethanolamine to the hepta-acylated form of lipid A in the resistant variants and in strain ATCC 19606 grown under low-Mg induction conditions. pmrB gene knockout mutants of the colistin-resistant ATCC 19606 derivative showed >100-fold increased susceptibility to colistin and 5-fold decreased expression of pmrC; they also lacked the addition of phosphoethanolamine to lipid A. We conclude that the development of a moderate level of colistin resistance in A. baumannii requires distinct genetic events, including (i) at least one point mutation in pmrB, (ii) upregulation of pmrAB, and (iii) expression of pmrC, which lead to addition of phosphoethanolamine to lipid A. Copyright © 2011, American Society for Microbiology. All Rights Reserved.
Abstract:
Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h²g) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h²g from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷) and 38% (SE = 4%) of h²g from genotyped SNPs (1.6× enrichment; p = 1.0 × 10⁻⁴). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h²g despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.
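The enrichment figures quoted above can be read as the share of SNP heritability explained by a category divided by the share of SNPs that fall in it. The Python sketch below assumes that definition and uses the rounded percentages from the abstract; the function name is illustrative. With rounded inputs the imputed-SNP ratio comes out slightly below the reported 5.1×, presumably because the published value was computed from unrounded proportions (an assumption, not something the abstract states).

```python
def fold_enrichment(share_of_heritability, share_of_snps):
    """Share of heritability explained divided by share of SNPs in the category."""
    if not 0 < share_of_snps <= 1:
        raise ValueError("share_of_snps must be in (0, 1]")
    return share_of_heritability / share_of_snps


# Rounded values quoted in the abstract for DHS regions.
imputed = fold_enrichment(0.79, 0.16)    # 79% of heritability, 16% of imputed SNPs
genotyped = fold_enrichment(0.38, 0.24)  # 38% of heritability, 24% of genotyped SNPs

print(f"imputed SNPs:   {imputed:.1f}x")    # prints 4.9x
print(f"genotyped SNPs: {genotyped:.1f}x")  # prints 1.6x
```

A ratio above 1 means the category explains more heritability than its size alone would predict.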
Abstract:
The 5′ cis-regulatory region of the CCR5 gene exhibits a strong signature of balancing selection in several human populations. Here we analyze the polymorphism of this region in Amerindians from Amazonia, who have a complex demographic history, including recent bottlenecks that are known to reduce genetic variability. Amerindians show high nucleotide diversity (π = 0.27%) and significantly positive Tajima's D, and carry haplotypes associated with weak and strong gene expression. To evaluate whether these signatures of balancing selection could be explained by demography, we perform neutrality tests based on empirical and simulated data. The observed Tajima's D was higher than that of other world populations, higher than that found for 18 noncoding regions of South Amerindians, and higher than 99.6% of simulated genealogies, which assume nonequilibrium conditions. Moreover, comparing Amerindians and Asians, the Fst for the CCR5 cis-regulatory region was unusually low relative to neutral markers. These findings indicate that, despite their complex demographic history, South Amerindians carry a detectable signature of selection on the CCR5 cis-regulatory region. © 2010 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
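Tajima's D, the statistic this abstract relies on, compares the mean number of pairwise differences with the number of segregating sites scaled by sample size. The Python sketch below implements the standard Tajima (1989) formula on toy haplotypes to show why an excess of intermediate-frequency variants, as expected under balancing selection, gives a positive value. The sequences are invented for illustration and are unrelated to the CCR5 data; diversity here is counted per region, not per site as in the π value quoted above.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length haplotype strings."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # Number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return 0.0  # convention chosen for this sketch; D is undefined here

    # Mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_diff - seg_sites / a1) / sqrt(variance)


# Toy data: two haplotypes at equal frequency versus mostly singleton variants.
balanced = ["AAAA"] * 3 + ["TTTT"] * 3
singletons = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]

d_balanced = tajimas_d(balanced)      # positive: intermediate-frequency alleles
d_singletons = tajimas_d(singletons)  # negative: excess of rare alleles
print(d_balanced, d_singletons)
```

Because demographic events such as bottlenecks also shift D, the sign alone is not evidence of selection, which is why the study compares the observed value against simulated genealogies and other noncoding regions.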
Abstract:
Type XVIII collagen is a component of basement membranes and is expressed prominently in the eye, blood vessels, liver, and the central nervous system. Homozygous mutations in COL18A1 lead to Knobloch Syndrome, characterized by ocular defects and occipital encephalocele. However, relatively little has been described about the role of type XVIII collagen in development, and nothing is known about the regulation of its tissue-specific expression pattern. We have used zebrafish transgenesis to identify and characterize cis-regulatory sequences controlling expression of the human gene. Candidate enhancers were selected from non-coding sequence associated with COL18A1 based on sequence conservation among mammals. Although these displayed no overt conservation with orthologous zebrafish sequences, four regions nonetheless acted as tissue-specific transcriptional enhancers in the zebrafish embryo, and together recapitulated the major aspects of col18a1 expression. Additional post-hoc computational analysis of positive enhancer sequences revealed alignments between mammalian and teleost sequences, which we hypothesize predict the corresponding zebrafish enhancers; for one of these, we demonstrate functional overlap with the orthologous human enhancer sequence. Our results provide important insight into the biological function and regulation of COL18A1, and point to additional sequences that may contribute to complex diseases involving COL18A1. More generally, we show that combining functional data with targeted analyses for phylogenetic conservation can reveal conserved cis-regulatory elements in the large number of cases where computational alignment alone falls short. © 2009 Elsevier Inc. All rights reserved.