77 resultados para Noncoding Regions
em Université de Lausanne, Switzerland
Resumo:
BACKGROUND: Experimental evidences show that glutathione and its rate-limiting synthesizing enzyme, the glutamate-cysteine ligase (GCL), are involved in the pathogenesis of schizophrenia. Furthermore, genetic association has been previously reported between two single nucleotide polymorphisms lying in noncoding regions of glutamate cysteine ligase modifier (GCLM) gene, which specifies for the modifier subunit of GCL and schizophrenia. OBJECTIVE: We wanted to investigate the presence of GCLM true functional mutations, likely in linkage disequilibrium with the previously identified single nucleotide polymorphism alleles, in the same set of cases that allowed the detection of the original association signal. METHODS: We screened all the coding regions of GCLM and their intronic flanking vicinities in 353 patients with schizophrenia by direct DNA sequencing. RESULTS: Ten sequence variations were identified, five of which were not previously described. None of these DNA changes was within the GCLM coding sequence and in-silico analysis failed to indicate functional impairment induced by these variations. Furthermore, screening of normal controls and downstream statistical analyses revealed no significant relationship of any of these DNA variants with schizophrenia. CONCLUSION: It is unlikely that functional mutations in the GCLM gene could play a major role in genetic predisposition to schizophrenia and further studies will be required to assess its etiological function in the disease.
Resumo:
Species delimitation has been invigorated as a discipline in systematics by an influx of new character sets, analytical methods, and conceptual advances. We use genetic data from 68 markers, combined with distributional, bioclimatic, and coloration information, to hypothesize boundaries of evolutionarily independent lineages (species) within the widespread and highly variable nominal fire ant species Solenopsis saevissima, a member of a species group containing invasive pests as well as species that are models for ecological and evolutionary research. Our integrated approach uses diverse methods of analysis to sequentially test whether populations meet specific operational criteria (contingent properties) for candidacy as morphologically cryptic species, including genetic clustering, monophyly, reproductive isolation, and occupation of distinctive niche space. We hypothesize that nominal S. saevissima comprises at least 4-6 previously unrecognized species, including several pairs whose parapatric distributions implicate the development of intrinsic premating or postmating barriers to gene flow. Our genetic data further suggest that regional genetic differentiation in S. saevissima has been influenced by hybridization with other nominal species occurring in sympatry or parapatry, including the quite distantly related Solenopsis geminata. The results of this study illustrate the importance of employing different classes of genetic data (coding and noncoding regions and nuclear and mitochondrial DNA [mtDNA] markers), different methods of genetic data analysis (tree-based and non-tree based methods), and different sources of data (genetic, morphological, and ecological data) to explicitly test various operational criteria for species boundaries in clades of recently diverged lineages, while warning against over reliance on any single data type (e.g., mtDNA sequence variation) when drawing inferences.
Resumo:
We performed whole genome sequencing in 16 unrelated patients with autosomal recessive retinitis pigmentosa (ARRP), a disease characterized by progressive retinal degeneration and caused by mutations in over 50 genes, in search of pathogenic DNA variants. Eight patients were from North America, whereas eight were Japanese, a population for which ARRP seems to have different genetic drivers. Using a specific workflow, we assessed both the coding and noncoding regions of the human genome, including the evaluation of highly polymorphic SNPs, structural and copy number variations, as well as 69 control genomes sequenced by the same procedures. We detected homozygous or compound heterozygous mutations in 7 genes associated with ARRP (USH2A, RDH12, CNGB1, EYS, PDE6B, DFNB31, and CERKL) in eight patients, three Japanese and five Americans. Fourteen of the 16 mutant alleles identified were previously unknown. Among these, there was a 2.3-kb deletion in USH2A and an inverted duplication of ∼446 kb in EYS, which would have likely escaped conventional screening techniques or exome sequencing. Moreover, in another Japanese patient, we identified a homozygous frameshift (p.L206fs), absent in more than 2,500 chromosomes from ethnically matched controls, in the ciliary gene NEK2, encoding a serine/threonine-protein kinase. Inactivation of this gene in zebrafish induced retinal photoreceptor defects that were rescued by human NEK2 mRNA. In addition to identifying a previously undescribed ARRP gene, our study highlights the importance of rare structural DNA variations in Mendelian diseases and advocates the need for screening approaches that transcend the analysis of the coding sequences of the human genome.
Resumo:
Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
Resumo:
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.
Resumo:
Long noncoding RNAs (lncRNAs) are one of the most intensively studied groups of noncoding elements. Debate continues over what proportion of lncRNAs are functional or merely represent transcriptional noise. Although characterization of individual lncRNAs has identified approximately 200 functional loci across the Eukarya, general surveys have found only modest or no evidence of long-term evolutionary conservation. Although this lack of conservation suggests that most lncRNAs are nonfunctional, the possibility remains that some represent recent evolutionary innovations. We examine recent selection pressures acting on lncRNAs in mouse populations. We compare patterns of within-species nucleotide variation at approximately 10,000 lncRNA loci in a cohort of the wild house mouse, Mus musculus castaneus, with between-species nucleotide divergence from the rat (Rattus norvegicus). Loci under selective constraint are expected to show reduced nucleotide diversity and divergence. We find limited evidence of sequence conservation compared with putatively neutrally evolving ancestral repeats (ARs). Comparisons of sequence diversity and divergence between ARs, protein-coding (PC) exons and lncRNAs, and the associated flanking regions, show weak, but significantly lower levels of sequence diversity and divergence at lncRNAs compared with ARs. lncRNAs conserved deep in the vertebrate phylogeny show lower within-species sequence diversity than lncRNAs in general. A set of 74 functionally characterized lncRNAs show levels of diversity and divergence comparable to PC exons, suggesting that these lncRNAs are under substantial selective constraints. Our results suggest that, in mouse populations, most lncRNA loci evolve at rates similar to ARs, whereas older lncRNAs tend to show signals of selection similar to PC genes.
Resumo:
Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor-binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the "socio-genomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.
Resumo:
Methyl-CpG Binding Domain (MBD) proteins are thought to be key molecules in the interpretation of DNA methylation signals leading to gene silencing through recruitment of chromatin remodeling complexes. In cancer, the MBD-family member, MBD2, may be primarily involved in the repression of genes exhibiting methylated CpG at their 5' end. Here we ask whether MBD2 randomly associates methylated sequences, producing chance effects on transcription, or exhibits a more specific recognition of some methylated regions. Using chromatin and DNA immunoprecipitation, we analyzed MBD2 and RNA polymerase II deposition and DNA methylation in HeLa cells on arrays representing 25,500 promoter regions. This first whole-genome mapping revealed the preferential localization of MBD2 near transcription start sites (TSSs), within the region analyzed, 7.5 kb upstream through 2.45 kb downstream of 5' transcription start sites. Probe by probe analysis correlated MBD2 deposition and DNA methylation. Motif analysis did not reveal specific sequence motifs; however, CCG and CGC sequences seem to be overrepresented. Nonrandom association (multiple correspondence analysis, p < 0.0001) between silent genes, DNA methylation and MBD2 binding was observed. The association between MBD2 binding and transcriptional repression weakened as the distance between binding site and TSS increased, suggesting that MBD2 represses transcriptional initiation. This hypothesis may represent a functional explanation for the preferential binding of MBD2 at methyl-CpG in TSS regions.
Improving the performance of positive selection inference by filtering unreliable alignment regions.
Resumo:
Errors in the inferred multiple sequence alignment may lead to false prediction of positive selection. Recently, methods for detecting unreliable alignment regions were developed and were shown to accurately identify incorrectly aligned regions. While removing unreliable alignment regions is expected to increase the accuracy of positive selection inference, such filtering may also significantly decrease the power of the test, as positively selected regions are fast evolving, and those same regions are often those that are difficult to align. Here, we used realistic simulations that mimic sequence evolution of HIV-1 genes to test the hypothesis that the performance of positive selection inference using codon models can be improved by removing unreliable alignment regions. Our study shows that the benefit of removing unreliable regions exceeds the loss of power due to the removal of some of the true positively selected sites.
Resumo:
Odorous chemicals are detected by the mouse main olfactory epithelium (MOE) by about 1100 types of olfactory receptors (OR) expressed by olfactory sensory neurons (OSNs). Each mature OSN is thought to express only one allele of a single OR gene. Major impediments to understand the transcriptional control of OR gene expression are the lack of a proper characterization of OR transcription start sites (TSSs) and promoters, and of regulatory transcripts at OR loci. We have applied the nanoCAGE technology to profile the transcriptome and the active promoters in the MOE. nanoCAGE analysis revealed the map and architecture of promoters for 87.5% of the mouse OR genes, as well as the expression of many novel noncoding RNAs including antisense transcripts. We identified candidate transcription factors for OR gene expression and among them confirmed by chromatin immunoprecipitation the binding of TBP, EBF1 (OLF1), and MEF2A to OR promoters. Finally, we showed that a short genomic fragment flanking the major TSS of the OR gene Olfr160 (M72) can drive OSN-specific expression in transgenic mice.
Resumo:
BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints.Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements.Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.
Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.
Resumo:
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.