924 results for Coding Sequences
Abstract:
BACKGROUND: The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that, despite the additional genome duplication, ParaHox genes in extant teleost fishes do not differ in number from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. RESULTS: We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons. CONCLUSION: There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined.
The high degree of evolutionary conservation of this gene cluster's architecture in particular - but possibly clusters of genes more generally - might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters.
Abstract:
Aleppo pine (Pinus halepensis Mill.) is a relevant conifer species for studying adaptive responses to drought and fire regimes in the Mediterranean region. In this study, we performed Illumina next-generation sequencing of two phenotypically divergent Aleppo pine accessions with the aims of (i) characterizing the transcriptome through Illumina RNA-Seq on trees phenotypically divergent for adaptive traits linked to fire adaptation and drought, (ii) performing a functional annotation of the assembled transcriptome, (iii) identifying genes with accelerated evolutionary rates, (iv) studying the expression levels of the annotated genes and (v) developing gene-based markers for population genomic and association genetic studies. The assembled transcriptome consisted of 48,629 contigs and covered about 54.6 Mbp. The comparison of Aleppo pine transcripts to Picea sitchensis protein-coding sequences resulted in the detection of 34,014 SNPs across species, with a Ka/Ks average value of 0.216, suggesting that the majority of the assembled genes are under negative selection. Several genes were differentially expressed across the two pine accessions with contrasted phenotypes, including a glutathione-s-transferase, a cellulose synthase and a cobra-like protein. A large number of new markers (3334 amplifiable SSRs and 28,236 SNPs) have been identified which should facilitate future population genomics and association genetics in this species. A 384-SNP Oligo Pool Assay for genotyping with the Illumina VeraCode technology has been designed which showed a high overall SNP conversion rate (76.6%). Our results showed that Illumina next-generation sequencing is a valuable technology to obtain an extensive overview of whole transcriptomes of nonmodel species with large genomes.
Abstract:
Mammals are characterized by specific phenotypic traits that include lactation, hair, and relatively large brains with unique structures. Individual mammalian lineages have, in turn, evolved characteristic traits that distinguish them from others. These include obvious anatomical differences but also differences related to reproduction, life span, cognitive abilities, behavior, and disease susceptibility. However, the molecular basis of the diverse mammalian phenotypes and the selective pressures that shaped their evolution remain largely unknown. In the first part of my thesis, I analyzed the genetic factors associated with the origin of a unique mammalian phenotype, lactation, and I studied the selective pressures that forged the transition from oviparity to viviparity. Using a comparative genomics approach and evolutionary simulations, I showed that the emergence of lactation, as well as the appearance of the casein gene family, significantly reduced selective pressure on the major egg-yolk proteins (the vitellogenin family). This led to a progressive loss of vitellogenins, which - in oviparous species - act as storage proteins for lipids, amino acids, phosphorus and calcium in the isolated egg. The passage to internal fertilization and placentation in therian mammals rendered vitellogenins completely dispensable, which ended in the loss of the whole gene family in this lineage. As illustrated by the vitellogenin study, changes in gene content are one possible underlying factor for the evolution of mammalian-specific phenotypes. However, more subtle genomic changes, such as mutations in protein-coding sequences, can also greatly affect the phenotypes. In particular, it was proposed that changes at the level of gene regulation could underlie many (or even most) phenotypic differences between species.
In the second part of my thesis, I participated in a major comparative study of mammalian tissue transcriptomes, with the goal of understanding how evolutionary forces affected expression patterns in the past 200 million years of mammalian evolution. I showed that, while comparisons of gene expression are in agreement with the known species phylogeny, the rate of expression evolution varies greatly among lineages. Species with low effective population size, such as monotremes and hominoids, showed significantly accelerated rates of gene expression evolution. The most likely explanation for the high rate of gene expression evolution in these lineages is the accumulation of mildly deleterious mutations in regulatory regions, due to the low efficiency of purifying selection. Thus, our observations are in agreement with the nearly neutral theory of molecular evolution. I also describe substantial differences in evolutionary rates between tissues, with brain being the most constrained (especially in primates) and testis significantly accelerated. The rate of gene expression evolution also varies significantly between chromosomes. In particular, I observed an acceleration of gene expression changes on the X chromosome, probably as a result of adaptive processes associated with the origin of therian sex chromosomes. Lastly, I identified several individual genes as well as co-regulated expression modules that have undergone lineage-specific expression changes and likely underlie various phenotypic innovations in mammals. The methods developed during my thesis, as well as the comprehensive gene content analyses and transcriptomics datasets made available by our group, will likely prove to be useful for further exploratory analyses of the diverse mammalian phenotypes.
Abstract:
The antiretroviral protein TRIM5alpha is known to have evolved different restriction capacities against various retroviruses, driven by positive Darwinian selection. However, how these different specificities have evolved in the primate lineages is not fully understood. Here we used ancestral protein resurrection to estimate the evolution of antiviral restriction specificities of TRIM5alpha on the primate lineage leading to humans. We used TRIM5alpha coding sequences from 24 primates for the reconstruction of ancestral TRIM5alpha sequences using maximum-likelihood and Bayesian approaches. Ancestral sequences were transduced into HeLa and CRFK cells. Stable cell lines were generated and used to test restriction of a panel of extant retroviruses (human immunodeficiency virus type 1 [HIV-1] and HIV-2, simian immunodeficiency virus [SIV] variants SIV(mac) and SIV(agm), and murine leukemia virus [MLV] variants N-MLV and B-MLV). The resurrected TRIM5alpha variant from the common ancestor of Old World primates (Old World monkeys and apes, approximately 25 million years before present) was effective against present day HIV-1. In contrast to the HIV-1 restriction pattern, we show that the restriction efficacy against other retroviruses, such as a murine oncoretrovirus (N-MLV), is higher for more recent resurrected hominoid variants. Ancestral TRIM5alpha variants have generally limited efficacy against HIV-2, SIV(agm), and SIV(mac). Our study sheds new light on the evolution of the intrinsic antiviral defense machinery and illustrates the utility of functional evolutionary reconstruction for characterizing recently emerged protein differences.
Abstract:
The nature and assembly of the chlamydial division septum is poorly defined due to the paucity of a detectable peptidoglycan (PG)-based cell wall, the inhibition of constriction by penicillin and the presence of coding sequences for cell wall precursor and remodelling enzymes in the reduced chlamydial (pan-)genome. Here we show that the chlamydial amidase (AmiA) is active and remodels PG in Escherichia coli. Moreover, forward genetics using an E. coli amidase mutant as entry point reveals that the chlamydial LysM-domain protein NlpD is active in an E. coli reporter strain for PG endopeptidase activity (ΔnlpI). Immunolocalization unveils NlpD as the first septal (cell-wall-binding) protein in Chlamydiae and we show that its septal sequestration depends on prior cell wall synthesis. Since AmiA assembles into peripheral clusters, trimming of a PG-like polymer or precursors occurs throughout the chlamydial envelope, while NlpD targets PG-like peptide crosslinks at the chlamydial septum during constriction.
Abstract:
We performed whole genome sequencing in 16 unrelated patients with autosomal recessive retinitis pigmentosa (ARRP), a disease characterized by progressive retinal degeneration and caused by mutations in over 50 genes, in search of pathogenic DNA variants. Eight patients were from North America, whereas eight were Japanese, a population for which ARRP seems to have different genetic drivers. Using a specific workflow, we assessed both the coding and noncoding regions of the human genome, including the evaluation of highly polymorphic SNPs, structural and copy number variations, as well as 69 control genomes sequenced by the same procedures. We detected homozygous or compound heterozygous mutations in 7 genes associated with ARRP (USH2A, RDH12, CNGB1, EYS, PDE6B, DFNB31, and CERKL) in eight patients, three Japanese and five Americans. Fourteen of the 16 mutant alleles identified were previously unknown. Among these, there was a 2.3-kb deletion in USH2A and an inverted duplication of ∼446 kb in EYS, which would have likely escaped conventional screening techniques or exome sequencing. Moreover, in another Japanese patient, we identified a homozygous frameshift (p.L206fs), absent in more than 2,500 chromosomes from ethnically matched controls, in the ciliary gene NEK2, encoding a serine/threonine-protein kinase. Inactivation of this gene in zebrafish induced retinal photoreceptor defects that were rescued by human NEK2 mRNA. In addition to identifying a previously undescribed ARRP gene, our study highlights the importance of rare structural DNA variations in Mendelian diseases and advocates the need for screening approaches that transcend the analysis of the coding sequences of the human genome.
Abstract:
We describe the unusual structure of a vaccinia virus late mRNA. In these molecules, the protein-coding sequences of a major late structural polypeptide are preceded by long leader RNAs, which in some cases are thousands of nucleotides long. These sequences map to different regions of the viral genome and in one instance are separated from the late gene by more than 100 kb of DNA. Moreover, the leader sequences map either upstream or downstream of the late gene, are transcribed from either DNA strand, and are fused to the late gene coding sequence via a poly(A) stretch. This demonstrates that vaccinia virus produces late mRNAs by tagging the protein-coding sequences onto the 3' end of other RNAs.
Abstract:
BACKGROUND: The mouse inbred line C57BL/6J is widely used in mouse genetics and its genome has been incorporated into many genetic reference populations. More recently large initiatives such as the International Knockout Mouse Consortium (IKMC) are using the C57BL/6N mouse strain to generate null alleles for all mouse genes. Hence both strains are now widely used in mouse genetics studies. Here we perform a comprehensive genomic and phenotypic analysis of the two strains to identify differences that may influence their underlying genetic mechanisms. RESULTS: We undertake genome sequence comparisons of C57BL/6J and C57BL/6N to identify SNPs, indels and structural variants, with a focus on identifying all coding variants. We annotate 34 SNPs and 2 indels that distinguish C57BL/6J and C57BL/6N coding sequences, as well as 15 structural variants that overlap a gene. In parallel we assess the comparative phenotypes of the two inbred lines utilizing the EMPReSSslim phenotyping pipeline, a broad based assessment encompassing diverse biological systems. We perform additional secondary phenotyping assessments to explore other phenotype domains and to elaborate phenotype differences identified in the primary assessment. We uncover significant phenotypic differences between the two lines, replicated across multiple centers, in a number of physiological, biochemical and behavioral systems. CONCLUSIONS: Comparison of C57BL/6J and C57BL/6N demonstrates a range of phenotypic differences that have the potential to impact upon penetrance and expressivity of mutational effects in these strains. Moreover, the sequence variants we identify provide a set of candidate genes for the phenotypic differences observed between the two strains.
Abstract:
A 6008 base pair fragment of the vaccinia virus DNA containing the gene for the precursor of the major core protein 4a, which has been designated P4a, was sequenced. A long open reading frame (ORF) encoding a protein of molecular weight 102,157 started close to the position where the P4a mRNA had been mapped. Analysis of the mRNA by S1 nuclease mapping and primer extension indicated that the 5' end defined by the former method is not the true 5' end. This suggests that the P4a coding region is preceded by leader sequences that are not derived from the immediate vicinity of the gene, similar to what has been reported for another late vaccinia virus mRNA. The sequenced DNA contained several further ORFs on the same or the opposite DNA strand, providing further evidence for the close spacing of protein-coding sequences in the viral genome.
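As an illustration of what locating ORFs on the same or the opposite strand involves, the following is a minimal sketch and not the method used in the study. It assumes an uppercase DNA string containing only A, C, G and T, an ATG start codon, the three standard stop codons, and a hypothetical `min_codons` length cutoff. Coordinates are 0-based on the strand being scanned, and ORFs without an in-frame stop are ignored.

```python
# Hypothetical helper names; a simplified six-frame ORF scan for illustration only.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end) for every ATG...stop ORF.

    start/end are 0-based, end-exclusive, measured on the scanned strand;
    end includes the stop codon. min_codons counts codons before the stop.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # resume searching after the stop codon
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAACCCGGGTAAGG"):
        print(orf)
```

Closely spaced ORFs on both strands, as described in the abstract, would show up here as entries with different `strand` and `frame` values whose coordinate ranges lie near one another.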
Abstract:
Purpose: Microphthalmia and anophthalmia are at the severe end of the spectrum of abnormalities in ocular development. A few genes (orthodenticle homeobox 2 [OTX2], retina and anterior neural fold homeobox [RAX], SRY-box 2 [SOX2], CEH10 homeodomain-containing homolog [CHX10], and growth differentiation factor 6 [GDF6]) have been implicated mainly in isolated micro/anophthalmia but causative mutations of these genes explain less than a quarter of these developmental defects. The essential role of the LIM homeobox 2 (LHX2) transcription factor in early eye development has recently been documented. We postulated that mutations in this gene could lead to micro/anophthalmia, and thus performed molecular screening of its sequence in patients having micro/anophthalmia. Methods: Seventy patients having non-syndromic forms of colobomatous microphthalmia (n=25), isolated microphthalmia (n=18), or anophthalmia (n=17), and syndromic forms of micro/anophthalmia (n=10) were included in this study after negative molecular screening for OTX2, RAX, SOX2, and CHX10 mutations. Mutation screening of LHX2 was performed by direct sequencing of the coding sequences and intron/exon boundaries. Results: Two heterozygous variants of unknown significance (c.128C > G [p.Pro43Arg]; c.776C > A [p.Pro259Gln]) were identified in LHX2 among the 70 patients. These variations were not identified in a panel of 100 control patients of mixed origins. The variation c.776C > A (p.Pro259Gln) was considered non-pathogenic by in silico analysis, while the variation c.128C > G (p.Pro43Arg) was considered deleterious by in silico analysis and was inherited from the asymptomatic father. Conclusions: Mutations in LHX2 do not represent a frequent cause of micro/anophthalmia.
Abstract:
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Abstract:
Amino acid tandem repeats, also called homopolymeric tracts, are extremely abundant in eukaryotic proteins. To gain insight into the genome-wide evolution of these regions in mammals, we analyzed the repeat content in a large data set of rat-mouse-human orthologs. Our results show that human proteins contain more amino acid repeats than rodent proteins and that trinucleotide repeats are also more abundant in human coding sequences. Using the human species as an outgroup, we were able to address differences in repeat loss and repeat gain in the rat and mouse lineages. In this data set, mouse proteins contain substantially more repeats than rat proteins, which can be at least partly attributed to a higher repeat loss in the rat lineage. The data are consistent with a role for trinucleotide slippage in the generation of novel amino acid repeats. We confirm the previously observed functional bias of proteins with repeats, with overrepresentation of transcription factors and DNA-binding proteins. We show that genes encoding amino acid repeats tend to have an unusually high GC content, and that differences in coding GC content among orthologs are directly related to the presence/absence of repeats. We propose that the different GC content isochore structure in rodents and humans may result in an increased amino acid repeat prevalence in the human lineage.
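The two quantities this abstract relates, coding GC content and the presence of homopolymeric amino acid tracts, can be computed along the following lines. This is a rough sketch under stated assumptions: the function names are hypothetical, and the minimum tract length (`min_len=5`) is an illustrative threshold, not the one used in the study.

```python
import re


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def homopolymer_tracts(protein, min_len=5):
    """Find runs of one amino acid repeated at least min_len times.

    Returns a list of (residue, start_index, run_length) tuples,
    with 0-based start indices. min_len is an assumed cutoff.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein)]


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(homopolymer_tracts("MAQQQQQQLPAAAAK"))
```

Comparing `gc_content` between orthologous coding sequences with and without a detected tract would be one simple way to explore the relationship the authors report.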
Abstract:
The IncP alpha promiscuous plasmid (R18, R68, RK2, RP1 and RP4) comprises 60,099 bp of nucleotide sequence, encoding at least 74 genes. About 40 kb of the genome, designated the IncP core and including all essential replication and transfer functions, can be aligned with equivalent sequences in the IncP beta plasmid R751. The compiled IncP alpha sequence revealed several previously unidentified reading frames that are potential genes. IncP alpha plasmids carry genetic information very efficiently: the coding sequences of the genes are closely packed but rarely overlap, and occupy almost 86% of the genome's nucleotide sequence. All of the 74 genes should be expressed, although there is as yet experimental evidence for expression of only 60 of them. Six examples of tandem-in-frame initiation sites specifying two gene products each are known. Two overlapping gene arrangements occupy different reading frames of the same region. Intergenic regions include most of the 25 promoters; transcripts are usually polycistronic. Translation of most of the open reading frames seems to be initiated independently, each from its own ribosomal binding and initiation site, although a few cases of coupled translation have been reported. The most frequently used initiation codon is AUG but translation for a few open reading frames begins at GUG or UUG. The most common stop codon is UGA followed by UAA and then UAG. Regulatory circuits are complex and largely dependent on two components of the central control operon. KorA and KorB are transcriptional repressors controlling at least seven operons. KorA and KorB act synergistically in several cases by recognizing and binding to conserved nucleotide sequences. Twelve KorB binding sites were found around the IncP alpha sequence and these are conserved in R751 (IncP beta) with respect to both sequence and location.
Replication of IncP alpha plasmids requires oriV and the plasmid-encoded initiator protein TrfA in combination with the host-encoded replication machinery. Conjugative plasmid transfer depends on two separate regions occupying about half of the genome. The primary segregational stability system designated Par/Mrs consists of a putative site-specific recombinase, a possible partitioning apparatus and a post-segregational lethality mechanism, all encoded in two divergent operons. Proteins related to the products of F sop and P1 par partitioning genes are separately encoded in the central control operon.
Abstract:
Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process and whether systemic factors differentially influence their evolution remains unclear. To address these questions, we carried out an analysis at the genome scale to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). We found that the number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network has a different influence on the molecular evolution of proximal promoters and coding regions. Notably, Prom+ genes have an unexpectedly high centrality when compared with a reference distribution (P = 0.008, for Eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P = 0.02, for the logistic regression coefficient). This means that gene centrality does not constrain the evolution of proximal promoters, unlike the case with coding regions, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. 
These results show that proximal promoters have had a systemic contribution to human evolution by increasing the participation of central genes in the evolutionary process.
Abstract:
Background: The ratio of the rates of non-synonymous and synonymous substitution (dN/dS) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, dN/dS should be lower in populations with large effective size (Ne) due to increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence of whether the expected positive correlation between body mass and dN/dS is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between dN/dS and body mass, contrary to nearly neutral expectation. This raises the question whether the correlation might be a method artefact. We therefore in turn consider non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to dN/dS, the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of dN/dS and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric.
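To make the distinction between synonymous and non-synonymous changes concrete, the sketch below classifies codon differences between two aligned coding sequences using the standard genetic code. It is only an illustrative first step and does not compute the rate ratio itself: a proper estimate also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple substitutions, which is not done here. The function name is hypothetical, and the input is assumed to be two gap-free, in-frame, uppercase sequences of equal length.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(cds_a, cds_b):
    """Count differing codons as synonymous or non-synonymous.

    Returns (synonymous, nonsynonymous). Each differing codon pair is
    counted once, however many of its three nucleotides differ.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, equal-length, in-frame")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        a, b = cds_a[i:i + 3], cds_b[i:i + 3]
        if a == b:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # different amino acid or stop: non-synonymous
    return syn, nonsyn


if __name__ == "__main__":
    # AAA->AGA changes Lys to Arg; CCC->CCA leaves Pro unchanged.
    print(count_substitutions("ATGAAACCC", "ATGAGACCA"))
```

A radical versus conservative classification, as in the Kr/Kc ratio mentioned above, would further split the non-synonymous class according to a chosen grouping of amino acid properties.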