973 resultados para GC-CONTENT EVOLUTION
Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.
Resumo:
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
Resumo:
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERA, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion--implied mutational pattern at two key active sites in the SERA proteins-were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.
Resumo:
BACKGROUND: The expansion of amino acid repeats is determined by a high mutation rate and can be increased or limited by selection. It has been suggested that recent expansions could be associated with the potential of adaptation to new environments. In this work, we quantify the strength of this association, as well as the contribution of potential confounding factors. RESULTS: Mammalian positively selected genes have accumulated more recent amino acid repeats than other mammalian genes. However, we found little support for an accelerated evolutionary rate as the main driver for the expansion of amino acid repeats. The most significant predictors of amino acid repeats are gene function and GC content. There is no correlation with expression level. CONCLUSIONS: Our analyses show that amino acid repeat expansions are causally independent from protein adaptive evolution in mammalian genomes. Relaxed purifying selection or positive selection do not associate with more or more recent amino acid repeats. Their occurrence is slightly favoured by the sequence context but mainly determined by the molecular function of the gene.
Resumo:
Positive selection is widely estimated from protein coding sequence alignments by the nonsynonymous-to-synonymous ratio omega. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of the estimation of the omega ratio, more efforts are needed to estimate this robustness, especially in the context of complex models. Here, we focused on the branch-site codon model. We investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence of underestimation of the synonymous substitution rate for values as small as 0.5, with a slight increase in false positives for the branch-site test. When dS increases further, underestimation of dS is worse, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, with a maximum for intermediary values of dS. Thus, high dS is more of a concern for a loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We showed that there is no significant difference of false positives between high GC (up to similar to 80%) and low GC (similar to 30%) genes. Moreover, neither shifts of GC content on a specific branch nor major shifts in GC along the gene sequence generate many false positives. Our results confirm that the branch-site is a very conservative test.
Resumo:
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking in account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which are explained by weak purifying selection rather than by positive selection.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
In the present study, the coding region of the H gene was sequenced and analyzed in fourteen genera of New World primates (Alouatta, Aotus, Ateles, Brachyteles, Cacajao, Callicebus, Callithrix, Cebus, Chiropotes, Lagothrix, Leontopithecus, Pithecia, Saguinus, and Saimiri), in order to investigate the evolution of the gene. The analyses revealed that this coding region contains 1,101 nucleotides, with the exception of Brachyteles, the callitrichines (Callithrix, Leontopithecus, and Saguinus) and one species of Callicebus (moloch), in which one codon was deleted. In the primates studied, the high GC content (63%), the nonrandom distribution of codons and the low evolution rate of the gene (0.513 substitutions/site/MA in the order Primates) suggest the action of a purifying type of selective pressure, confirmed by the Z-test. Our analyses did not identify mutations equivalent to those responsible for the H-deficient phenotypes found in humans, nor any other alteration that might explain the lack of expression of the gene in the erythrocytes of Neotropical monkeys. The phylogenetic trees obtained for the H gene and the distance matrix data suggest the occurrence of divergent evolution in the primates.
Resumo:
Surrogate methods for detecting lateral gene transfer are those that do not require inference of phylogenetic trees. Herein I apply four such methods to identify open reading frames (ORFs) in the genome of Escherichia coli K12 that may have arisen by lateral gene transfer. Only two of these methods detect the same ORFs more frequently than expected by chance, whereas several intersections contain many fewer ORFs than expected. Each of the four methods detects a different non-random set of ORFs. The methods may detect lateral ORFs of different relative ages; testing this hypothesis will require rigorous inference of trees. (C) 2001 Federation of European Microbiological Societies. Published by Elsevier Science BN. All rights reserved.
Resumo:
A study od Typanosomatidae GC distribution and codon usage is presented. The codon usage patterns in coincidence with the phylogenetical data are similar in Crithidia and Leishmania, whereas they are more divergent in Trypanosoma brucei and T. cruzi. The analyisis of the GC mutational pressure in these organisms reveals that T. brucei, and to a lesser extent T. cruzi, have envolved towards a more balanced use of all bases, whereas Leishmania and Crithidia retain features of a primeval genetic apparatus. Tables with approximated GC mutational pressure in homologous genes, and codon usage in Trypanosomatidae are presented.
Resumo:
Arbuscular mycorrhizal fungi (AMF) are an ecologically important group of fungi. Previous studies showed the presence of divergent copies of beta-tubulin and V-type vacuolar H+-ATPase genes in AMF genomes and suggested horizontal gene transfer from host plants or mycoparasites to AMF. We sequenced these genes from DNA isolated from an in vitro cultured isolate of Glomus intraradices that was free of any obvious contaminants. We found two highly variable beta-tubulin sequences and variable H+-ATPase sequences. Despite this high variation, comparison of the sequences with those in gene banks supported a glomeromycotan origin of G. intraradices beta-tubulin and H+-ATPase sequences. Thus, our results are in sharp contrast with the previously reported polyphyletic origin of those genes. We present evidence that some highly divergent sequences of beta-tubulin and H+-ATPase deposited in the databases are likely to be contaminants. We therefore reject the prediction of horizontal transfer to AMF genomes. High differences in GC content between glomeromycotan sequences and sequences grouping in other lineages are shown and we suggest they can be used as an indicator to detect such contaminants. H+-ATPase phylogeny gave unexpected results and failed to resolve fungi as a natural group. beta-Tubulin phylogeny supported Glomeromeromycota as sister group of the Chytridiomycota. Contrasts between our results and trees previously generated using rDNA sequences are discussed.
Resumo:
Amino acid tandem repeats, also called homopolymeric tracts, are extremely abundant in eukaryotic proteins. To gain insight into the genome-wide evolution of these regions in mammals, we analyzed the repeat content in a large data set of rat-mouse-human orthologs. Our results show that human proteins contain more amino acid repeats than rodent proteins and that trinucleotide repeats are also more abundant in human coding sequences. Using the human species as an outgroup, we were able to address differences in repeat loss and repeat gain in the rat and mouse lineages. In this data set, mouse proteins contain substantially more repeats than rat proteins, which can be at least partly attributed to a higher repeat loss in the rat lineage. The data are consistent with a role for trinucleotide slippage in the generation of novel amino acid repeats. We confirm the previously observed functional bias of proteins with repeats, with overrepresentation of transcription factors and DNA-binding proteins. We show that genes encoding amino acid repeats tend to have an unusually high GC content, and that differences in coding GC content among orthologs are directly related to the presence/absence of repeats. We propose that the different GC content isochore structure in rodents and humans may result in an increased amino acid repeat prevalence in the human lineage.
Resumo:
The influenza virus has been a challenge to science due to its ability to withstand new environmental conditions. Taking into account the development of virus sequence databases, computational approaches can be helpful to understand virus behavior over time. Furthermore, they can suggest new directions to deal with influenza. This work presents triplet entropy analysis as a potential phylodynamic tool to quantify nucleotide organization of viral sequences. The application of this measure to segments of hemagglutinin (HA) and neuraminidase (NA) of H1N1 and H3N2 virus subtypes has shown some variability effects along timeline, inferring about virus evolution. Sequences were divided by year and compared for virus subtype (H1N1 and H3N2). The nonparametric Mann-Whitney test was used for comparison between groups. Results show that differentiation in entropy precedes differentiation in GC content for both groups. Considering the HA fragment, both triplet entropy as well as GC concentration show intersection in 2009, year of the recent pandemic. Some conclusions about possible flu evolutionary lines were drawn. © 2013 Elsevier B.V.
Resumo:
O surgimento das plataformas de sequenciamento de nova geração (NGS) proporcionou o aumento do volume de dados produzidos, tornando possível a obtenção de genomas completos. Apesar das vantagens alcançadas com estas plataformas, são observadas regiões de elevada ou baixa cobertura, em relação à média, associadas diretamente ao conteúdo GC. Este viés GC pode afetar análises genômicas e dificultar a montagem de genomas através da abordagem de novo, além de afetar as análises baseadas em referência. Além do que, as maneiras de avaliar o viés GC deve ser adequada para dados com diferentes perfis de relação/associação entre GC e cobertura, tais como linear e quadrático. Desta forma, este trabalho propõe o uso do Coeficiente de Correlação de Pearson (r) para analisar a correlação entre conteúdo GC e Cobertura, permitindo identificar aintensidade da correlação linear e detectar associações não-lineares, além de identificar a relação entre viés GC e as plataformas de sequenciamento. Os sinais positivos e negativos de r também permitem inferir relações diretamente proporcionais e inversamente proporcionais respectivamente. Utilizou-se dados da espécie Corynebacterium pseudotuberculosis, conhecido por serem genomas clonais obtidas através de diferentes tecnologias de sequenciamento para identificar se há relação do viés GC com as plataformas utilizadas.
Resumo:
Pseudogenes (Ψs), including processed and non-processed Ψs, are ubiquitous genetic elements derived from originally functional genes in all studied genomes within the three kingdoms of life. However, systematic surveys of non-processed Ψs utilizing genomic information from multiple samples within a species are still rare. Here a systematic comparative analysis was conducted of Ψs within 80 fully re-sequenced Arabidopsis thaliana accessions, and 7546 genes, representing ~28% of the genomic annotated open reading frames (ORFs), were found with disruptive mutations in at least one accession. The distribution of these Ψs on chromosomes showed a significantly negative correlation between Ψs/ORFs and their local gene densities, suggesting a higher proportion of Ψs in gene desert regions, e.g. near centromeres. On the other hand, compared with the non-Ψ loci, even the intact coding sequences (CDSs) in the Ψ loci were found to have shorter CDS length, fewer exon number and lower GC content. In addition, a significant functional bias against the null hypothesis was detected in the Ψs mainly involved in responses to environmental stimuli and biotic stress as reported, suggesting that they are likely important for adaptive evolution to rapidly changing environments by pseudogenization to accumulate successive mutations.