994 results for GC content


Relevance:

60.00%

Publisher:

Abstract:

Amino acid tandem repeats, also called homopolymeric tracts, are extremely abundant in eukaryotic proteins. To gain insight into the genome-wide evolution of these regions in mammals, we analyzed the repeat content in a large data set of rat-mouse-human orthologs. Our results show that human proteins contain more amino acid repeats than rodent proteins and that trinucleotide repeats are also more abundant in human coding sequences. Using the human species as an outgroup, we were able to address differences in repeat loss and repeat gain in the rat and mouse lineages. In this data set, mouse proteins contain substantially more repeats than rat proteins, which can be at least partly attributed to a higher repeat loss in the rat lineage. The data are consistent with a role for trinucleotide slippage in the generation of novel amino acid repeats. We confirm the previously observed functional bias of proteins with repeats, with overrepresentation of transcription factors and DNA-binding proteins. We show that genes encoding amino acid repeats tend to have an unusually high GC content, and that differences in coding GC content among orthologs are directly related to the presence/absence of repeats. We propose that the different GC content isochore structure in rodents and humans may result in an increased amino acid repeat prevalence in the human lineage.

Relevance:

60.00%

Publisher:

Abstract:

Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, taking into account in particular gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate correlates only weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.

Relevance:

60.00%

Publisher:

Abstract:

The design of a large and reliable DNA codeword library is a key problem in DNA-based computing. DNA codes, namely sets of fixed-length edit metric codewords over the alphabet {A, C, G, T}, satisfy certain combinatorial constraints that reflect biological and chemical restrictions on DNA strands. The primary constraints that we consider are the reverse-complement constraint and the fixed GC-content constraint, as well as the basic edit distance constraint between codewords. We focus on exploring the theory underlying DNA codes and discuss several approaches to searching for optimal DNA codes. We use Conway's lexicode algorithm and an exhaustive search algorithm to produce provably optimal DNA codes for small parameter values. A genetic algorithm is also proposed to search for sub-optimal DNA codes with relatively large parameter values, whose sizes can be taken as reasonable lower bounds on the sizes of DNA codes. Furthermore, we provide tables of bounds on the sizes of DNA codes with length from 1 to 9 and minimum distance from 1 to 9.
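The three constraints named in this abstract can be illustrated with a greedy lexicode-style search. This is only a minimal sketch, not the authors' implementation: the function names are hypothetical, and it assumes that the reverse-complement constraint means every codeword must be at edit distance at least d from the reverse complement of every codeword, including its own.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_count(word):
    """Number of G or C bases in the word."""
    return sum(base in "GC" for base in word)


def reverse_complement(word):
    """Reverse the word and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def lexicode(n, d, gc):
    """Greedy search in lexicographic order over words of length n.

    A word is kept if it has exactly `gc` G/C bases and is at edit
    distance >= d from every kept codeword and from every reverse
    complement (its own included; this reading is an assumption).
    """
    code = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if gc_count(word) != gc:
            continue
        rc = reverse_complement(word)
        if edit_distance(word, rc) < d:
            continue
        if all(edit_distance(word, c) >= d and edit_distance(rc, c) >= d
               for c in code):
            code.append(word)
    return code


code = lexicode(n=4, d=2, gc=2)
print(len(code), code[:5])
```

A greedy pass like this gives a valid code, and therefore a lower bound on the maximum code size for the chosen parameters; it does not by itself prove optimality.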

Relevance:

60.00%

Publisher:

Abstract:

Recurrent submicroscopic genomic copy number changes are the result of nonallelic homologous recombination (NAHR). Nonrecurrent aberrations, however, can result from different nonexclusive recombination-repair mechanisms. We previously described small microduplications at Xq28 containing MECP2 in four male patients with a severe neurological phenotype. Here, we report on the fine-mapping and breakpoint analysis of 16 unique microduplications. The size of the overlapping copy number changes varies between 0.3 and 2.3 Mb, and FISH analysis on three patients demonstrated a tandem orientation. Although eight of the 32 breakpoint regions coincide with low-copy repeats, none of the duplications are the result of NAHR. Bioinformatics analysis of the breakpoint regions demonstrated a 2.5-fold higher frequency of Alu interspersed repeats as compared with control regions, as well as a very high GC content (53%). Unexpectedly, we obtained the junction in only one patient by long-range PCR, which revealed nonhomologous end joining as the mechanism. Breakpoint analysis in two other patients by inverse PCR and subsequent array comparative genomic hybridization analysis demonstrated the presence of a second duplicated region more telomeric at Xq28, of which one copy was inserted in between the duplicated MECP2 regions. These data suggest a two-step mechanism in which part of Xq28 is first inserted near the MECP2 locus, followed by breakage-induced replication with strand invasion of the normal sister chromatid. Our results indicate that the mechanism by which copy number changes occur in regions with a complex genomic architecture can yield complex rearrangements.

Relevance:

60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00%

Publisher:

Abstract:

The genome sequence of Leifsonia xyli subsp. xyli, which causes ratoon stunting disease and affects sugarcane worldwide, was determined. The single circular chromosome of Leifsonia xyli subsp. xyli CTCB07 was 2.6 Mb in length with a GC content of 68% and 2,044 predicted open reading frames. The analysis also revealed 307 predicted pseudogenes, which is more than any bacterial plant pathogen sequenced to date. Many of these pseudogenes, if functional, would likely be involved in the degradation of plant heteropolysaccharides, uptake of free sugars, and synthesis of amino acids. Although L. xyli subsp. xyli has only been identified colonizing the xylem vessels of sugarcane, the numbers of predicted regulatory genes and sugar transporters are similar to those in free-living organisms. Some of the predicted pathogenicity genes appear to have been acquired by lateral transfer and include genes for cellulase, pectinase, wilt-inducing protein, lysozyme, and desaturase. The presence of the latter may contribute to stunting, since it is likely involved in the synthesis of abscisic acid, a hormone that arrests growth. Our findings are consistent with the nutritionally fastidious behavior exhibited by L. xyli subsp. xyli and suggest an ongoing adaptation to the restricted ecological niche it inhabits.

Relevance:

60.00%

Publisher:

Abstract:

The accurate specific identification of ticks is essential for the study, control and prevention of tick-borne diseases. Herein, we determined ribosomal nucleotide sequences of the second internal transcribed spacer (ITS2) of 15 Neotropical hard tick species of the genus Amblyomma Koch found in Brazil. Most of the studied ticks accidentally parasitize humans and potentially act as vectors of zoonoses. Lengths of the ITS2 sequences ranged from 956 to 1,207 bp, whereas GC content varied from 62.4 to 66.9%. A matrix of ITS2 divergence was calculated from the ITS2 sequence data obtained, showing divergence levels varying from 1.5 to 28.8%. The analysis indicated that this molecular marker can be useful for Amblyomma-specific identification. Phylogenetic inferences based on the ITS2 sequences were used to assess some issues in subgenus taxonomy. © 2007 Entomological Society of America.
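The abstract does not say how the divergence matrix was computed. As a rough illustration only, the sketch below uses the uncorrected p-distance (the proportion of differing sites between two aligned sequences), which is one common choice; the helper names and the toy aligned sequences are invented for the example and are not data from the study.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_matrix(seqs):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    names = list(seqs)
    return {(m, n): p_distance(seqs[m], seqs[n]) for m in names for n in names}


# Toy aligned sequences, invented purely for illustration.
toy = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTTC",
    "sp3": "ACG-ACCTTC",
}
matrix = divergence_matrix(toy)
for (m, n), dist in matrix.items():
    if m < n:
        print(f"{m} vs {n}: {100 * dist:.1f}%")
```

Published divergence values are often corrected with a substitution model, so figures from a sketch like this would not necessarily match those reported.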

Relevance:

60.00%

Publisher:

Abstract:

Chromosome mapping and studies of the genomic organization of repetitive DNA sequences provide valuable insights that enhance our evolutionary and structural understanding of these sequences, as well as identifying chromosomal rearrangements and sex determination. This study investigated the occurrence and organization of repetitive DNA sequences in Leporinus elongatus using restriction enzyme digestion and the mapping of sequences by chromosomal fluorescence in situ hybridization (FISH). A 378-bp fragment with a 54.2% GC content was isolated after digestion with the SmaI restriction enzyme. A BLASTN search found no similarity with previously described sequences, so this repetitive sequence was named LeSmaI. FISH experiments were conducted using L. elongatus and other Anostomidae species, i.e. L. macrocephalus, L. obtusidens, L. striatus, L. lacustris, L. friderici, Schizodon borellii, S. isognathus, and Abramites hypselonotus, and detected signals that were unique to male and female L. elongatus individuals. Double-FISH using LeSmaI and 18S rDNA showed that LeSmaI was located in a nucleolus organizer region (NOR) in the male and female metaphases of L. elongatus. This report also discusses the role of repetitive DNA associated with NORs in the diversification of Anostomidae species karyotypes. Copyright © 2012 S. Karger AG, Basel.

Relevance:

60.00%

Publisher:

Abstract:

The influenza virus has been a challenge to science due to its ability to withstand new environmental conditions. Given the development of virus sequence databases, computational approaches can help to understand virus behavior over time. Furthermore, they can suggest new directions for dealing with influenza. This work presents triplet entropy analysis as a potential phylodynamic tool to quantify the nucleotide organization of viral sequences. Applying this measure to segments of hemagglutinin (HA) and neuraminidase (NA) of the H1N1 and H3N2 virus subtypes showed variability over time, allowing inferences about virus evolution. Sequences were divided by year and compared by virus subtype (H1N1 and H3N2). The nonparametric Mann-Whitney test was used for comparison between groups. Results show that differentiation in entropy precedes differentiation in GC content for both groups. For the HA fragment, both triplet entropy and GC content show an intersection in 2009, the year of the recent pandemic. Some conclusions about possible flu evolutionary lines were drawn. © 2013 Elsevier B.V.
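The abstract does not give a formula for triplet entropy. One plausible reading, shown here as a hedged sketch rather than the authors' definition, is the Shannon entropy of the frequency distribution of overlapping nucleotide triplets, computed alongside GC content. Both function names are hypothetical.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def triplet_entropy(seq):
    """Shannon entropy (bits) of overlapping triplet frequencies.

    Assumption: triplets are taken with a sliding window of step 1;
    the original work may use a different windowing or normalization.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not triplets:
        raise ValueError("sequence shorter than 3 bases")
    counts = Counter(triplets)
    total = len(triplets)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


example = "ACGTACGT"
print(gc_content(example), triplet_entropy(example))
```

With 64 possible triplets, this quantity ranges from 0 bits (a single repeated triplet) to log2(64) = 6 bits (all triplets equally frequent). Per-year groups of such values could then be compared with a rank-based test such as Mann-Whitney, as the abstract describes.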

Relevance:

60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00%

Publisher:

Abstract:

In the present study, the coding region of the H gene was sequenced and analyzed in fourteen genera of New World primates (Alouatta, Aotus, Ateles, Brachyteles, Cacajao, Callicebus, Callithrix, Cebus, Chiropotes, Lagothrix, Leontopithecus, Pithecia, Saguinus, and Saimiri), in order to investigate the evolution of the gene. The analyses revealed that this coding region contains 1,101 nucleotides, with the exception of Brachyteles, the callitrichines (Callithrix, Leontopithecus, and Saguinus) and one species of Callicebus (moloch), in which one codon was deleted. In the primates studied, the high GC content (63%), the nonrandom distribution of codons and the low evolution rate of the gene (0.513 substitutions/site/MA in the order Primates) suggest the action of a purifying type of selective pressure, confirmed by the Z-test. Our analyses did not identify mutations equivalent to those responsible for the H-deficient phenotypes found in humans, nor any other alteration that might explain the lack of expression of the gene in the erythrocytes of Neotropical monkeys. The phylogenetic trees obtained for the H gene and the distance matrix data suggest the occurrence of divergent evolution in the primates.

Relevance:

60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00%

Publisher:

Abstract:

Vibrio campbellii PEL22A was isolated from open ocean water in the Abrolhos Bank. The genome of PEL22A consists of 6,788,038 bp (the GC content is 45%). The number of coding sequences (CDS) is 6,359, as determined according to the Rapid Annotation using Subsystem Technology (RAST) server. The number of ribosomal genes is 80, of which 68 are tRNAs and 12 are rRNAs. V. campbellii PEL22A contains genes related to virulence and fitness, including a complete proteorhodopsin cluster, complete type II and III secretion systems, incomplete type I, IV, and VI secretion systems, a hemolysin, and CTX Phi.

Relevance:

60.00%

Publisher:

Abstract:

Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide when a given probability is sufficient is Bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can be used as the alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, which include the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study to evaluate the impact of the choice of null model on the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification. This is a crucial issue for reducing the costs of biological validation. Results: For all the tests, the target null model presented the lowest number of false positives when using random sequences as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.
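The scoring scheme described in this abstract can be sketched as a log-odds ratio between a family model and a null model. The example below is a simplified illustration under stated assumptions: the family model is a hypothetical position-independent GC-rich distribution rather than a profile HMM, the function names are invented, and the target null model is estimated from the candidate sequence itself with a pseudocount.

```python
import math
from collections import Counter

BASES = "ACGT"


def uniform_null():
    """Null model in which every base has probability 0.25."""
    return {base: 0.25 for base in BASES}


def target_null(seq, pseudocount=1.0):
    """Null model estimated from the composition of the target sequence."""
    counts = Counter(seq)
    total = len(seq) + pseudocount * len(BASES)
    return {base: (counts[base] + pseudocount) / total for base in BASES}


def log_prob(seq, dist):
    """Log2 probability of seq under a position-independent distribution."""
    return sum(math.log2(dist[base]) for base in seq)


def log_odds(seq, model, null):
    """Log-odds score: positive values favor the model over the null."""
    return log_prob(seq, model) - log_prob(seq, null)


# Hypothetical GC-rich family model, chosen only for illustration.
family_model = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

# A GC-only candidate standing in for a compositionally biased sequence.
candidate = "GCGGCCGCGC"

score_uniform = log_odds(candidate, family_model, uniform_null())
score_target = log_odds(candidate, family_model, target_null(candidate))
print(score_uniform, score_target)
```

In this toy case the uniform null gives a positive score simply because the candidate is GC-rich, while the target null absorbs the composition and the score drops below zero, which is in line with the GC bias the abstract attributes to the uniform model.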

Relevance:

60.00%

Publisher:

Abstract:

Background: Plasmodium vivax is the most widely distributed human malaria parasite, responsible for 70–80 million clinical cases each year and large socioeconomic burdens for countries such as Brazil, where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. Methods: A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low-quality reads. A total of 806 sequences with an average length of 586 bp met these criteria, and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of < 10^-30 was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology. Results: A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. Conclusion: These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and in the creation of a gene index of this malaria parasite.