996 results for Base composition
Abstract:
Theoretical and empirical studies were conducted on the pattern of nucleotide and amino acid substitution in evolution, taking into account the effects of mutation at the nucleotide level and purifying selection at the amino acid level. A theoretical model for predicting the evolutionary change in electrophoretic mobility of a protein was also developed by using information on the pattern of amino acid substitution. The specific problems studied and the main results obtained are as follows: (1) Estimation of the pattern of nucleotide substitution in DNA nuclear genomes. The patterns of point mutation and nucleotide substitution among the four different nucleotides are inferred from the evolutionary changes of pseudogenes and functional genes, respectively. Both patterns are non-random, the rate of change varying considerably with nucleotide pair, and in both cases transitions occur somewhat more frequently than transversions. In protein evolution, substitution occurs more often between amino acids with similar physico-chemical properties than between dissimilar amino acids. (2) Estimation of the pattern of nucleotide substitution in RNA genomes. The majority of mutations in retroviruses accumulate at the reverse transcription stage. Selection at the amino acid level is very weak, and almost non-existent between synonymous codons. The pattern of mutation is very different from that in DNA genomes. Nevertheless, the pattern of purifying selection at the amino acid level is similar to that in DNA genomes, although selection intensity is much weaker. (3) Evaluation of the determinants of molecular evolutionary rates in protein-coding genes. Based on rates of nucleotide substitution for mammalian genes, the rate of amino acid substitution of a protein is determined by its amino acid composition. The content of glycine is shown to correlate strongly and negatively with the rate of substitution.
Empirical formulae, called indices of mutability, are developed in order to predict the rate of molecular evolution of a protein from data on its amino acid sequence. (4) Studies on the evolutionary patterns of electrophoretic mobility of proteins. A theoretical model was constructed that predicts the electric charge of a protein at any given pH and its isoelectric point from data on its primary and quaternary structures. Using this model, the evolutionary change in electrophoretic mobilities of different proteins and the expected amount of electrophoretically hidden genetic variation were studied. In the absence of selection for the pI value, proteins will on the average evolve toward a mildly basic pI. (Abstract shortened with permission of author.)
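To make the idea of predicting charge and isoelectric point from sequence concrete, here is a minimal sketch that is not the author's model. It uses only the primary structure, ignores quaternary structure entirely, and applies the Henderson-Hasselbalch relation to each ionizable group. The pKa values and the function names `net_charge` and `isoelectric_point` are assumptions chosen for illustration; published pKa sets differ from one another.

```python
# Assumed, approximate pKa values for ionizable groups (illustrative only;
# they are not taken from the abstract and vary between published tables).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5, "N_term": 8.6}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1, "C_term": 3.6}


def net_charge(sequence, ph):
    """Net charge of a single polypeptide chain at the given pH.

    Each group contributes its Henderson-Hasselbalch fractional charge.
    One free N-terminus and one free C-terminus are assumed.
    """
    sequence = sequence.upper()
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        n = 1 if group == "N_term" else sequence.count(group)
        charge += n / (1.0 + 10 ** (ph - pka))
    for group, pka in NEGATIVE_PKA.items():
        n = 1 if group == "C_term" else sequence.count(group)
        charge -= n / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """pH at which the net charge is zero, found by bisection.

    Bisection works because net charge decreases monotonically with pH.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With a function of this kind, the effect of a single amino acid replacement on charge can be explored by comparing `net_charge` for the two sequences at the pH of the electrophoresis buffer.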
Abstract:
The correspondence between the transversion/transition ratio and the neighboring base composition in chloroplast DNA is examined. For 18 noncoding regions of the chloroplast genome, alignments between rice (Oryza sativa) and maize (Zea mays) were generated by two different methods. Difficulties of aligning noncoding DNA are discussed, and the alignments are analyzed in a manner that reduces alignment artifacts. Sequence divergence is < 10%, so multiple substitutions at a site are assumed to be rare. Observed substitutions were analyzed with respect to the A+T content of the two immediately flanking bases. It is shown that as this content increases, the proportion of transversions also increases. When both the 5'- and 3'-flanking nucleotides are G or C (A+T content of 0), only 25% of the observed substitutions are transversions. However, when both the 5'- and 3'-flanking nucleotides are A or T (A+T content of 2), 57% of the observed substitutions are transversions. Therefore, the influence of flanking base composition on substitutions, previously reported for a single noncoding region, is a general feature of the chloroplast genome.
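The tabulation described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: the function names are hypothetical, the input is a gapped pairwise alignment given as two equal-length strings, and a substitution is counted only when both flanking bases are unambiguous and identical in the two sequences (an assumed filter meant to limit alignment artifacts).

```python
PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def is_transversion(a, b):
    """True when a purine is exchanged for a pyrimidine or vice versa."""
    return (a in PURINES) != (b in PURINES)


def transversion_proportion_by_context(seq1, seq2):
    """Proportion of transversions among substitutions, keyed by the
    A+T content (0, 1 or 2) of the two immediately flanking bases.

    Returns None for a context in which no substitution was observed.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {0: [0, 0], 1: [0, 0], 2: [0, 0]}  # context -> [transversions, total]
    for i in range(1, len(seq1) - 1):
        a, b = seq1[i], seq2[i]
        if a == b or a not in BASES or b not in BASES:
            continue  # not a substitution between two real bases
        left, right = seq1[i - 1], seq1[i + 1]
        if left != seq2[i - 1] or right != seq2[i + 1]:
            continue  # flanking bases differ between the sequences
        if left not in BASES or right not in BASES:
            continue  # gap or ambiguity code next to the site
        at_content = (left in "AT") + (right in "AT")
        counts[at_content][1] += 1
        counts[at_content][0] += is_transversion(a, b)
    return {k: (tv / total if total else None) for k, (tv, total) in counts.items()}
```

For the toy pair `"GACTATAG"` and `"GGCTTTAG"`, the A/G difference flanked by G and C falls in context 0 as a transition, and the A/T difference flanked by two T's falls in context 2 as a transversion.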
Abstract:
The base composition pattern (BCP) in the putative promoter regions (PPRs), up to 5 kb in length, of 682 human genes on chromosome 22 (Chr22) was examined. Two-dimensional (2D) and three-dimensional (3D) functions were designed to delineate the DNA base composition, and four major patterns were identified. Of these genes, 17.6% include a TATA box, 28.0% a GC box, 18.9% a CAAT box and 38.4% CpG islands, and approximately 10% have one of four putative initiator (Inr) motifs. The occurrence of the promoter elements is tightly associated with the base composition features of the promoter regions, and these associations mediate tissue-wide expression of human genes. The occurrence of two or more promoter elements in the promoter regions is required for the medium- and wide-range expression profiles of the human genes on Chr22. Thus, the reported data shed light on the characteristics of the PPRs of the human genes on Chr22, which may improve our understanding of the regulatory roles of the PPRs and their promoter elements in gene expression.
Abstract:
Combined pituitary hormone deficiency (CPHD) has an incidence of approximately 1 in 8000 births. Although the proportion of familial CPHD cases is unknown, about 10% have an affected first degree relative. We have recently reported three mutations in the PROP1 gene that cause CPHD in human subjects. We report here the frequency of one of these mutations, a 301-302delAG deletion in exon 2 of PROP1, in 10 independently ascertained CPHD kindreds and 21 sporadic cases of CPHD from 8 different countries. Our results show that 55% (11 of 20) of PROP1 alleles have the 301-302delAG deletion in familial CPHD cases. Interestingly, although only 12% (5 of 42) of the PROP1 alleles of our 21 sporadic cases were 301-302delAG, the frequency of this allele (in 20 of 21 of the sporadic subjects given TRH stimulation tests) was 50% (3 of 6) and 0% (0 of 34) in the CPHD cases with pituitary and hypothalamic defects, respectively. Using whole genome radiation hybrid analysis, we localized the PROP1 gene to the distal end of chromosome 5q and identified a tightly linked polymorphic marker, D5S408, which can be used in segregation studies. Analysis of this marker in affected subjects with the 301-302delAG deletion suggests that rather than being inherited from a common founder, the 301-302delAG may be a recurring mutation.
Abstract:
A vaccinia virus late gene coding for a major structural polypeptide of 11 kDa was sequenced. Although the 5' flanking gene region is very A+T rich, it shows little homology either to the corresponding region of vaccinia early genes or to consensus sequences characteristic of most eukaryotic genes. Three DNA fragments (100, 200, and 500 base pairs, respectively), derived from the flanking region and including the late gene mRNA start site, were inserted into the coding sequence of the vaccinia virus thymidine kinase (TK) early gene by homologous in vivo recombination. Recombinants were selected on the basis of their TK- phenotype. Cells were infected with the recombinant viruses and RNA was isolated at 1-hr intervals. Transcripts initiating either from the TK early promoter, or from the late gene promoter at its authentic position, or from the translocated late gene promoters within the early gene were detected by nuclease S1 mapping. Early after infection, only transcripts from the TK early promoter were detected. Later in infection, however, transcripts were also initiated from the translocated late promoters. This RNA appeared at the same time and in similar quantities as the RNA from the late promoter at its authentic position. No quantitative differences in promoter efficiency between the 100-, 200-, and 500-base-pair insertions were observed. We conclude that all necessary signals for correct regulation of late-gene expression reside within only 100 base pairs of 5' flanking sequence.
Abstract:
As increasingly large molecular data sets are collected for phylogenomics, the conflicting phylogenetic signal among gene trees poses challenges to resolve some difficult nodes of the Tree of Life. Among these nodes, the phylogenetic position of the honey bees (Apini) within the corbiculate bee group remains controversial, despite its considerable importance for understanding the emergence and maintenance of eusociality. Here, we show that this controversy stems in part from pervasive phylogenetic conflicts among GC-rich gene trees. GC-rich genes typically have a high nucleotidic heterogeneity among species, which can induce topological conflicts among gene trees. When retaining only the most GC-homogeneous genes or using a nonhomogeneous model of sequence evolution, our analyses reveal a monophyletic group of the three lineages with a eusocial lifestyle (honey bees, bumble bees, and stingless bees). These phylogenetic relationships strongly suggest a single origin of eusociality in the corbiculate bees, with no reversal to solitary living in this group. To accurately reconstruct other important evolutionary steps across the Tree of Life, we suggest removing GC-rich and GC-heterogeneous genes from large phylogenomic data sets. Interpreted as a consequence of genome-wide variations in recombination rates, this GC effect can affect all taxa featuring GC-biased gene conversion, which is common in eukaryotes.
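The filtering step of retaining only the most GC-homogeneous genes might look like the sketch below. The abstract does not say how GC heterogeneity was measured, so the measure used here (the range of GC content across species within a gene), the `keep_fraction` parameter and all function names are assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else float("nan")


def gc_heterogeneity(alignment):
    """Range (max minus min) of GC content across species for one gene.

    `alignment` maps species name to that species' sequence of the gene.
    Sequences are assumed to contain at least one unambiguous base.
    """
    values = [gc_content(seq) for seq in alignment.values()]
    return max(values) - min(values)


def most_homogeneous_genes(genes, keep_fraction=0.5):
    """Names of the genes with the lowest GC heterogeneity.

    `genes` maps gene name to an alignment as described above.
    At least one gene is always returned.
    """
    ranked = sorted(genes, key=lambda name: gc_heterogeneity(genes[name]))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]
```

The variance of GC content, or GC content at third codon positions only, could be substituted for the range without changing the structure of the filter.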
Abstract:
A new method for computing evolutionary distances between DNA sequences is proposed. Contrasting with classical methods, the underlying model does not assume that sequence base compositions (A, C, G, and T contents) are at equilibrium, thus allowing unequal base compositions among compared sequences. This makes the method more efficient than the usual ones in recovering phylogenetic trees from sequence data when base composition is heterogeneous within the data set, as we show by using both simulated and empirical data. When applied to small-subunit ribosomal RNA sequences from several prokaryotic or eukaryotic organisms, this method provides evidence for an early divergence of the microsporidian Vairimorpha necatrix in the eukaryotic lineage.
Abstract:
Lower levels of cytosine methylation have been found in the liver cell DNA from non-obese diabetic (NOD) mice under hyperglycemic conditions. Because the Fourier transform-infrared (FT-IR) profiles of dry DNA samples are differently affected by DNA base composition, single-stranded form and histone binding, it is expected that the methylation status in the DNA could also affect its FT-IR profile. The DNA FT-IR signatures obtained from the liver cell nuclei of hyperglycemic and normoglycemic NOD mice of the same age were compared. Dried DNA samples were examined in an IR microspectroscope equipped with an all-reflecting objective (ARO) and adequate software. Changes in DNA cytosine methylation levels induced by hyperglycemia in mouse liver cells produced changes in the respective DNA FT-IR profiles, revealing modifications to the vibrational intensities and frequencies of several chemical markers, including νas -CH3 stretching vibrations in the 5-methylcytosine methyl group. A smaller band area reflecting lower energy absorbed in the DNA was found in the hyperglycemic mice and assumed to be related to the lower levels of -CH3 groups. Other spectral differences were found at 1700-1500 cm⁻¹ and in the fingerprint region, and a slight change in the DNA conformation at the lower DNA methylation levels was suggested for the hyperglycemic mice. The changes that affect cytosine methylation levels certainly affect the DNA-protein interactions and, consequently, gene expression in liver cells from the hyperglycemic NOD mice.
Abstract:
Complete or near-complete mitochondrial genomes are now available for 11 species or strains of parasitic flatworms belonging to the Trematoda and the Cestoda. The organization of these genomes is not strikingly different from those of other eumetazoans, although one gene (atp8) commonly found in other phyla is absent from flatworms. The gene order in most flatworms has similarities to those seen in higher protostomes such as annelids. However, the gene order has been drastically altered in Schistosoma mansoni, which obscures this possible relationship. Among the sequenced taxa, base composition varies considerably, creating potential difficulties for phylogeny reconstruction. Long non-coding regions are present in all taxa, but these vary in length from only a few hundred to approximately 10 000 nucleotides. Among Schistosoma spp., the long non-coding regions are rich in repeats and length variation among individuals is known. Data from mitochondrial genomes are valuable for studies on species identification, phylogenies and biogeography.
Abstract:
Unlike other members of the genus, Echinococcus granulosus is known to exhibit considerable levels of variation in biology, physiology and molecular genetics. Indeed, some of the taxa regarded as 'genotypes' within E. granulosus might be sufficiently distinct as to merit specific status. Here, complete mitochondrial genomes are presented of 2 genotypes of E. granulosus (G1, sheep-dog strain; G4, horse-dog strain) and of another taeniid cestode, Taenia crassiceps. These genomes are characterized and compared with those of Echinococcus multilocularis and Hymenolepis diminuta. Genomes of all the species are very similar in structure, length and base composition. Pairwise comparisons of concatenated protein-coding genes indicate that the G1 and G4 genotypes of E. granulosus are almost as distant from each other as each is from a distinct species, E. multilocularis. Sequences for the variable genes atp6 and nad3 were obtained from additional genotypes of E. granulosus, from E. vogeli and E. oligarthrus. Again, pairwise comparisons showed the distinctiveness of the G1 and G4 genotypes. Phylogenetic analyses of concatenated atp6, nad1 (partial) and cox1 (partial) genes from E. multilocularis, E. vogeli, E. oligarthrus, 5 genotypes of E. granulosus, and using T. crassiceps as an outgroup, yielded the same results. We conclude that the sheep-dog and horse-dog strains of E. granulosus should be regarded as distinct at the specific level.
Abstract:
Systematics is the study of the diversity of organisms and their relationships, and comprises classification, nomenclature and identification. Classification, or taxonomy, is the arrangement of organisms into groups (taxa); nomenclature is the assignment of correct international scientific names to organisms; and identification is the placement of unknown strains in groups derived from classification. Classification is therefore a prerequisite for a stable nomenclature and an accurate identification. The new era of bacterial systematics began with the introduction and application of new taxonomic concepts and techniques in the 1950s and 1960s. Important progress was achieved using numerical taxonomy and molecular taxonomy. Molecular taxonomy, which came into use after the emergence of molecular biology tools, has contributed knowledge to bacterial systematics, particularly where there is great evolutionary interest or where any environmental interference must be eliminated. Studying the composition and arrangement of nucleotides in certain portions of the genetic material means examining the genome itself, which is much less susceptible to environmental alteration than the proteins it encodes. In molecular taxonomy, both DNA and RNA can be investigated, and the main techniques used in systematics include the construction of restriction maps, DNA-DNA hybridization, DNA-RNA hybridization, DNA sequencing, sequencing of 16S and 23S rRNA, RAPD, RFLP, PFGE, etc. Techniques such as base sequencing, although extremely sensitive and highly precise, are relatively costly and impracticable for the great majority of bacterial taxonomy laboratories. Several specialized techniques have been applied to taxonomic studies of microorganisms.
In recent years, these have included preliminary electrophoretic analysis of soluble proteins and isoenzymes, and subsequently the determination of deoxyribonucleic acid base composition and the assessment of base sequence homology by means of DNA-RNA hybridization experiments, among others. These various techniques, as expected, have generally indicated a lack of taxonomic information in microbial systematics. Numerous techniques and methodologies make the identification and classification of bacteria possible, some of them described here, and allow different degrees of subspecific and interspecific similarity to be established through the analysis of phenetic and genetic polymorphism. However, the need to use more than one technique to better establish degrees of similarity among microorganisms was pointed out. Data obtained from a single technique applied in isolation may not provide significant information from the viewpoint of bacterial systematics.
Abstract:
A Gram-negative, rod-shaped, aerobic bacterium, designated strain RP007(T), was isolated from a polycyclic aromatic hydrocarbon-contaminated soil in New Zealand. Two additional strains were recovered from a compost heap in Belgium (LMG 18808) and from the rhizosphere of maize in the Netherlands (LMG 24204). The three strains had virtually identical 16S rRNA gene sequences and whole-cell protein profiles, and they were identified as members of the genus Burkholderia, with Burkholderia phenazinium as their closest relative. Strain RP007(T) had a DNA G+C content of 63.5 mol% and could be distinguished from B. phenazinium based on a range of biochemical characteristics. Strain RP007(T) showed levels of DNA-DNA relatedness towards the type strain of B. phenazinium and those of other recognized Burkholderia species of less than 30 %. The results of 16S rRNA gene sequence analysis, DNA-DNA hybridization experiments and physiological and biochemical tests allowed the differentiation of strain RP007(T) from all recognized species of the genus Burkholderia. Strains RP007(T), LMG 18808 and LMG 24204 are therefore considered to represent a single novel species of the genus Burkholderia, for which the name Burkholderia sartisoli sp. nov. is proposed. The type strain is RP007(T) (=LMG 24000(T) =CCUG 53604(T) =ICMP 13529(T)).
Abstract:
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERA, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion, the implied mutational pattern at two key active sites in the SERA proteins, were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.
Abstract:
Selective pressures related to gene function and chromosomal architecture are acting on genome sequences and can be revealed, for instance, by appropriate genometric methods. Cumulative nucleotide skew analyses, i.e., GC, TA, and ORF orientation skews, predict the location of the origin of DNA replication for 88 out of 100 completely sequenced bacterial chromosomes. These methods appear fully reliable for proteobacteria, Gram-positives, and spirochetes as well as for euryarchaeotes. Based on this genome architecture information, coorientation analyses reveal that in prokaryotes, ribosomal RNA (rRNA) genes encoding the small and large ribosomal subunits are all transcribed in the same direction as DNA replication; that is, they are located along the leading strand. This result offers a simple and reliable method for circumscribing the region containing the origin of the DNA replication and reveals a strong selective pressure acting on the orientation of rRNA genes similar to the weaker one acting on the orientation of ORFs. Rate of coorientation of transfer RNA (tRNA) genes with DNA replication appears to be taxon-specific. Analyzing nucleotide biases such as GC and TA skews of genes and plotting one against the other reveals a taxonomic clusterization of species. All ribosomal RNA genes are enriched in Gs and depleted in Cs, the only so far known exception being the rRNA genes of deuterostomian mitochondria. However, this exception can be explained by the fact that in the chromosome of the human mitochondrion, the model of the deuterostomian organelle genome, DNA replication, and rRNA transcription proceed in opposite directions. A general rule is deduced from prokaryotic and mitochondrial genomes: ribosomal RNA genes that are transcribed in the same direction as the DNA replication are enriched in Gs, and those transcribed in the opposite direction are depleted in Gs.
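A cumulative GC skew analysis of the kind mentioned above can be sketched in a few lines. This is a simplified illustration rather than the authors' method: it covers only the GC skew (not the TA or ORF orientation skews), uses non-overlapping windows, and relies on the commonly used convention that the minimum of the cumulative curve marks the putative origin when the leading strand is enriched in G. The function names and the default window size are assumptions.

```python
def cumulative_gc_skew(seq, window=1000):
    """Running sum of (G - C) / (G + C) over consecutive windows.

    Entry i is the cumulative skew at the end of window i.
    Windows with no G or C contribute zero.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def predicted_origin(seq, window=1000):
    """Coordinate at which the cumulative GC skew reaches its minimum.

    Under the assumed convention (leading strand enriched in G), this
    is a candidate position for the origin of replication.
    """
    curve = cumulative_gc_skew(seq, window)
    lowest = min(range(len(curve)), key=curve.__getitem__)
    return min((lowest + 1) * window, len(seq))
```

On a toy sequence consisting of a C-rich half followed by a G-rich half, the curve falls and then rises, and the predicted coordinate sits at the junction between the two halves. Real chromosomes are circular, so the choice of starting coordinate shifts the curve without moving its minimum relative to the sequence.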
Abstract:
DNA that survives in museum specimens, bones and other tissues recovered by archaeologists is invariably fragmented and chemically modified. The extent to which such modifications accumulate over time is largely unknown but could potentially be used to differentiate between endogenous old DNA and present-day DNA contaminating specimens and experiments. Here we examine mitochondrial DNA sequences from tissue remains that vary in age between 18 and 60,000 years with respect to three molecular features: fragment length, base composition at strand breaks, and apparent C to T substitutions. We find that fragment length does not decrease consistently over time and that strand breaks occur preferentially before purine residues by what may be at least two different molecular mechanisms that are not yet understood. In contrast, the frequency of apparent C to T substitutions towards the 5'-ends of molecules tends to increase over time. These nucleotide misincorporations are thus a useful tool to distinguish recent from ancient DNA sources in specimens that have not been subjected to unusual or harsh treatments.
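The positional C to T signal described above can be tabulated as in the following sketch. It is an illustration under stated assumptions, not the authors' pipeline: each fragment is supplied as a pair of gap-free, equal-orientation strings (reference segment and sequenced read, both written 5' to 3'), and the function name and `max_pos` parameter are hypothetical.

```python
def c_to_t_frequency_by_position(pairs, max_pos=10):
    """Fraction of reference C's read as T at each distance from the 5' end.

    `pairs` is an iterable of (reference, read) strings aligned without
    gaps. Entry i of the result covers position i (0 = the 5'-most base)
    and is None when no reference C was observed at that position.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:max_pos].upper(), read[:max_pos].upper())):
            if r == "C":
                c_total[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [ct / n if n else None for ct, n in zip(c_to_t, c_total)]
```

A profile that is elevated at the first few positions and decays toward the interior of the fragments would be consistent with the pattern the abstract reports for older specimens.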