10 results for Comparison of nucleotide sequences

in National Center for Biotechnology Information - NCBI


Relevance:

100.00%

Abstract:

The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50°C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83–92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
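
As an illustration of what a "preferred direction of replacement" test can look like, the sketch below applies a two-sided exact binomial (sign) test to the counts of replacements in each direction for one amino acid pair. The abstract does not say which statistical test the authors used, so the choice of test, the function name `direction_bias_p`, and the example counts are all assumptions made here for illustration.

```python
from math import comb

def direction_bias_p(n_forward, n_reverse):
    """Two-sided exact binomial P-value for a directional bias.

    n_forward: hypothetical count of mesophile X -> thermophile Y replacements
    n_reverse: hypothetical count of mesophile Y -> thermophile X replacements
    Null hypothesis: both directions are equally likely (p = 0.5).
    """
    n = n_forward + n_reverse
    if n == 0:
        return 1.0  # no observations, no evidence of bias
    k = max(n_forward, n_reverse)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1
    return min(1.0, 2 * tail)

# Made-up counts for a single amino acid pair (not data from the paper)
p_value = direction_bias_p(30, 8)
print(f"P = {p_value:.5f}, significant at 0.01: {p_value < 0.01}")
```

With many amino acid pairs tested at once, a real analysis would also need some correction for multiple comparisons, which this sketch leaves out.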

Relevance:

100.00%

Abstract:

Simple phylogenetic tests were applied to a large data set of nucleotide sequences from two nuclear genes and a region of the mitochondrial genome of Trypanosoma cruzi, the agent of Chagas' disease. Incongruent gene genealogies manifest genetic exchange among distantly related lineages of T. cruzi. Two widely distributed isoenzyme types of T. cruzi are hybrids, their genetic composition being the likely result of genetic exchange between two distantly related lineages. The data show that the reference strain for the T. cruzi genome project (CL Brener) is a hybrid. Well-supported gene genealogies show that mitochondrial and nuclear gene sequences from T. cruzi cluster, respectively, in three or four distinct clades that do not fully correspond to the two previously defined major lineages of T. cruzi. There is clear genetic differentiation among the major groups of sequences, but genetic diversity within each major group is low. We estimate that the major extant lineages of T. cruzi have diverged during the Miocene or early Pliocene (3–16 million years ago).

Relevance:

100.00%

Abstract:

We examine the occurrence of the ≈300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (≈140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.

Relevance:

100.00%

Abstract:

Many eubacterial DNA polymerases are bifunctional molecules having both polymerization (P) and 5′ nuclease (N) activities, which are contained in separable domains. We previously showed that the DNA polymerase I of Thermus aquaticus (TaqNP) endonucleolytically cleaves DNA substrates, releasing unpaired 5′ arms of bifurcated duplexes. Here, we compare the substrate specificities of TaqNP and the isolated 5′ nuclease domain of this enzyme, TaqN. Both enzymes are significantly activated by primer oligonucleotides that are hybridized to the 3′ arm of the bifurcation; optimal stimulation requires overlap of the 3′ terminal nucleotide of the primer with the terminal base pair of the duplex, but the terminal nucleotide need not hybridize to the complementary strand in the substrate. In the presence of Mn2+ ions, TaqN can cleave both RNA and circular DNA at structural bifurcations. Certain anti-TaqNP mAbs block cleavage by one or both enzymes, whereas others can stimulate cleavage of nonoptimal substrates.

Relevance:

100.00%

Abstract:

A large family of membrane channel proteins selective for transport of water (aquaporins) or water plus glycerol (aquaglyceroporins) has been found in diverse life forms. Escherichia coli has two members of this family—a water channel, AqpZ, and a glycerol facilitator, GlpF. Despite having similar primary amino acid sequences and predicted structures, the oligomeric state and solute selectivity of AqpZ and GlpF are disputed. Here we report biochemical and functional characterizations of affinity-purified GlpF and compare it to AqpZ. Histidine-tagged (His-GlpF) and hemagglutinin-tagged (HA-GlpF) polypeptides encoded by a bicistronic construct were expressed in bacteria. HA-GlpF and His-GlpF appear to form oligomers during Ni-nitrilotriacetate affinity purification. Sucrose gradient sedimentation analyses showed that the oligomeric state of octyl glucoside-solubilized GlpF varies: low ionic strength favors subunit dissociation, whereas Mg2+ stabilizes tetrameric assembly. Reconstitution of affinity-purified GlpF into proteoliposomes increases glycerol permeability more than 100-fold and water permeability up to 10-fold compared with control liposomes. Glycerol and water permeability of GlpF both occur with low Arrhenius activation energies and are reversibly inhibited by HgCl2. Our studies demonstrate that, unlike AqpZ, a water-selective stable tetramer, purified GlpF exists in multiple oligomeric forms under nondenaturing conditions and is highly permeable to glycerol but less well permeated by water.

Relevance:

100.00%

Abstract:

Of the rules used by the splicing machinery to precisely determine intron–exon boundaries only a fraction is known. Recent evidence suggests that specific short sequences within exons help in defining these boundaries. Such sequences are known as exonic splicing enhancers (ESE). A possible bioinformatical approach to studying ESE sequences is to compare genes that harbor introns with genes that do not. For this purpose two non-redundant samples of 719 intron-containing and 63 intron-lacking human genes were created. We performed a statistical analysis on these datasets of intron-containing and intron-lacking human coding sequences and found a statistically significant difference (P = 0.01) between these samples in terms of 5–6mer oligonucleotide distributions. The difference is not created by a few strong signals present in the majority of exons, but rather by the accumulation of multiple weak signals through small variations in codon frequencies, codon biases and context-dependent codon biases between the samples. A list of putative novel human splicing regulation sequences has been elucidated by our analysis.
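
The comparison of 5–6mer oligonucleotide distributions described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the handling of ambiguous bases, and the toy sequences are assumptions, and the abstract does not describe the statistical test that would follow the counting step.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count overlapping k-mers across an iterable of DNA strings."""
    counts = Counter()
    for seq in seqs:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            # Skip windows that contain ambiguous bases such as N
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def frequency_differences(set_a, set_b, k=6):
    """Return {kmer: freq_a - freq_b}, with frequencies normalized per set."""
    ca, cb = kmer_counts(set_a, k), kmer_counts(set_b, k)
    ta = sum(ca.values()) or 1  # avoid division by zero for empty input
    tb = sum(cb.values()) or 1
    return {m: ca[m] / ta - cb[m] / tb for m in set(ca) | set(cb)}

# Toy stand-ins for intron-containing and intron-lacking coding sequences
with_introns = ["ATGGAAGAAGCTGAAGAATAA", "ATGGAAGAAGGTTAA"]
without_introns = ["ATGCCCGGGCCCGGGTAA"]

diffs = frequency_differences(with_introns, without_introns, k=5)
for kmer, d in sorted(diffs.items(), key=lambda kv: -abs(kv[1]))[:5]:
    print(kmer, round(d, 3))
```

Normalizing each count by the total for its own sample keeps the comparison meaningful when the samples differ greatly in size, as the 719 and 63 gene sets do.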

Relevance:

100.00%

Abstract:

Estimation of evolutionary distances has always been a major issue in the study of molecular evolution because evolutionary distances are required for estimating the rate of evolution in a gene, the divergence dates between genes or organisms, and the relationships among genes or organisms. Other closely related issues are the estimation of the pattern of nucleotide substitution, the estimation of the degree of rate variation among sites in a DNA sequence, and statistical testing of the molecular clock hypothesis. Mathematical treatments of these problems are considerably simplified by the assumption of a stationary process in which the nucleotide compositions of the sequences under study have remained approximately constant over time, and there now exist fairly extensive studies of stationary models of nucleotide substitution, although some problems remain to be solved. Nonstationary models are much more complex, but significant progress has been recently made by the development of the paralinear and LogDet distances. This paper reviews recent studies on the above issues and reports results on correcting the estimation bias of evolutionary distances, the estimation of the pattern of nucleotide substitution, and the estimation of rate variation among the sites in a sequence.
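
To make the idea of a distance under a stationary substitution model concrete, the sketch below computes the uncorrected p-distance and the Jukes-Cantor (JC69) corrected distance, d = -(3/4) ln(1 - 4p/3), for two aligned sequences. JC69 is chosen here only because it is among the simplest stationary models; the abstract does not say which models the review covers in detail, and the function names and example sequences are illustrative assumptions.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc69_distance(a, b):
    """Jukes-Cantor distance in expected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Made-up aligned sequences differing at 1 of 8 sites
s1, s2 = "ACGTACGT", "ACGTACGA"
print(p_distance(s1, s2), round(jc69_distance(s1, s2), 4))
```

The corrected distance is always at least as large as the p-distance because it accounts for multiple substitutions at the same site. JC69 assumes equal base frequencies and equal substitution rates, which is the kind of simplifying assumption that the nonstationary paralinear and LogDet distances are meant to relax.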

Relevance:

100.00%

Abstract:

The circumsporozoite (CS) protein of malaria parasites (Plasmodium) covers the surface of sporozoites that invade hepatocytes in mammalian hosts and macrophages in avian hosts. CS genes have been characterized from many Plasmodium that infect mammals; two domains of the corresponding proteins, identified initially by their conservation (region I and region II), have been implicated in binding to hepatocytes. The CS gene from the avian parasite Plasmodium gallinaceum was characterized to compare these functional domains to those of mammalian Plasmodium and for the study of Plasmodium evolution. The P. gallinaceum protein has the characteristics of CS proteins, including a secretory signal sequence, central repeat region, regions of charged amino acids, and an anchor sequence. Comparison with CS signal sequences reveals four distinct groupings, with P. gallinaceum most closely related to the human malaria Plasmodium falciparum. The 5-amino acid sequence designated region I, which is identical in all mammalian CS and implicated in hepatocyte invasion, is different in the avian protein. The P. gallinaceum repeat region consists of 9-amino acid repeats with the consensus sequence QP(A/V)GGNGG(A/V). The conserved motif designated region II-plus, which is associated with targeting the invasion of liver cells, is also conserved in the avian protein. Phylogenetic analysis of the aligned Plasmodium CS sequences yields a tree with a topology similar to the one obtained using sequence data from the small subunit rRNA gene. The phylogeny using the CS gene supports the proposal that the human malaria P. falciparum is significantly more related to avian parasites than to other parasites infecting mammals, although the biology of sporozoite invasion is different between the avian and mammalian species.

Relevance:

100.00%

Abstract:

The recently sequenced genome of the parasitic bacterium Mycoplasma genitalium contains only 468 identified protein-coding genes that have been dubbed a minimal gene complement [Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., et al. (1995) Science 270, 397-403]. Although the M. genitalium gene complement is indeed the smallest among known cellular life forms, there is no evidence that it is the minimal self-sufficient gene set. To derive such a set, we compared the 468 predicted M. genitalium protein sequences with the 1703 protein sequences encoded by the other completely sequenced small bacterial genome, that of Haemophilus influenzae. M. genitalium and H. influenzae belong to two ancient bacterial lineages, i.e., Gram-positive and Gram-negative bacteria, respectively. Therefore, the genes that are conserved in these two bacteria are almost certainly essential for cellular function. It is this category of genes that is most likely to approximate the minimal gene set. We found that 240 M. genitalium genes have orthologs among the genes of H. influenzae. This collection of genes falls short of comprising the minimal set as some enzymes responsible for intermediate steps in essential pathways are missing. The apparent reason for this is the phenomenon that we call nonorthologous gene displacement when the same function is fulfilled by nonorthologous proteins in two organisms. We identified 22 nonorthologous displacements and supplemented the set of orthologs with the respective M. genitalium genes. After examining the resulting list of 262 genes for possible functional redundancy and for the presence of apparently parasite-specific genes, 6 genes were removed. We suggest that the remaining 256 genes are close to the minimal gene set that is necessary and sufficient to sustain the existence of a modern-type cell. 
Most of the proteins encoded by the genes from the minimal set have eukaryotic or archaeal homologs but seven key proteins of DNA replication do not. We speculate that the last common ancestor of the three primary kingdoms had an RNA genome. Possibilities are explored to further reduce the minimal set to model a primitive cell that might have existed at a very early stage of life evolution.

Relevance:

100.00%

Abstract:

Mutations at position C1054 of 16S rRNA have previously been shown to cause translational suppression in Escherichia coli. To examine the effects of similar mutations in a eukaryote, all three possible base substitutions and a base deletion were generated at the position of Saccharomyces cerevisiae 18S rRNA corresponding to E. coli C1054. In yeast, as in E. coli, both C1054A (rdn-1A) and C1054G (rdn-1G) caused dominant nonsense suppression. Yeast C1054U (rdn-1T) was a recessive antisuppressor, while yeast C1054-delta (rdn-1delta) led to recessive lethality. Both C1054U and two previously described yeast 18S rRNA antisuppressor mutations, G517A (rdn-2) and U912C (rdn-4), inhibited codon-nonspecific suppression caused by mutations in eukaryotic release factors, sup45 and sup35. However, among these only C1054U inhibited UAA-specific suppressions caused by a UAA-decoding mutant tRNA-Gln (SLT3). Our data implicate eukaryotic C1054 in translational termination, thus suggesting that its function is conserved throughout evolution despite the divergence of nearby nucleotide sequences.