46 results for Genome Sequences
in National Center for Biotechnology Information - NCBI
Abstract:
A crucial step in exploiting the information inherent in genome sequences is to assign to each protein sequence its three-dimensional fold and biological function. Here we describe fold assignment for the proteins encoded by the small genome of Mycoplasma genitalium. The assignment was carried out by our computer server (http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html), which assigns folds to amino acid sequences by comparing sequence-derived predictions with known structures. Of the total of 468 protein ORFs, 103 (22%) can be assigned a known protein fold with high confidence, as cross-validated with tests on known structures. Of these sequences, 75 (16%) show enough sequence similarity to proteins of known structure that they can also be detected by traditional sequence–sequence comparison methods. That is, the difference of 28 sequences (6%) are assignable by the sequence–structure method of the server but not by current sequence–sequence methods. Of the remaining 78% of sequences in the genome, 18% belong to membrane proteins and the remaining 60% cannot be assigned either because these sequences correspond to no presently known fold or because of insensitivity of the method. At the current rate of determination of new folds by x-ray and NMR methods, extrapolation suggests that folds will be assigned to most soluble proteins in the next decade.
Abstract:
Despite more than a century of debate, the evolutionary position of turtles (Testudines) relative to other amniotes (reptiles, birds, and mammals) remains uncertain. One of the major impediments to resolving this important evolutionary problem is the highly distinctive and enigmatic morphology of turtles that led to their traditional placement apart from diapsid reptiles as sole descendants of presumably primitive anapsid reptiles. To address this question, the complete (16,787-bp) mitochondrial genome sequence of the African side-necked turtle (Pelomedusa subrufa) was determined. This molecule contains several unusual features: a (TA)n microsatellite in the control region, the absence of an origin of replication for the light strand in the WANCY region of five tRNA genes, an unusually long noncoding region separating the ND5 and ND6 genes, an overlap between ATPase 6 and COIII genes, and the existence of extra nucleotides in ND3 and ND4L putative ORFs. Phylogenetic analyses of the complete mitochondrial genome sequences supported the placement of turtles as the sister group of an alligator and chicken (Archosauria) clade. This result clearly rejects the Haematothermia hypothesis (a sister-group relationship between mammals and birds), as well as rejecting the placement of turtles as the most basal living amniotes. Moreover, evidence from both complete mitochondrial rRNA genes supports a sister-group relationship of turtles to Archosauria to the exclusion of Lepidosauria (tuatara, snakes, and lizards). These results challenge the classic view of turtles as the only survivors of primary anapsid reptiles and imply that turtles might have secondarily lost their skull fenestration.
Abstract:
The determination of complete genome sequences provides us with an opportunity to describe and analyze evolution at the comprehensive level of genomes. Here we compare nine genomes with respect to their protein coding genes at two levels: (i) we compare genomes as “bags of genes” and measure the fraction of orthologs shared between genomes and (ii) we quantify correlations between genes with respect to their relative positions in genomes. Distances between the genomes are related to their divergence times, measured as the number of amino acid substitutions per site in a set of 34 orthologous genes that are shared among all the genomes compared. We establish a hierarchy of rates at which genomes have changed during evolution. Protein sequence identity is the most conserved, followed by the complement of genes within the genome. Next is the degree of conservation of the order of genes, whereas gene regulation appears to evolve at the highest rate. Finally, we show that some genomes are more highly organized than others: they show a higher degree of the clustering of genes that have orthologs in other genomes.
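As a minimal sketch of the "bags of genes" comparison described in the abstract above, the snippet below represents each genome as a set of ortholog-group identifiers and reports the fraction they share. The identifiers, the function name `shared_ortholog_fraction`, and the choice to normalize by the smaller genome are assumptions made for illustration; the abstract does not state how the fraction was normalized.

```python
def shared_ortholog_fraction(genome_a, genome_b):
    """Fraction of ortholog groups shared by two genomes.

    Each genome is a set of ortholog-group identifiers (hypothetical IDs).
    Normalizing by the smaller genome is an assumption, not the paper's
    stated definition.
    """
    smaller = min(len(genome_a), len(genome_b))
    if smaller == 0:
        return 0.0
    return len(genome_a & genome_b) / smaller


# Toy gene sets; the group names are placeholders, not real annotations.
genome_a = {"og1", "og2", "og3", "og4"}
genome_b = {"og2", "og3", "og5"}
print(round(shared_ortholog_fraction(genome_a, genome_b), 3))
```

Treating genomes as sets deliberately ignores gene order, which is why the abstract measures positional correlations as a separate, second level of comparison.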
Abstract:
Earlier studies have shown that Kaposi sarcomas contain cells infected with human herpesvirus (HHV) 6B, and in current studies we report that both AIDS-associated and classic-sporadic Kaposi sarcoma contain HHV-7 genome sequences detectable by PCR. To determine the distribution of HHV-7-infected cells relative to those infected with HHV-6, sections from paraffin-embedded tissues were allowed to react with antibodies to HHV-7 virion tegument phosphoprotein pp85 and to HHV-6B protein p101. The antibodies are specific for HHV-7 and HHV-6B, respectively, and they retained reactivity for antigens contained in formalin-fixed, paraffin-embedded tissue samples. We report that (i) HHV-7 pp85 was present in 9 of 32 AIDS-associated Kaposi sarcomas, and in 1 of 7 classical-sporadic HIV-negative Kaposi sarcomas; (ii) HHV-7 pp85 was detected primarily in cells bearing the CD68 marker characteristic of the monocyte/macrophage lineage present in or surrounding the Kaposi sarcoma lesions; and (iii) in a number of Kaposi sarcoma specimens, tumor-associated CD68+ monocytes/macrophages expressed simultaneously antigens from both HHV-7 and HHV-6B, and therefore appeared to be doubly infected with the two viruses. CD68+ monocytes/macrophages infected with HHV-7 were readily detectable in Kaposi sarcoma, but virtually absent from other normal or pathological tissues that harbor macrophages. Because all of the available data indicate that HHV-7 infects CD4+ T lymphocytes, these results suggest that the environment of the Kaposi sarcoma (i) attracts circulating peripheral lymphocytes and monocytes, triggers the replication of latent viruses, and thereby increases the local concentration of viruses, (ii) renders CD68+ monocytes/macrophages susceptible to infection with HHV-7, and (iii) the combination of both events enables double infections of cells with both HHV-6B and HHV-7.
Abstract:
A recent study of the divergence times of the major groups of organisms as gauged by amino acid sequence comparison has been expanded and the data have been reanalyzed with a distance measure that corrects for both constraints on amino acid interchange and variation in substitution rate at different sites. Beyond that, the availability of complete genome sequences for several eubacteria and an archaebacterium has had a great impact on the interpretation of certain aspects of the data. Thus, the majority of the archaebacterial sequences are not consistent with currently accepted views of the Tree of Life which cluster the archaebacteria with eukaryotes. Instead, they are either outliers or mixed in with eubacterial orthologs. The simplest resolution of the problem is to postulate that many of these sequences were carried into eukaryotes by early eubacterial endosymbionts about 2 billion years ago, only very shortly after or even coincident with the divergence of eukaryotes and archaebacteria. The strong resemblances of these same enzymes among the major eubacterial groups suggest that the cyanobacteria and Gram-positive and Gram-negative eubacteria also diverged at about this same time, whereas the much greater differences between archaebacterial and eubacterial sequences indicate these two groups may have diverged between 3 and 4 billion years ago.
Abstract:
The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the subclusters individually to the expression subclusters. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
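The idea of a probabilistic segmentation model in the abstract above can be illustrated with a small dynamic program: given a dictionary of words with probabilities, the likelihood of a sequence is the sum, over every way of cutting it into dictionary words, of the product of the word probabilities. This is only a sketch of that one quantity under assumed inputs. The function name, the toy dictionary, and its probabilities are hypothetical, and the snippet does not reproduce how MobyDick builds or scores its dictionary.

```python
def segmentation_likelihood(sequence, dictionary):
    """Sum over all segmentations of `sequence` into dictionary words
    of the product of the word probabilities.

    `dictionary` maps each word to an assumed probability.
    """
    # z[i] holds the total weight of all segmentations of sequence[:i].
    z = [0.0] * (len(sequence) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(sequence) + 1):
        for word, prob in dictionary.items():
            start = end - len(word)
            if start >= 0 and sequence[start:end] == word:
                z[end] += prob * z[start]
    return z[-1]


# Toy dictionary: single bases plus one longer word (values are made up).
toy_dictionary = {"A": 0.3, "T": 0.3, "C": 0.1, "G": 0.1, "TATA": 0.2}
print(segmentation_likelihood("TATA", toy_dictionary))
```

In this toy case the sequence can be read either as four single letters or as the one word "TATA", and the likelihood adds both readings. A long word earns its place in the dictionary when it occurs more often than the shorter words alone would predict.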
Abstract:
Understanding the factors responsible for variations in mutation patterns and selection efficacy along chromosomes is a prerequisite for deciphering genome sequences. Population genetics models predict a positive correlation between the efficacy of selection at a given locus and the local rate of recombination because of Hill–Robertson effects. Codon usage is considered one of the most striking examples that support this prediction at the molecular level. In a wide range of species including Caenorhabditis elegans and Drosophila melanogaster, codon usage is essentially shaped by selection acting for translational efficiency. Codon usage bias correlates positively with recombination rate in Drosophila, apparently supporting the hypothesis that selection on codon usage is improved by recombination. Here we present an exhaustive analysis of codon usage in C. elegans and D. melanogaster complete genomes. We show that in both genomes there is a positive correlation between recombination rate and the frequency of optimal codons. However, we demonstrate that in both species, this effect is due to a mutational bias toward G and C bases in regions of high recombination rate, possibly as a direct consequence of the recombination process. The correlation between codon usage bias and recombination rate in these species appears to be essentially determined by recombination-dependent mutational patterns, rather than selective effects. This result highlights that it is necessary to take into account the mutagenic effect of recombination to understand the evolutionary role and impact of recombination.
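The two quantities that the abstract above sets against each other, the frequency of optimal codons and G+C content, can be computed per gene as in the sketch below. The set of "optimal" codons is species-specific and is a placeholder here, and the function names are assumptions. For simplicity the sketch counts every codon, including the start and stop codons, which a real analysis would usually exclude.

```python
def split_codons(cds):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def frequency_of_optimal_codons(cds, optimal_codons):
    """Share of codons that belong to the given set of optimal codons."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    return sum(codon in optimal_codons for codon in codons) / len(codons)


def gc3_content(cds):
    """Share of codons whose third position is G or C."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


# Placeholder optimal-codon set and toy sequence, for illustration only.
hypothetical_optimal = {"GCC", "AAG", "CTG"}
toy_cds = "ATGGCCAAGCTGTAA"
print(frequency_of_optimal_codons(toy_cds, hypothetical_optimal))
print(gc3_content(toy_cds))
```

When most optimal codons end in G or C, as in this toy set, the two measures rise together, so a correlation between recombination rate and optimal-codon frequency cannot by itself separate selection from a mutational bias toward G and C.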
Abstract:
Complete genome sequences are providing a framework to allow the investigation of biological processes by the use of comprehensive approaches. Genome analysis also is having a dramatic impact on medicine through its identification of genes and mutations involved in disease and the elucidation of entire microbial gene sets. Studies of the sequences of model organisms, such as that of the nematode worm Caenorhabditis elegans, are providing extraordinary insights into development and differentiation that aid the study of these processes in humans. The field of functional genomics seeks to devise and apply technologies that take advantage of the growing body of sequence information to analyze the full complement of genes and proteins encoded by an organism.
Abstract:
The past decade in molecular biology has seen remarkable advances in the study of the origin and early evolution of life. The mathematical tools for analyzing DNA and protein sequences, coupled with the availability of complete microbial genome sequences, provide insight almost as far back as the age of the nucleic acids themselves. Experimental evolution in the laboratory and especially in vitro evolution of RNA provide insight into a hypothetical world where RNA, or a close relative, may have debuted as a primary functional and informational molecule. The ability to isolate new functional RNAs from random sequences now ultimately makes the world of possible primitive chemical interactions accessible even when the molecules or reactions are no longer present in modern species. Thus we can at last form direct experimental tests of specific models for the origin of RNA–protein associations, such as those that influenced the genetic code. This marks a turning point for probing the origin and early history of life at the molecular level.
Abstract:
Insertion of introns into cloned cDNA of two isolates of the plant potyvirus pea seedborne mosaic virus facilitated plasmid amplification in Escherichia coli. Multiple stop codons in the inserted introns interrupted the open reading frame of the virus cDNA, thereby terminating undesired translation of virus proteins in E. coli. Plasmids containing the full-length virus sequences, placed under control of the cauliflower mosaic virus 35S promoter and the nopaline synthase termination signal, were stable and easy to amplify in E. coli if one or more introns were inserted into the virus sequence. These plasmids were infectious when inoculated mechanically onto Pisum sativum leaves. Examination of the cDNA-derived viruses confirmed that intron splicing of in vivo transcribed pre-mRNA had occurred as predicted, reestablishing the virus genome sequences. Symptom development and virus accumulation of the cDNA derived viruses and parental viruses were identical. It is proposed that intron insertion can be used to facilitate manipulation and amplification of cloned DNA fragments that are unstable in, or toxic to, E. coli. When transcribed in vivo in eukaryotic cells, the introns will be eliminated from the sequence and will not interfere with further analysis of protein expression or virus infection.
Abstract:
The whole genome sequence (1.83 Mbp) of Haemophilus influenzae strain Rd was searched to identify tandem oligonucleotide repeat sequences. Loss or gain of one or more nucleotide repeats through a recombination-independent slippage mechanism is known to mediate phase variation of surface molecules of pathogenic bacteria, including H. influenzae. This facilitates evasion of host defenses and adaptation to the varying microenvironments of the host. We reasoned that iterative nucleotides could identify novel genes relevant to microbe-host interactions. Our search of the Rd genome sequence identified 9 novel loci with multiple (range 6-36, mean 22) tandem tetranucleotide repeats. All were found to be located within putative open reading frames and included homologues of hemoglobin-binding proteins of Neisseria, a glycosyltransferase (lgtC gene product) of Neisseria, and an adhesin of Yersinia. These tetranucleotide repeat sequences were also shown to be present in two other epidemiologically different H. influenzae type b strains, although the number and distribution of repeats were different. Further characterization of the lgtC gene showed that it was involved in phenotypic switching of a lipopolysaccharide epitope and that this variable expression was associated with changes in the number of tetranucleotide repeats. Mutation of lgtC resulted in attenuated virulence of H. influenzae in an infant rat model of invasive infection. These data indicate the rapidity, economy, and completeness with which whole genome sequences can be used to investigate the biology of pathogenic bacteria.
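A search for tandem tetranucleotide repeats of the kind described in the abstract above can be sketched with a regular-expression backreference. This is an illustrative approach, not the authors' method. The function name and the default threshold of six copies are assumptions (six is simply the low end of the range the abstract reports), and the example sequence is synthetic.

```python
import re


def find_tetranucleotide_repeats(sequence, min_copies=6):
    """Return (start, unit, copies) for each tandem run of a 4-base unit.

    `min_copies` is an assumed threshold, not a value taken from the paper.
    """
    # Capture one 4-base unit, then require it to recur back-to-back.
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        # Skip units such as AAAA, which are really mononucleotide runs.
        if len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // 4
        hits.append((match.start(), unit, copies))
    return hits


# Synthetic example: eight copies of CAAT flanked by unrelated bases.
example = "GGTACC" + "CAAT" * 8 + "TTGACG"
print(find_tetranucleotide_repeats(example))
```

This scans one strand only and reports 0-based start positions. A genome-wide search would also scan the reverse complement and then check whether each hit falls inside an open reading frame, since a change in copy number shifts the reading frame only when the repeat lies within coding sequence.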
Abstract:
Nuclear-localized mtDNA pseudogenes might explain a recent report describing a heteroplasmic mtDNA molecule containing five linked missense mutations dispersed over the contiguous mtDNA CO1 and CO2 genes in Alzheimer’s disease (AD) patients. To test this hypothesis, we have used the PCR primers utilized in the original report to amplify CO1 and CO2 sequences from two independent ρ° (mtDNA-less) cell lines. CO1 and CO2 sequences amplified from both of the ρ° cells, demonstrating that these sequences are also present in the human nuclear DNA. The nuclear pseudogene CO1 and CO2 sequences were then tested for each of the five “AD” missense mutations by restriction endonuclease site variant assays. All five mutations were found in the nuclear CO1 and CO2 PCR products from ρ° cells, but none were found in the PCR products obtained from cells with normal mtDNA. Moreover, when the overlapping nuclear CO1 and CO2 PCR products were cloned and sequenced, all five missense mutations were found, as well as a linked synonymous mutation. Unlike the findings in the original report, an additional 32 base substitutions were found, including two in adjacent tRNAs and a two base pair deletion in the CO2 gene. Phylogenetic analysis of the nuclear CO1 and CO2 sequences revealed that they diverged from modern human mtDNAs early in hominid evolution about 770,000 years before present. These data would be consistent with the interpretation that the missense mutations proposed to cause AD may be the product of ancient mtDNA variants preserved as nuclear pseudogenes.
Abstract:
Open reading frames in the Plasmodium falciparum genome encode domains homologous to the adhesive domains of the P. falciparum EBA-175 erythrocyte-binding protein (eba-175 gene product) and those of the Plasmodium vivax and Plasmodium knowlesi Duffy antigen-binding proteins. These domains are referred to as Duffy binding-like (DBL), after the receptor that determines P. vivax invasion of Duffy blood group-positive human erythrocytes. Using oligonucleotide primers derived from short regions of conserved sequence, we have developed a reverse transcription-PCR method that amplifies sequences encoding the DBL domains of expressed genes. Products of these reverse transcription-PCR amplifications include sequences of single-copy genes (including eba-175) and variably transcribed genes that cross-hybridize to multiple regions of the genome. Restriction patterns of the multicopy genes show a high degree of polymorphism among different parasite lines, whereas single-copy genes are generally conserved. Characterization of the single-copy genes has identified a gene (ebl-1) that is related to eba-175 and is likely to be involved in erythrocyte invasion.
Abstract:
The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50°C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83–92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
Abstract:
The polymerase (PB2) and nucleocapsid (NP) genes encoded by the genome of influenza virus are essential for replication of the virus. When synthetic genes that express RNAs for external guide sequences targeted to the mRNAs of the PB2 and NP genes are stably incorporated into mouse cells in tissue culture, infection of these cells with influenza virus is nonproductive. Endogenous RNase P cleaves the targeted influenza virus mRNAs when they are in a complex with the external guide sequences. Targeting two different mRNAs simultaneously inhibits viral particle production more efficiently than does targeting only one mRNA.