Digital Library

47 results for SEQUENCE DATA

in National Center for Biotechnology Information - NCBI

PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data

Relevance:

100.00%

Abstract:

The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P.falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P.falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.

GenEST, a powerful bidirectional link between cDNA sequence data and gene expression profiles generated by cDNA-AFLP

Relevance:

100.00%

Abstract:

The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-AFLP, a highly reproducible differential display method based on restriction enzyme digests and selective amplification under high stringency conditions. We have developed a computer program (GenEST) that predicts the sizes of virtual transcript-derived fragments (TDFs) of in silico-digested cDNA sequences retrieved from databases. The vast majority of the resulting virtual TDFs could be traced back among the thousands of TDFs displayed on cDNA-AFLP gels. Sequencing of the corresponding bands excised from cDNA-AFLP gels revealed no inconsistencies. As a consequence, cDNA sequence databases can be screened very efficiently to identify genes with relevant expression profiles. The other way round, it is possible to switch from cDNA-AFLP gels to sequences in the databases. Using the restriction enzyme recognition sites, the primer extensions and the estimated TDF size as identifiers, the DNA sequence(s) corresponding to a TDF with an interesting expression pattern can be identified. In this paper we show examples in both directions by analyzing the plant parasitic nematode Globodera rostochiensis. Various novel pathogenicity factors were identified by combining ESTs from the infective stage juveniles with expression profiles of ∼4000 genes in five developmental stages produced by cDNA-AFLP.
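
The in silico digestion step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the GenEST program itself: the function names, the example recognition sites (GAATTC and TCGA), and the `adapter_len` parameter are assumptions made for the example. Fragment sizes are measured between the start positions of recognition sites, ignoring the exact cut offsets and the selective nucleotides that a real cDNA-AFLP prediction would have to account for.

```python
def find_sites(seq, site):
    """Return the 0-based start positions of every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def virtual_tdfs(seq, site_a, site_b, adapter_len=0):
    """Predict virtual fragments flanked by one site of each enzyme.

    Returns (start, end, size) tuples, where size is the distance between
    the two site starts plus a hypothetical adapter/primer contribution.
    """
    seq = seq.upper()
    sites = sorted(
        [(p, "A") for p in find_sites(seq, site_a)]
        + [(p, "B") for p in find_sites(seq, site_b)]
    )
    fragments = []
    # Only neighbouring sites cut by different enzymes give an amplifiable fragment.
    for (p1, e1), (p2, e2) in zip(sites, sites[1:]):
        if e1 != e2:
            fragments.append((p1, p2, p2 - p1 + adapter_len))
    return fragments


# Toy cDNA with two GAATTC sites and one TCGA site (made-up sequence).
cdna = "AA" + "GAATTC" + "A" * 10 + "TCGA" + "C" * 5 + "GAATTC" + "TT"
for start, end, size in virtual_tdfs(cdna, "GAATTC", "TCGA"):
    print(start, end, size)
```

Predicted sizes of this kind could then be compared with band positions on a gel, which is the direction of lookup the abstract describes first.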

Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence data bases.

Relevance:

100.00%

Abstract:

Molecular and fragment ion data of intact 8- to 43-kDa proteins from electrospray Fourier-transform tandem mass spectrometry are matched against the corresponding data in sequence data bases. Extending the sequence tag concept of Mann and Wilm for matching peptides, a partial amino acid sequence in the unknown is first identified from the mass differences of a series of fragment ions, and the mass position of this sequence is defined from molecular weight and the fragment ion masses. For three studied proteins, a single sequence tag retrieved only the correct protein from the data base; a fourth protein required the input of two sequence tags. However, three of the data base proteins differed by having an extra methionine or by missing an acetyl or heme substitution. The positions of these modifications in the protein examined were greatly restricted by the mass differences of its molecular and fragment ions versus those of the data base. To characterize the primary structure of an unknown represented in the data base, this method is fast and specific and does not require prior enzymatic or chemical degradation.
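
The core idea of reading a partial sequence from the mass differences of a fragment ion series can be sketched as below. This is an illustrative sketch only: the function name, the tolerance, and the use of monoisotopic residue masses are assumptions, and the paper's actual matching procedure is not reproduced here. Leucine and isoleucine have the same mass, so the single key `L` stands for either.

```python
# Monoisotopic residue masses in Da ("L" stands for Leu or Ile, which are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def sequence_tag(fragment_masses, tol=0.01):
    """Read a partial sequence from consecutive fragment-ion mass differences.

    Each difference between neighbouring masses is compared with the residue
    table; a difference with no unique match is reported as '?'.
    """
    masses = sorted(fragment_masses)
    tag = []
    for lower, upper in zip(masses, masses[1:]):
        diff = upper - lower
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol]
        tag.append(matches[0] if len(matches) == 1 else "?")
    return "".join(tag)


# Made-up ion ladder: each step adds one residue mass (G, then A, then F).
ladder = [1000.0, 1057.02146, 1128.05857, 1275.12698]
print(sequence_tag(ladder))
```

The tag, together with the masses on either side of it, is what would then be searched against the data base.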

Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms.

Relevance:

100.00%

Abstract:

Of the approximately 380 families of angiosperms, representatives of only 10 are known to form symbiotic associations with nitrogen-fixing bacteria in root nodules. The morphologically based classification schemes proposed by taxonomists suggest that many of these 10 families of plants are only distantly related, engendering the hypothesis that the capacity to fix nitrogen evolved independently several, if not many, times. This has in turn influenced attitudes toward the likelihood of transferring genes responsible for symbiotic nitrogen fixation to crop species lacking this ability. Phylogenetic analysis of DNA sequences for the chloroplast gene rbcL indicates, however, that representatives of all 10 families with nitrogen-fixing symbioses occur together, with several families lacking this association, in a single clade. This study therefore indicates that only one lineage of closely related taxa achieved the underlying genetic architecture necessary for symbiotic nitrogen fixation in root nodules.

The EMBL nucleotide sequence database

Relevance:

70.00%

Abstract:

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

Relevance:

70.00%

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
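
The first stage described here, the optimal ungapped score on every diagonal of the alignment matrix, can be sketched in plain Python. This is a sequential illustration only: it uses no SIMD instructions, the +1/-1 scoring is an assumed stand-in for a real substitution matrix, and the function name is hypothetical rather than taken from ParAlign.

```python
def diagonal_scores(query, target, match=1, mismatch=-1):
    """Best ungapped local alignment score on each diagonal d = j - i.

    For every diagonal, a running sum is reset to zero whenever it would
    become negative, and the maximum reached is the optimal ungapped score.
    """
    scores = {}
    for d in range(-(len(query) - 1), len(target)):
        best = running = 0
        i = max(0, -d)   # row (query position) where the diagonal starts
        j = i + d        # column (target position) where the diagonal starts
        while i < len(query) and j < len(target):
            step = match if query[i] == target[j] else mismatch
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
        scores[d] = best
    return scores


scores = diagonal_scores("ACGT", "TTACGTTT")
best_diagonal = max(scores, key=scores.get)
print(best_diagonal, scores[best_diagonal])
```

In the method summarized above, scores of several such diagonals are then combined heuristically to approximate a gapped score. That step is not shown here.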

Investigation of the bottleneck leading to the domestication of maize

Relevance:

60.00%

Abstract:

Maize (Zea mays ssp. mays) is genetically diverse, yet it is also morphologically distinct from its wild relatives. These two observations are somewhat contradictory: the first observation is consistent with a large historical population size for maize, but the latter observation is consistent with strong, diversity-limiting selection during maize domestication. In this study, we sampled sequence diversity, coupled with simulations of the coalescent process, to study the dynamics of a population bottleneck during the domestication of maize. To do this, we determined the DNA sequence of a 1,400-bp region of the Adh1 locus from 19 individuals representing maize, its presumed progenitor (Z. mays ssp. parviglumis), and a more distant relative (Zea luxurians). The sequence data were used to guide coalescent simulations of population bottlenecks associated with domestication. Our study confirms high genetic diversity in maize—maize contains 75% of the variation found in its progenitor and is more diverse than its wild relative, Z. luxurians—but it also suggests that sequence diversity in maize can be explained by a bottleneck of short duration and very small size. For example, the breadth of genetic diversity in maize is consistent with a founding population of only 20 individuals when the domestication event is 10 generations in length.

Pro-phenol oxidase activating proteinase from an insect, Manduca sexta: A bacteria-inducible protein similar to Drosophila easter

Relevance:

60.00%

Abstract:

Activation of pro-phenol oxidase (proPO) in insects and crustaceans is important in defense against wounding and infection. The proPO zymogen is activated by a specific proteolytic cleavage. PO oxidizes phenolic compounds to produce quinones, which may help to kill pathogens and can also be used for synthesis of melanin to seal wounds and encapsulate parasites. We have isolated from the tobacco hornworm, Manduca sexta, a serine proteinase that activates proPO, and have cloned its cDNA. The isolated proPO activating proteinase (PAP) hydrolyzed artificial substrates but required other protein factors for proPO activation, suggesting that proPO-activating enzyme may exist as a protein complex, one component of which is PAP. PAP (44 kDa) is composed of two disulfide-linked polypeptide chains (31 kDa and 13 kDa). A cDNA for PAP was isolated from a hemocyte library, by using a PCR-generated probe based on the amino-terminal amino acid sequence of the 31-kDa catalytic domain. PAP belongs to a family of arthropod serine proteinases containing a carboxyl-terminal proteinase domain and an amino-terminal “clip” domain. The member of this family most similar in sequence to PAP is the product of the easter gene from Drosophila melanogaster. PAP mRNA was present at a low level in larval hemocytes and fat body, but became much more abundant in fat body after insects were injected with Escherichia coli. Sequence data and 3H-diisopropyl fluorophosphate labeling results suggest that the same PAP exists in hemolymph and cuticle.

Origin and evolution of the slime molds (Mycetozoa)

Relevance:

60.00%

Abstract:

The Mycetozoa include the cellular (dictyostelid), acellular (myxogastrid), and protostelid slime molds. However, available molecular data are in disagreement on both the monophyly and phylogenetic position of the group. Ribosomal RNA trees show the myxogastrid and dictyostelid slime molds as unrelated early branching lineages, but actin and β-tubulin trees place them together as a single coherent (monophyletic) group, closely related to the animal–fungal clade. We have sequenced the elongation factor-1α genes from one member of each division of the Mycetozoa, including Dictyostelium discoideum, for which cDNA sequences were previously available. Phylogenetic analyses of these sequences strongly support a monophyletic Mycetozoa, with the myxogastrid and dictyostelid slime molds most closely related to each other. All phylogenetic methods used also place this coherent Mycetozoan assemblage as emerging among the multicellular eukaryotes, tentatively supported as more closely related to animals + fungi than are green plants. With our data there are now three proteins that consistently support a monophyletic Mycetozoa and at least four that place these taxa within the “crown” of the eukaryote tree. We suggest that ribosomal RNA data should be more closely examined with regard to these questions, and we emphasize the importance of developing multiple sequence data sets.

Bootstrap confidence levels for phylogenetic trees

Relevance:

60.00%

Abstract:

Evolutionary trees are often estimated from DNA or RNA sequence data. How much confidence should we have in the estimated trees? In 1985, Felsenstein [Felsenstein, J. (1985) Evolution 39, 783–791] suggested the use of the bootstrap to answer this question. Felsenstein’s method, which in concept is a straightforward application of the bootstrap, is widely used, but has been criticized as biased in the genetics literature. This paper concerns the use of the bootstrap in the tree problem. We show that Felsenstein’s method is not biased, but that it can be corrected to better agree with standard ideas of confidence levels and hypothesis testing. These corrections can be made by using the more elaborate bootstrap method presented here, at the expense of considerably more computation.
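
For orientation, the basic bootstrap on sequence data resamples alignment columns with replacement and reports the fraction of replicates that support a grouping. The sketch below shows only that plain proportion, not the corrected method this paper presents. The function names and the toy "A and B are closest" criterion (a stand-in for real tree estimation) are assumptions for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping all taxa in step."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, supports_clade, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates for which supports_clade(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(
        bool(supports_clade(bootstrap_replicate(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ab_closest(aln):
    """Toy criterion: taxa A and B are closer to each other than either is to C."""
    return hamming(aln["A"], aln["B"]) < min(
        hamming(aln["A"], aln["C"]), hamming(aln["B"], aln["C"])
    )


# Made-up three-taxon alignment.
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "ACTTACCTGA"}
print(bootstrap_support(toy, ab_closest, n_replicates=200))
```

In practice `supports_clade` would estimate a tree from each replicate and check whether the clade of interest appears in it.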

Duplicated genes evolve independently after polyploid formation in cotton

Relevance:

60.00%

Abstract:

Of the many processes that generate gene duplications, polyploidy is unique in that entire genomes are duplicated. This process has been important in the evolution of many eukaryotic groups, and it occurs with high frequency in plants. Recent evidence suggests that polyploidization may be accompanied by rapid genomic changes, but the evolutionary fate of discrete loci recently doubled by polyploidy (homoeologues) has not been studied. Here we use locus-specific isolation techniques with comparative mapping to characterize the evolution of homoeologous loci in allopolyploid cotton (Gossypium hirsutum) and in species representing its diploid progenitors. We isolated and sequenced 16 loci from both genomes of the allopolyploid, from both progenitor diploid genomes and appropriate outgroups. Phylogenetic analysis of the resulting 73.5 kb of sequence data demonstrated that for all 16 loci (14.7 kb/genome), the topology expected from organismal history was recovered. In contrast to observations involving repetitive DNAs in cotton, there was no evidence of interaction among duplicated genes in the allopolyploid. Polyploidy was not accompanied by an obvious increase in mutations indicative of pseudogene formation. Additionally, differences in rates of divergence among homoeologues in polyploids and orthologues in diploids were indistinguishable across loci, with significant rate deviation restricted to two putative pseudogenes. Our results indicate that most duplicated genes in allopolyploid cotton evolve independently of each other and at the same rate as those of their diploid progenitors. These indications of genic stasis accompanying polyploidization provide a sharp contrast to recent examples of rapid genomic evolution in allopolyploids.

Ribosomes can slide over and beyond “hungry” codons, resuming protein chain elongation many nucleotides downstream

Relevance:

60.00%

Abstract:

In cells subjected to moderate aminoacyl-tRNA limitation, the peptidyl-tRNA–ribosome complex stalled at the “hungry” codon can slide well beyond it on the messenger RNA and resume translation further downstream. This behavior is proved by unequivocal amino acid sequence data, showing a protein that lacks the bypassed sequence encoded between the hungry codon and specific landing sites. The landing sites are codons cognate to the anticodon of the peptidyl-tRNA. The efficiency of this behavior can be as high as 10–20% but declines with the length of the slide. Interposition of “trap” sites (nonproductive landing sites) in the bypassed region reduces the frequency of successful slides, confirming that the ribosome–peptidyl-tRNA complex passes through the untranslated region of the message. This behavior appears to be quite general: it can occur at the two kinds of hungry codons tested, AUA and AAG; the sliding peptidyl-tRNA can be any of three species tested, phenylalanine, tyrosine, or leucine tRNA; the peptidyl component can be either of two very different peptide sequences; and translation can resume at any of the three codons tested.

Linking genome and proteome by mass spectrometry: Large-scale identification of yeast proteins from two dimensional gels

Relevance:

60.00%

Abstract:

The function of many of the uncharacterized open reading frames discovered by genomic sequencing can be determined at the level of expressed gene products, the proteome. However, identifying the cognate gene from minute amounts of protein has been one of the major problems in molecular biology. Using yeast as an example, we demonstrate here that mass spectrometric protein identification is a general solution to this problem given a completely sequenced genome. As a first screen, our strategy uses automated laser desorption ionization mass spectrometry of the peptide mixtures produced by in-gel tryptic digestion of a protein. Up to 90% of proteins are identified by searching sequence data bases by lists of peptide masses obtained with high accuracy. The remaining proteins are identified by partially sequencing several peptides of the unseparated mixture by nanoelectrospray tandem mass spectrometry followed by data base searching with multiple peptide sequence tags. In blind trials, the method led to unambiguous identification in all cases. In the largest individual protein identification project to date, a total of 150 gel spots—many of them at subpicomole amounts—were successfully analyzed, greatly enlarging a yeast two-dimensional gel data base. More than 32 proteins were novel and matched to previously uncharacterized open reading frames in the yeast genome. This study establishes that mass spectrometry provides the required throughput, the certainty of identification, and the general applicability to serve as the method of choice to connect genome and proteome.
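
The first screen described here, searching a sequence data base with a list of peptide masses, can be sketched as below. This is a simplified illustration under stated assumptions: the cleavage rule (after K or R unless followed by P) is the commonly used trypsin rule, masses are monoisotopic for neutral peptides, and the function names and tolerance are hypothetical. Missed cleavages and modifications are ignored.

```python
# Monoisotopic residue masses in Da ("L" and "I" are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues of a free peptide


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, protein, tol=0.05):
    """Number of observed masses explained by some tryptic peptide of the protein."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical)
        for obs in observed_masses
    )


# Made-up sequence; a data base search would score every entry this way.
print(tryptic_digest("AGKRPLKGG"))
```

Ranking data base entries by such a match count is one simple way to pick a candidate protein. The scoring actually used in the study is not specified in this abstract.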

Coalescent estimates of HIV-1 generation time in vivo

Relevance:

60.00%

Abstract:

The generation time of HIV Type 1 (HIV-1) in vivo has previously been estimated using a mathematical model of viral dynamics and was found to be on the order of one to two days per generation. Here, we describe a new method based on coalescence theory that allows the estimate of generation times to be derived by using nucleotide sequence data and a reconstructed genealogy of sequences obtained over time. The method is applied to sequences obtained from a long-term nonprogressing individual at five sampling occasions. The estimate of viral generation time using the coalescent method is 1.2 days per generation and is close to that obtained by mathematical modeling (1.8 days per generation), thus strengthening confidence in estimates of a short viral generation time. Apart from the estimation of relevant parameters relating to viral dynamics, coalescent modeling also allows us to simulate the evolutionary behavior of samples of sequences obtained over time.

Molecular phylogenetic analysis of evolutionary trends in stonefly wing structure and locomotor behavior

Relevance:

60.00%

Abstract:

Insects in the order Plecoptera (stoneflies) use a form of two-dimensional aerodynamic locomotion called surface skimming to move across water surfaces. Because their weight is supported by water, skimmers can achieve effective aerodynamic locomotion even with small wings and weak flight muscles. These mechanical features stimulated the hypothesis that surface skimming may have been an intermediate stage in the evolution of insect flight, which has perhaps been retained in certain modern stoneflies. Here we present a phylogeny of Plecoptera based on nucleotide sequence data from the small subunit rRNA (18S) gene. By mapping locomotor behavior and wing structural data onto the phylogeny, we distinguish between the competing hypotheses that skimming is a retained ancestral trait or, alternatively, a relatively recent loss of flight. Our results show that basal stoneflies are surface skimmers, and that various forms of surface skimming are distributed widely across the plecopteran phylogeny. Stonefly wings show evolutionary trends in the number of cross veins and the thickness of the cuticle of the longitudinal veins that are consistent with elaboration and diversification of flight-related traits. These data support the hypothesis that the first stoneflies were surface skimmers, and that wing structures important for aerial flight have become elaborated and more diverse during the radiation of modern stoneflies.
