935 results for Complete Genome Sequence
Abstract:
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Abstract:
FULL-malaria is a database for a full-length-enriched cDNA library from the human malaria parasite Plasmodium falciparum (http://133.11.149.55/). Because of its medical importance, this organism is the first target for genome sequencing of a eukaryotic pathogen; the sequences of two of its 14 chromosomes have already been determined. However, for the full exploitation of this rapidly accumulating information, correct identification of the genes and study of their expression are essential. Using the oligo-capping method, we have produced a full-length-enriched cDNA library from erythrocytic stage parasites and performed one-pass reading. The database consists of nucleotide sequences of 2490 random clones that include 390 (16%) known malaria genes according to BLASTN analysis of the nr-nt database in GenBank; these represent 98 genes, and the clones for 48 of these genes contain the complete protein-coding sequence (49%). On the other hand, comparisons with the complete chromosome 2 sequence revealed that 35 of 210 predicted genes are expressed, and in addition led to detection of three new gene candidates that were not previously known. In total, 19 of these 38 clones (50%) were full-length. From these observations, it is expected that the database contains ∼1000 genes, including 500 full-length clones. It should be an invaluable resource for the development of vaccines and novel drugs.
Abstract:
The Mouse Genome Database (MGD) is the community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology and disease (http://www.informatics.jax.org). MGD provides standard nomenclature and consensus map positions for mouse genes and genetic markers; it provides a curated set of mammalian homology records, user-defined chromosomal maps, experimental data sets and the definitive mouse ‘gene to sequence’ reference set for the research community. The integration and standardization of these data sets facilitates the transition between mouse DNA sequence, gene and phenotype annotations. A recent focus on allele and phenotype representations enhances the ability of MGD to organize and present data for exploring the relationship between genotype and phenotype. This link between the genome and the biology of the mouse is especially important as phenotype information grows from large mutagenesis projects and genotype information grows from large-scale sequencing projects.
Abstract:
Since the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.
Abstract:
The Gene Expression Database (GXD) is a community resource of gene expression information for the laboratory mouse. By combining the different types of expression data, GXD aims to provide increasingly complete information about the expression profiles of genes in different mouse strains and mutants, thus enabling valuable insights into the molecular networks that underlie normal development and disease. GXD is integrated with the Mouse Genome Database (MGD). Extensive interconnections with sequence databases and with databases from other species, and the development and use of shared controlled vocabularies extend GXD’s utility for the analysis of gene expression information. GXD is accessible through the Mouse Genome Informatics web site at http://www.informatics.jax.org/ or directly at http://www.informatics.jax.org/menus/expression_menu.shtml.
Abstract:
Viruses with RNA genomes often capture and redirect host cell components to assist in mechanisms particular to RNA-dependent RNA synthesis. The nidoviruses are an order of positive-stranded RNA viruses, comprising coronaviruses and arteriviruses, that employ a unique strategy of discontinuous transcription, producing a series of subgenomic mRNAs linking a 5′ leader to distal portions of the genome. For the prototype coronavirus mouse hepatitis virus (MHV), heterogeneous nuclear ribonucleoprotein (hnRNP) A1 has been shown to be able to bind in vitro to the negative strand of the intergenic sequence, a cis-acting element found in the leader RNA and preceding each downstream ORF in the genome. hnRNP A1 thus has been proposed as a host factor in MHV transcription. To test this hypothesis genetically, we initially constructed MHV mutants with a very high-affinity hnRNP A1 binding site inserted in place of, or adjacent to, an intergenic sequence in the MHV genome. This inserted hnRNP A1 binding site was not able to functionally replace, or enhance transcription from, the intergenic sequence. This finding led us to test more directly the role of hnRNP A1 by analysis of MHV replication and RNA synthesis in a murine cell line that does not express this protein. The cellular absence of hnRNP A1 had no detectable effect on the production of infectious virus, the synthesis of genomic RNA, or the quantity or quality of subgenomic mRNAs. These results strongly suggest that hnRNP A1 is not a required host factor for MHV discontinuous transcription or genome replication.
Abstract:
Candida albicans is a diploid fungus that has become a medically important opportunistic pathogen in immunocompromised individuals. We have sequenced the C. albicans genome to 10.4-fold coverage and performed a comparative genomic analysis between C. albicans and Saccharomyces cerevisiae with the objective of assessing whether Candida possesses a genetic repertoire that could support a complete sexual cycle. Analyzing over 500 genes important for sexual differentiation in S. cerevisiae, we find many homologues of genes that are implicated in the initiation of meiosis, chromosome recombination, and the formation of synaptonemal complexes. However, others are striking in their absence. C. albicans seems to have homologues of all of the elements of a functional pheromone response pathway involved in mating in S. cerevisiae but lacks many homologues of S. cerevisiae genes for meiosis. Other meiotic gene homologues in organisms ranging from filamentous fungi to Drosophila melanogaster and Caenorhabditis elegans were also found in the C. albicans genome, suggesting potential alternative mechanisms of genetic exchange.
Abstract:
Mitochondrial dysfunction can lead to diverse cellular and organismal responses. We used DNA microarrays to characterize the transcriptional responses to different mitochondrial perturbations in Saccharomyces cerevisiae. We examined respiratory-deficient petite cells and respiratory-competent wild-type cells treated with the inhibitors of oxidative phosphorylation antimycin, carbonyl cyanide m-chlorophenylhydrazone, or oligomycin. We show that respiratory deficiency, but not inhibition of mitochondrial ATP synthesis per se, induces a suite of genes associated with both peroxisomal activities and metabolite-restoration (anaplerotic) pathways that would mitigate the loss of a complete tricarboxylic acid cycle. The array data suggested, and direct microscopic observation of cells expressing a derivative of green fluorescent protein with a peroxisomal matrix-targeting signal confirmed, that respiratory deficiency dramatically induces peroxisome biogenesis. Transcript profiling of cells harboring null alleles of RTG1, RTG2, or RTG3, genes known to control signaling from mitochondria to the nucleus, suggests that there are multiple pathways of cross-talk between these organelles in yeast.
Abstract:
The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-AFLP, a highly reproducible differential display method based on restriction enzyme digests and selective amplification under high stringency conditions. We have developed a computer program (GenEST) that predicts the sizes of virtual transcript-derived fragments (TDFs) of in silico-digested cDNA sequences retrieved from databases. The vast majority of the resulting virtual TDFs could be traced back among the thousands of TDFs displayed on cDNA-AFLP gels. Sequencing of the corresponding bands excised from cDNA-AFLP gels revealed no inconsistencies. As a consequence, cDNA sequence databases can be screened very efficiently to identify genes with relevant expression profiles. The other way round, it is possible to switch from cDNA-AFLP gels to sequences in the databases. Using the restriction enzyme recognition sites, the primer extensions and the estimated TDF size as identifiers, the DNA sequence(s) corresponding to a TDF with an interesting expression pattern can be identified. In this paper we show examples in both directions by analyzing the plant parasitic nematode Globodera rostochiensis. Various novel pathogenicity factors were identified by combining ESTs from the infective stage juveniles with expression profiles of ∼4000 genes in five developmental stages produced by cDNA-AFLP.
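As a rough illustration of the virtual-fragment prediction described in this abstract, the following Python sketch digests a cDNA sequence in silico and reports the sizes of fragments flanked by two different restriction sites. It is not the GenEST program: the enzyme pair (EcoRI and TaqI), the `adapter_total` parameter, the function names, and the rule that only fragments with two different ends are displayed are all assumptions made for the example.

```python
import re

# Assumed enzyme recognition sites (illustrative, not taken from the abstract).
RARE_CUTTER = "GAATTC"    # EcoRI
FREQUENT_CUTTER = "TCGA"  # TaqI


def site_positions(seq, site):
    """Return the start index of every (possibly overlapping) occurrence of site."""
    return [m.start() for m in re.finditer("(?=" + site + ")", seq)]


def virtual_tdfs(cdna, adapter_total=0):
    """Predict sizes of virtual fragments bounded by one site of each enzyme.

    Simplification: a fragment is measured from the start of one recognition
    site to the end of the next one, plus a hypothetical adapter/primer length.
    """
    seq = cdna.upper()
    sites = [(p, "R", len(RARE_CUTTER)) for p in site_positions(seq, RARE_CUTTER)]
    sites += [(p, "F", len(FREQUENT_CUTTER)) for p in site_positions(seq, FREQUENT_CUTTER)]
    sites.sort()  # order all sites by position along the cDNA

    sizes = []
    for (p1, e1, _), (p2, e2, len2) in zip(sites, sites[1:]):
        # Assumption: only fragments with two different ends are displayed.
        if e1 != e2:
            sizes.append(p2 + len2 - p1 + adapter_total)
    return sizes
```

Matching a band on a gel to a database entry would then amount to looking up its estimated size among the predicted sizes for each candidate cDNA, within whatever sizing tolerance the gel system allows.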
Abstract:
Understanding the factors responsible for variations in mutation patterns and selection efficacy along chromosomes is a prerequisite for deciphering genome sequences. Population genetics models predict a positive correlation between the efficacy of selection at a given locus and the local rate of recombination because of Hill–Robertson effects. Codon usage is considered one of the most striking examples that support this prediction at the molecular level. In a wide range of species including Caenorhabditis elegans and Drosophila melanogaster, codon usage is essentially shaped by selection acting for translational efficiency. Codon usage bias correlates positively with recombination rate in Drosophila, apparently supporting the hypothesis that selection on codon usage is improved by recombination. Here we present an exhaustive analysis of codon usage in C. elegans and D. melanogaster complete genomes. We show that in both genomes there is a positive correlation between recombination rate and the frequency of optimal codons. However, we demonstrate that in both species, this effect is due to a mutational bias toward G and C bases in regions of high recombination rate, possibly as a direct consequence of the recombination process. The correlation between codon usage bias and recombination rate in these species appears to be essentially determined by recombination-dependent mutational patterns, rather than selective effects. This result highlights that it is necessary to take into account the mutagenic effect of recombination to understand the evolutionary role and impact of recombination.
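The two quantities this abstract relates, the frequency of optimal codons and G+C content, can be computed with a short Python sketch. The set of optimal codons passed to `fop` is a placeholder supplied by the caller, and the exclusion of Met, Trp, and stop codons is a common convention assumed here rather than a detail given in the abstract.

```python
# Codons with no synonymous alternative (Met, Trp) and stop codons are
# left out of the optimal-codon frequency (assumed convention).
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def codons(cds):
    """Split a coding sequence into complete codons, ignoring a trailing partial one."""
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable_length, 3)]


def fop(cds, optimal):
    """Fraction of informative codons that belong to the caller-supplied optimal set."""
    usable = [c for c in codons(cds) if c not in EXCLUDED]
    if not usable:
        return 0.0
    return sum(c in optimal for c in usable) / len(usable)


def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    cs = codons(cds)
    if not cs:
        return 0.0
    return sum(c[2] in "GC" for c in cs) / len(cs)
```

If the optimal codons of a species mostly end in G or C, a regional shift in `gc3` alone will move `fop` in the same direction, which is why the two effects have to be separated before a correlation with recombination rate is attributed to selection.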
Abstract:
Progress in agricultural and environmental technologies is hampered by a slower rate of gene discovery in plants than in animals. The vast pool of genes in plants, however, will be an important resource for insertion of genes, via biotechnological procedures, into an array of plants, generating unique germ plasms not achievable by conventional breeding. It has recently become clear that genomes of grasses have evolved in a manner analogous to Lego blocks. Large chromosome segments have been reshuffled and stuffer pieces added between genes. Although some genomes have become very large, the genome with the fewest stuffer pieces, the rice genome, is the Rosetta Stone of all the bigger grass genomes. This means that sequencing the rice genome as the anchor genome of the grasses will provide instantaneous access to the same genes in the same relative physical position in other grasses (e.g., corn and wheat), without the need to sequence each of these genomes independently. (i) The sequencing of the entire genome of rice as the anchor genome for the grasses will accelerate plant gene discovery in many important crops (e.g., corn, wheat, and rice) by several orders of magnitude and reduce research and development costs for government and industry at a faster pace. (ii) Costs for sequencing entire genomes have come down significantly. The rice genome is only 12% the size of the human or the corn genome, and technology improvements by the human genome project are completely transferable, translating into another 50% reduction of the costs. (iii) The physical mapping of the rice genome by a group of Japanese researchers provides a jump start for sequencing the genome and forming an international consortium. Otherwise, other countries would do it alone and own proprietary positions.
Abstract:
Since 1991, the Rice Genome Research Program in Japan has carried out rice genomics, such as large-scale cDNA analysis, construction of a fine-scale restriction fragment length polymorphism map, and physical mapping of the rice genome with yeast artificial chromosome clones. These studies have made a great impact on research into grass genomes and made rice a model plant for other cereal crop research. Starting in 1998, the Rice Genome Research Program will step into a new stage of genomics—that of genome sequencing. This project eventually should reveal all of the genomic sequence information in the rice plant and be an indispensable aid in understanding the genomics of other grass species.
Abstract:
A whole genome cattle-hamster radiation hybrid cell panel was used to construct a map of 54 markers located on bovine chromosome 5 (BTA5). Of the 54 markers, 34 are microsatellites selected from the cattle linkage map and 20 are genes. Among the 20 mapped genes, 10 are new assignments that were made by using the comparative mapping by annotation and sequence similarity strategy. A LOD-3 radiation hybrid framework map consisting of 21 markers was constructed. The relatively low retention frequency of markers on this chromosome (19%) prevented unambiguous ordering of the other 33 markers. The length of the map is 398.7 cR, corresponding to a ratio of ≈2.8 cR5,000/cM. Type I genes were binned for comparison of gene order among cattle, humans, and mice. Multiple internal rearrangements within conserved syntenic groups were apparent upon comparison of gene order on BTA5 and HSA12 and HSA22. A similarly high number of rearrangements were observed between BTA5 and MMU6, MMU10, and MMU15. The detailed comparative map of BTA5 should facilitate identification of genes affecting economically important traits that have been mapped to this chromosome and should contribute to our understanding of mammalian chromosome evolution.
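The two quantities quoted in this abstract, retention frequency and map distance in centirays, can be illustrated with a short Python sketch. It assumes the standard radiation hybrid relation D = -100 ln(1 - theta) cR between breakage frequency theta and distance, which the abstract does not state; the function names and example inputs are hypothetical.

```python
import math


def retention_frequency(scores):
    """Fraction of hybrid cell lines that retain a marker.

    scores: iterable of 1 (marker retained) or 0 (marker absent).
    """
    scores = list(scores)
    return sum(scores) / len(scores)


def centirays(theta):
    """Convert a breakage frequency (0 <= theta < 1) into a distance in cR.

    Assumed relation: D = -100 * ln(1 - theta).
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must satisfy 0 <= theta < 1")
    return -100.0 * math.log(1.0 - theta)


def implied_genetic_length(map_length_cr, cr_per_cm):
    """Back-calculate a genetic length in cM from a map length and a cR/cM ratio."""
    return map_length_cr / cr_per_cm
```

With the figures reported above, `implied_genetic_length(398.7, 2.8)` gives roughly 142 cM. This is only a back-of-envelope value derived from the rounded ratio, not a number given in the abstract.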
Abstract:
Following transcription and splicing, each mRNA of a mammalian cell passes into the cytoplasm where its fate is in the hands of a complex network of ribonucleoproteins (mRNPs). The success or failure of a gene to be expressed depends on the performance of this mRNP infrastructure. The entry, gating, processing, and transit of each mRNA through an mRNP network helps determine the composition of a cell's proteome. The machinery that regulates storage, turnover, and translational activation of mRNAs is not well understood, in part, because of the heterogeneous nature of mRNPs. Recently, subsets of cellular mRNAs clustered as members of mRNP complexes have been identified by using antibodies reactive with RNA-binding proteins, including ELAV/Hu, eIF-4E, and poly(A)-binding proteins. Cytoplasmic ELAV/Hu proteins are involved in the stability and translation of early response gene (ERG) transcripts and are expressed predominately in neurons. mRNAs recovered from ELAV/Hu mRNP complexes were found to have similar sequence elements, suggesting a common structural linkage among them. This approach opens the possibility of identifying transcripts physically clustered in vivo that may have similar fates or functions. Moreover, the proteins encoded by physically organized mRNAs may participate in the same biological process or structural outcome, not unlike operons and their polycistronic mRNAs do in prokaryotic organisms. Our goal is to understand the organization and flow of genetic information on an integrative systems level by analyzing the collective properties of proteins and mRNAs associated with mRNPs in vivo.
Abstract:
A key step in the regulation of networks that control gene expression is the sequence-specific binding of transcription factors to their DNA recognition sites. A more complete understanding of these DNA–protein interactions will permit a more comprehensive and quantitative mapping of the regulatory pathways within cells, as well as a deeper understanding of the potential functions of individual genes regulated by newly identified DNA-binding sites. Here we describe a DNA microarray-based method to characterize sequence-specific DNA recognition by zinc-finger proteins. A phage display library, prepared by randomizing critical amino acid residues in the second of three fingers of the mouse Zif268 domain, provided a rich source of zinc-finger proteins with variant DNA-binding specificities. Microarrays containing all possible 3-bp binding sites for the variable zinc fingers permitted the quantitation of the binding site preferences of the entire library, pools of zinc fingers corresponding to different rounds of selection from this library, as well as individual Zif268 variants that were isolated from the library by using specific DNA sequences. The results demonstrate the feasibility of using DNA microarrays for genome-wide identification of putative transcription factor-binding sites.
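An array holding all possible 3-bp sites has 4^3 = 64 features, one per triplet. The Python sketch below enumerates those sites and turns a set of per-site signals into relative binding preferences. The normalization (dividing each signal by the total) and the function names are assumptions for illustration; the abstract does not describe how the intensities were actually processed.

```python
from itertools import product


def all_sites(k=3):
    """Enumerate every DNA sequence of length k (64 sequences for k = 3)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def rank_preferences(signal):
    """Convert raw per-site signals into fractions of the total, strongest first.

    signal: dict mapping a site (e.g. "GCG") to a non-negative intensity.
    The intensities are hypothetical inputs supplied by the caller.
    """
    total = sum(signal.values())
    if total == 0:
        return []
    ranked = [(site, value / total) for site, value in signal.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Comparing the ranked output for the unselected library, for pools from successive selection rounds, and for individual variants would show how specificity narrows as selection proceeds.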