102 results for Genomic sequence database
in National Center for Biotechnology Information - NCBI
Abstract:
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Abstract:
The IMGT/HLA Database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). The HLA complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools and detailed descriptions of the source cells. The online IMGT/HLA submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.7.0 July 2000) contains 1220 HLA alleles derived from over 2700 component sequences from the EMBL/GenBank/DDBJ databases. The HLA database provides a model which will be extended to provide specialist databases for polymorphic MHC genes of other species.
Abstract:
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.
Abstract:
We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged ≈270 million years ago and the γ subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for ≈1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.
Abstract:
The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P.falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P.falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.
Abstract:
There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
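The first step described above, an optimal ungapped alignment score for every diagonal of the alignment matrix, can be illustrated with a minimal scalar sketch. This is not ParAlign's SIMD implementation: the function name `ungapped_diagonal_scores` and the match/mismatch scores are assumptions chosen for illustration, and the diagonal-combining heuristic is not reproduced because the abstract does not specify it.

```python
def ungapped_diagonal_scores(query, subject, match=2, mismatch=-1):
    """Best ungapped local alignment score on each diagonal.

    Hypothetical helper for illustration only; the scoring scheme is an
    assumption (a real search would use a substitution matrix).
    A diagonal is identified by d = j - i, where i indexes the query
    and j indexes the subject.
    """
    m, n = len(query), len(subject)
    scores = {}
    for d in range(-(m - 1), n):
        i = max(0, -d)      # first query position on this diagonal
        j = i + d           # matching subject position
        best = running = 0
        while i < m and j < n:
            running += match if query[i] == subject[j] else mismatch
            if running < 0:
                running = 0  # a local alignment may restart anywhere
            if running > best:
                best = running
            i += 1
            j += 1
        scores[d] = best
    return scores


if __name__ == "__main__":
    scores = ungapped_diagonal_scores("ACGT", "TTACGTTT")
    # Report the diagonal with the highest ungapped score.
    top = max(scores, key=scores.get)
    print(top, scores[top])
```

Each diagonal is independent of the others, which is what makes this step a natural fit for the data-parallel execution the abstract describes.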
Abstract:
Since the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.
Abstract:
The intensely studied MHC has become the paradigm for understanding the architectural evolution of vertebrate multigene families. The 4-Mb human MHC (also known as the HLA complex) encodes genes critically involved in the immune response, graft rejection, and disease susceptibility. Here we report the continuous 1,796,938-bp genomic sequence of the HLA class I region, linking genes between MICB and HLA-F. A total of 127 genes or potentially coding sequences were recognized within the analyzed sequence, establishing a high gene density of one per every 14.1 kb. The identification of 758 microsatellites provides tools for high-resolution mapping of HLA class I-associated disease genes. Most importantly, we establish that the repeated duplication and subsequent diversification of a minimal building block, MIC-HCGIX-3.8–1-P5-HCGIV-HLA class I-HCGII, engendered the present-day MHC. That the currently nonessential HLA-F and MICE genes have acted as progenitors to today’s immune-competent HLA-ABC and MICA/B genes provides experimental evidence for evolution by “birth and death,” which has general relevance to our understanding of the evolutionary forces driving vertebrate multigene families.
Abstract:
Null mutations at the misato locus of Drosophila melanogaster are associated with irregular chromosomal segregation at cell division. The consequences for morphogenesis are that mutant larvae are almost devoid of imaginal disk tissue, have a reduction in brain size, and die before the late third-instar larval stage. To analyze these findings, we isolated cDNAs in and around the misato locus, mapped the breakpoints of chromosomal deficiencies, determined which transcript corresponded to the misato gene, rescued the cell division defects in transgenic organisms, and sequenced the genomic DNA. Database searches revealed that misato codes for a novel protein, the N-terminal half of which contains a mixture of peptide motifs found in α-, β-, and γ-tubulins, as well as a motif related to part of the myosin heavy chain proteins. The sequence characteristics of misato indicate either that it arose from an ancestral tubulin-like gene, different parts of which underwent convergent evolution to resemble motifs in the conventional tubulins, or that it arose by the capture of motifs from different tubulin genes. The Saccharomyces cerevisiae genome lacks a true homolog of the misato gene, and this finding highlights the emerging problem of assigning functional attributes to orphan genes that occur only in some evolutionary lineages.
Abstract:
We examined the MLL genomic translocation breakpoint in acute myeloid leukemia of infant twins. Southern blot analysis in both cases showed two identical MLL gene rearrangements indicating chromosomal translocation. The rearrangements were detectable in the second twin before signs of clinical disease and the intensity relative to the normal fragment indicated that the translocation was not constitutional. Fluorescence in situ hybridization with an MLL-specific probe and karyotype analyses suggested t(11;22)(q23;q11.2) disrupting MLL. Known 5′ sequence from MLL but unknown 3′ sequence from chromosome band 22q11.2 formed the breakpoint junction on the der(11) chromosome. We used panhandle variant PCR to clone the translocation breakpoint. By ligating a single-stranded oligonucleotide that was homologous to known 5′ MLL genomic sequence to the 5′ ends of BamHI-digested DNA through a bridging oligonucleotide, we formed the stem–loop template for panhandle variant PCR which yielded products of 3.9 kb. The MLL genomic breakpoint was in intron 7. The sequence of the partner DNA from band 22q11.2 was identical to the hCDCrel (human cell division cycle related) gene that maps to the region commonly deleted in DiGeorge and velocardiofacial syndromes. Both MLL and hCDCrel contained homologous CT, TTTGTG, and GAA sequences within a few base pairs of their respective breakpoints, which may have been important in uniting these two genes by translocation. Reverse transcriptase-PCR amplified an in-frame fusion of MLL exon 7 to hCDCrel exon 3, indicating that an MLL-hCDCrel chimeric mRNA had been transcribed. Panhandle variant PCR is a powerful strategy for cloning translocation breakpoints where the partner gene is undetermined. This application of the method identified a region of chromosome band 22q11.2 involved in both leukemia and a constitutional disorder.
Abstract:
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000–100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.
Abstract:
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Abstract:
The TEL (ETV6)-AML1 (CBFA2) gene fusion is the most common reciprocal chromosomal rearrangement in childhood cancer, occurring in ≈25% of the most predominant subtype of leukemia, common acute lymphoblastic leukemia. The TEL-AML1 genomic sequence has been characterized in a pair of monozygotic twins diagnosed at ages 3 years, 6 months and 4 years, 10 months with common acute lymphoblastic leukemia. The twin leukemic DNA shared the same unique (or clonotypic) but nonconstitutive TEL-AML1 fusion sequence. The most plausible explanation for this finding is a single cell origin of the TEL-AML1 fusion in one fetus in utero, probably as a leukemia-initiating mutation, followed by intraplacental metastasis of clonal progeny to the other twin. Clonal identity is further supported by the finding that the leukemic cells in the two twins shared an identical rearranged IGH allele. These data have implications for the etiology and natural history of childhood leukemia.
Abstract:
A computational system for the prediction of polymorphic loci directly and efficiently from human genomic sequence was developed and verified. A suite of programs, collectively called pompous (polymorphic marker prediction of ubiquitous simple sequences), detects tandem repeats ranging from dinucleotides up to 250-mers, scores them according to predicted level of polymorphism, and designs appropriate flanking primers for PCR amplification. This approach was validated on an approximately 750-kilobase region of human chromosome 3p21.3, involved in lung and breast carcinoma homozygous deletions. Target DNA from 36 paired B lymphoblastoid and lung cancer lines was amplified and allelotyped for 33 loci predicted by pompous to be variable in repeat size. We found that among those 36 predominantly Caucasian individuals, 22 of the 33 (67%) predicted loci were polymorphic with an average heterozygosity of 0.42. Allele loss in this region was found in 27/36 (75%) of the tumor lines using these markers. pompous provides the genetic researcher with an additional tool for the rapid and efficient identification of polymorphic markers, and through a World Wide Web site, investigators can use pompous to identify polymorphic markers for their research. A catalog of 13,261 potential polymorphic markers and associated primer sets has been created from the analysis of 141,779,504 base pairs of human genomic sequence in GenBank. This data is available on our Web site (pompous.swmed.edu) and will be updated periodically as GenBank is expanded and algorithm accuracy is improved.
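The tandem-repeat detection step mentioned above can be sketched as a small regular-expression scan. This is only an illustration under stated assumptions, not the pompous code: the names `find_tandem_repeats` and `is_primitive`, the unit-length range, and the minimum copy number are all hypothetical, and the polymorphism scoring and primer design stages are not attempted.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter string."""
    return unit not in (unit + unit)[1:-1]


def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Hypothetical helper: thresholds are illustrative assumptions, and
    only exact (uninterrupted) repeats of A/C/G/T units are reported.
    """
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A unit of length k followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units such as "CACA" that merely restate a shorter repeat.
            if is_primitive(unit):
                copies = (m.end() - m.start()) // k
                hits.append((m.start(), unit, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGT" + "CA" * 5 + "TTG"
    for start, unit, copies in find_tandem_repeats(example):
        print(start, unit, copies)
```

A production tool would also need to tolerate imperfect repeats and rank candidates before designing flanking primers; those parts are left out here because the abstract does not describe how they work.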
Abstract:
The Chinese hamster ovary (CHO) mutant UV40 cell line is hypersensitive to UV and ionizing radiation, simple alkylating agents, and DNA cross-linking agents. The mutant cells also have a high level of spontaneous chromosomal aberrations and 3-fold elevated sister chromatid exchange. We cloned and sequenced a human cDNA, designated XRCC9, that partially corrected the hypersensitivity of UV40 to mitomycin C, cisplatin, ethyl methanesulfonate, UV, and γ-radiation. The spontaneous chromosomal aberrations in XRCC9 cDNA transformants were almost fully corrected whereas sister chromatid exchanges were unchanged. The XRCC9 genomic sequence was cloned and mapped to chromosome 9p13. The translated XRCC9 sequence of 622 amino acids has no similarity with known proteins. The 2.5-kb XRCC9 mRNA seen in the parental cells was undetectable in UV40 cells. The mRNA levels in testis were up to 10-fold higher compared with other human tissues and up to 100-fold higher compared with other baboon tissues. XRCC9 is a candidate tumor suppressor gene that might operate in a postreplication repair or a cell cycle checkpoint function.