891 results for sequence database
Abstract:
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Abstract:
The IMGT/HLA Database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). The HLA complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools and detailed descriptions of the source cells. The online IMGT/HLA submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.7.0 July 2000) contains 1220 HLA alleles derived from over 2700 component sequences from the EMBL/GenBank/DDBJ databases. The HLA database provides a model which will be extended to provide specialist databases for polymorphic MHC genes of other species.
Abstract:
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.
Abstract:
With rapid advances in video processing technologies and ever-faster growth in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation for retrieving videos of interest to users. Video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, the high complexity of video content poses the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search for very large video databases. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplied by the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of the video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one-dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one-dimensional values of the ViTris' positions. Such a transformation enables the B+-tree to achieve its optimal performance by quickly filtering out a large portion of non-similar ViTris. Our extensive experiments on large real video datasets demonstrate the effectiveness of our proposals, which significantly outperform existing methods.
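The one-dimensional filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a sorted list with `bisect` stands in for the B+-tree, power iteration stands in for a full PCA, and the names `ViTriIndex`, `first_principal_axis` and `project` are hypothetical. The filter relies only on the fact that projecting onto a unit axis never increases the distance between two points, so no intersecting hypersphere is lost.

```python
import bisect
import math
import random


def first_principal_axis(points, iters=100):
    """Approximate the first principal component by power iteration (assumed stand-in for PCA)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    axis = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit start vector
    for _ in range(iters):
        # Apply the (unnormalized) covariance matrix without forming it: X^T (X v).
        proj = [sum(c[d] * axis[d] for d in range(dim)) for c in centered]
        nxt = [sum(proj[i] * centered[i][d] for i in range(len(centered)))
               for d in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # degenerate data: keep the current unit axis
        axis = [v / norm for v in nxt]
    return mean, axis


def project(position, mean, axis):
    """Map a high-dimensional position to its one-dimensional key."""
    return sum((position[d] - mean[d]) * axis[d] for d in range(len(axis)))


class ViTriIndex:
    """Sorted one-dimensional keys over ViTri positions (stand-in for a B+-tree)."""

    def __init__(self, vitris):
        # vitris: list of (position, radius, density) triplets
        self.vitris = vitris
        self.mean, self.axis = first_principal_axis([v[0] for v in vitris])
        self.max_radius = max(v[1] for v in vitris)
        entries = sorted((project(v[0], self.mean, self.axis), i)
                         for i, v in enumerate(vitris))
        self.keys = [k for k, _ in entries]
        self.ids = [i for _, i in entries]

    def candidates(self, position, radius):
        """Return ids of ViTris whose hypersphere intersects the query hypersphere."""
        # Two spheres can only intersect if their 1-D keys differ by at most
        # the sum of their radii, so a key range scan prunes the rest.
        reach = radius + self.max_radius
        key = project(position, self.mean, self.axis)
        lo = bisect.bisect_left(self.keys, key - reach)
        hi = bisect.bisect_right(self.keys, key + reach)
        hits = []
        for j in range(lo, hi):
            pos, r, _density = self.vitris[self.ids[j]]
            if math.dist(pos, position) <= radius + r:  # exact check on survivors
                hits.append(self.ids[j])
        return hits
```

Only the ViTris that survive the range scan need the more expensive intersection-volume computation described in the abstract, which this sketch does not attempt.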
Abstract:
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch
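As a rough illustration of how a promoter can be recovered from a pointer into a nucleotide sequence entry, the sketch below extracts a window around a transcription start site (TSS). The function names, the 1-based coordinate convention (TSS is +1, there is no position 0) and the default window sizes are assumptions for this example, not EPD's actual format or tooling.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_promoter(entry_seq, tss, strand, upstream=10, downstream=5):
    """Extract positions -upstream..+downstream around a TSS.

    entry_seq : sequence of the nucleotide entry the pointer refers to
    tss       : 1-based position of the TSS within entry_seq (hypothetical convention)
    strand    : "+" or "-"
    The result is read 5'->3' on the promoter's own strand.
    """
    if strand == "+":
        start = tss - upstream          # 1-based, inclusive
        end = tss + downstream - 1
    elif strand == "-":
        # On the reverse strand, upstream lies at higher entry coordinates.
        start = tss - downstream + 1
        end = tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end > len(entry_seq):
        raise ValueError("requested window extends beyond the sequence entry")
    window = entry_seq[start - 1:end]
    return window if strand == "+" else reverse_complement(window)
```

Storing only the pointer (entry, position, strand) keeps the collection small and lets the extraction window be chosen at query time, which fits the dynamic subset extraction the abstract describes.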
Abstract:
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation, and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10,010 RSRs from a metagenomic library of microorganisms found in human faecal samples. We then searched them against a prokaryotic sequence database using the program BLASTN to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable for comparing the biodiversity of different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.
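The taxon-assignment step can be sketched as follows, assuming BLASTN was run with the standard 12-column tabular output (query, subject, percent identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The E-value cut-off, the `subject_to_taxon` lookup table and the use of Bray-Curtis dissimilarity to compare two libraries are illustrative assumptions; the abstract does not state which thresholds or comparison measure the authors used.

```python
from collections import Counter


def best_hits(blast_tabular_lines, max_evalue=1e-5):
    """Keep the highest-bit-score hit per read from BLAST tabular lines."""
    best = {}
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        read, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hypothetical significance cut-off
        if read not in best or bitscore > best[read][1]:
            best[read] = (subject, bitscore)
    return {read: subject for read, (subject, _score) in best.items()}


def taxon_profile(hits, subject_to_taxon):
    """Convert read -> subject assignments into relative taxon abundances."""
    counts = Counter(subject_to_taxon.get(s, "unassigned") for s in hits.values())
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(profile_a) | set(profile_b)
    diff = sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.get(t, 0.0) + profile_b.get(t, 0.0) for t in taxa)
    return diff / total
```

The same profile function could be applied to the 16S-derived assignments, so that the RSR and 16S estimates of one library, or the RSR profiles of two libraries, are compared with a single measure.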
Abstract:
We report the cloning and characterization of a long interspersed nucleotide element (LINE) from a cichlid fish, Oreochromis niloticus, and show the distribution of this element, called CiLINE2 for cichlid LINE2, in the chromosomes of this species. The identification of an open reading frame in CiLINE2 with amino acid sequence similarity to reverse transcriptases encoded by LINE-like elements in Caenorhabditis elegans, Platemys spixii, Schistosoma mansoni, Gallus gallus (CR1), Drosophila melanogaster (I factor), and Homo sapiens (LINE2), as well as the structure of the element, suggest it is a member of this family of non-long terminal repeat-containing retrotransposons. Search of a DNA sequence database identified sequences similar to CiLINE2 in four other fish species (Haplotaxodon microlepis, Oreochromis mossambicus, Pseudotropheus zebra, and Fugu rubripes). Southern blot hybridization experiments revealed the presence of sequences similar to CiLINE2 in all Tilapiini species analyzed from the genera Oreochromis, Tilapia, and Sarotherodon, and gave an estimated copy number of about 5500 for the haploid genome of O. niloticus. Fluorescent in situ hybridization showed that CiLINE2 sequences were organized in small clusters dispersed over all chromosomes of O. niloticus, with a higher concentration near chromosome ends. Furthermore, the long arm of chromosome 1 was strikingly enriched with this sequence. The distribution of LINE2-related elements might underlie the difference in chromosome banding patterns observed between cold-blooded vertebrates and mammals.
Abstract:
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
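To make the idea of maximum-likelihood estimation of a score distribution concrete, the sketch below fits a Gumbel (extreme value) distribution to a set of scores and converts a score to a p-value. The choice of the Gumbel form is an assumption made here, since the abstract does not name the distribution, and the sketch neither censors scores with small E-values nor adjusts for target sequence length as the abstract discusses. The function names are hypothetical.

```python
import math
import random


def fit_gumbel_ml(scores, tol=1e-10):
    """Maximum-likelihood estimates (lam, mu) for a Gumbel score distribution.

    Solves 1/lam - mean(x) + sum(x*exp(-lam*x)) / sum(exp(-lam*x)) = 0 by bisection;
    the left-hand side decreases monotonically in lam.
    """
    n = len(scores)
    smin, smax = min(scores), max(scores)
    if smin == smax:
        raise ValueError("scores must not all be identical")
    mean = sum(scores) / n

    def weights(lam):
        # Shift by the minimum so every exponent is <= 0 (no overflow).
        return [math.exp(-lam * (s - smin)) for s in scores]

    def f(lam):
        w = weights(lam)
        weighted_mean = sum(s * wi for s, wi in zip(scores, w)) / sum(w)
        return 1.0 / lam - mean + weighted_mean

    lo, hi = 1e-6, 1.0
    while f(lo) < 0:      # widen the bracket until the root is enclosed
        lo /= 2
    while f(hi) > 0:
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    mu = smin - math.log(sum(weights(lam)) / n) / lam
    return lam, mu


def p_value(score, lam, mu):
    """P(S >= score) = 1 - exp(-exp(-lam * (score - mu))) under the fitted Gumbel."""
    x = -lam * (score - mu)
    if x > 700:           # exp would overflow; the p-value is 1 to machine precision
        return 1.0
    return -math.expm1(-math.exp(x))
```

For example, `lam, mu = fit_gumbel_ml(scores_of_unrelated_sequences)` followed by `p_value(s, lam, mu)` gives the estimated probability of seeing a score of at least `s` by chance, which is the quantity whose accuracy the abstract is concerned with.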
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Abstract:
The primary mission of the Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Abstract:
The importance of studying acetic acid bacteria, in particular species of the genus Gluconobacter, lies in their industrial applications: they can bioconvert sorbitol to sorbose, which enables the vitamin C production process. The study involved collecting samples from soft drink industries, flowers, fruits and honey, followed by purification, phenotypic identification, and molecular identification using a primer defined by consulting the Nucleotide Sequence Database. Strains identified as members of the family Acetobacteraceae, genus Gluconobacter, were preserved. A total of 110 strains were isolated from the following substrates: Pyrostegia venusta (Ker Gawler), honey, Vitis vinifera (grape), Pyrus communis (pear), Malus sp. (apple) and two samples of soft drinks. Of this total, 57 strains were recovered in mannitol medium (mannitol, yeast extract, peptone), 12 in YMG medium (glucose, mannitol, yeast extract, ethanol, acetic acid), and 41 in enrichment medium (De Ley and Swings) and later in GYC medium (glucose, yeast extract and calcium carbonate). Sixty-eight strains were identified as Gram-negative rods. Of these, 31 were characterized biochemically as belonging to the family Acetobacteraceae, as they were catalase positive, oxidase negative and produced acid from glucose. The characterization of these strains was complemented with biochemical tests (gelatin liquefaction, nitrate reduction, indole and H2S production, and oxidation of ethanol to acetic acid) and with molecular tests for genus identification. Only eight strains were characterized as belonging to the genus Gluconobacter. The strains are maintained in the culture collection of the Microbiology Laboratory of the Biology Department at São Paulo State University (UNESP) in Assis, stored in malt extract at -196 ºC.
Abstract:
Background: MS-based proteomics was applied to the analysis of the medicinal plant Artemisia annua, exploiting a recently published contig sequence database (Graham et al. (2010) Science 327, 328–331) and other genomic and proteomic sequence databases for comparison. A. annua is the predominant natural source of artemisinin, the precursor for artemisinin-based combination therapies (ACTs), which are the WHO-recommended treatment for P. falciparum malaria. Results: The comparison of various databases containing A. annua sequences (NCBInr/viridiplantae, UniProt/viridiplantae, UniProt/A. annua, an A. annua trichome Trinity contig database, the above contig database and another A. annua EST database) revealed significant differences in their suitability for proteomic analysis, showing that an organism-specific database that has undergone extensive curation, leading to longer contig sequences, can greatly increase the number of true positive protein identifications while reducing the number of false positives. Compared to previously published data, an order of magnitude more proteins have been identified from trichome-enriched A. annua samples, including proteins which are known to be involved in the biosynthesis of artemisinin, as well as other highly abundant proteins, which suggest additional enzymatic processes occurring within the trichomes that are important for the biosynthesis of artemisinin. Conclusions: The newly gained information allows for the possibility of an enzymatic pathway, utilizing peroxidases, for the less well understood final stages of artemisinin’s biosynthesis, as an alternative to the known non-enzymatic in vitro conversion of dihydroartemisinic acid to artemisinin. Data are available via ProteomeXchange with identifier PXD000703.
Abstract:
The importance of studying acetic acid bacteria, especially those of the genus Gluconobacter, lies in their industrial applications, since they are capable of bioconverting sorbitol to sorbose, which makes the vitamin C production process viable. The study involved collecting samples from soft drink industries, flowers, fruits and honey, followed by purification, phenotypic identification and molecular identification using a primer defined by consulting the Nucleotide Sequence Database. The strains identified as members of the family Acetobacteraceae, genus Gluconobacter, were preserved. A total of 110 strains were isolated from the following substrates: Pyrostegia venusta (Cipó de São João), honey, Vitis vinifera (grape), Pyrus communis (pear), Malus sp. (apple) and two samples of soft drinks bottled in 2 L PET containers. Of this total, 57 strains were recovered in MYP medium (mannitol, yeast extract, peptone), 12 in YGM medium (glucose, mannitol, yeast extract, ethanol, acetic acid), and 41 in enrichment medium and, subsequently, in GYC medium (glucose, yeast extract and calcium carbonate). Sixty-eight strains were identified as Gram-negative rods. Of these, 31 were characterized biochemically as belonging to the family Acetobacteraceae because they were catalase positive, oxidase negative and produced acid from glucose. The characterization of these strains was complemented with the following biochemical tests: gelatin liquefaction, nitrate reduction, indole and H2S formation, and oxidation of ethanol to acetic acid. Molecular methods were applied to identify the genus Gluconobacter. Finally, eight strains were characterized as belonging to the genus Gluconobacter. The strains are deposited in the culture collection of the Microbiology Laboratory of the Department of Biology at UNESP, Assis campus, stored in malt extract 20 at -196 ºC.