956 resultados para Genomic sequence database


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our previous studies have shown that two distinct genotypes of Sindbis (SIN) virus occur in Australia. One of these, the Oriental/Australian type, circulates throughout most of the Australian continent, whereas the recently identified south-west (SW) genetic type appears to be restricted to a distinct geographic region located in the temperate south-west of Australia. We have now determined the complete nucleotide and translated amino acid sequences of a SW isolate of SIN virus (SW6562) and performed comparative analyses with other SIN viruses at the genomic level. The genome of SW6562 is 11,569 nucleotides in length, excluding the cap nucleotide and poly (A) tail. Overall this virus differs from the prototype SIN virus (strain AR339) by 23% in nucleotide sequence and 12.5% in amino acid sequence. Partial sequences of four regions of the genome of four SW isolates were determined and compared with the corresponding sequences from a number of SIN isolates from different regions of the World. These regions are the non-structural protein (nsP3), the E2 gene, the capsid gene, and the repeated sequence elements (RSE) of the 3'UTR. These comparisons revealed that the SW SIN viruses were more closely related to South African and European strains than to other Australian isolates of SIN virus. Thus the SW genotype of SIN virus may have been introduced into this region of Australia by viremic humans or migratory birds and subsequently evolved independently in the region. The sequence data also revealed that the SW genotype contains a unique deletion in the RSE of the 3'UTR region of the genome. Previous studies have shown that deletions in this region of the SIN genome can have significant effects on virus replication in mosquito and avian cells, which may explain the restricted distribution of this genotype of SIN virus.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The hepatitis B virus (HBV) is among the leading causes of chronic hepatitis, cirrhosis and hepatocellular carcinoma. In Brazil, genotype A is the most frequent, followed by genotypes D and F. Genotypes B and C are found in Brazil exclusively among Asian patients and their descendants. The aim of this study was to sequence the entire HBV genome of a Caucasian patient infected with HBV/C2 and to infer the origin of the virus based on sequencing analysis. The sequence of this Brazilian isolate was grouped with four other sequences described in China. The sequence of this patient is the first complete genome of HBV/C2 reported in Brazil.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The Gen THREADER annotation jobs are distributed across multiple clusters of processors using grid technology and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Genomic Threading Database currently contains structural annotations for the genomes of over 100 recently sequenced organisms. Annotations are carried out by using our modified GenTHREADER software and through implementing grid technology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The IMGT/HLA Database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). The HLA complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools and detailed descriptions of the source cells. The online IMGT/HLA submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.7.0 July 2000) contains 1220 HLA alleles derived from over 2700 component sequences from the EMBL/GenBank/DDBJ databases. The HLA database provides a model which will be extended to provide specialist databases for polymorphic MHC genes of other species.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged ≈270 million years ago and the γ subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for ≈1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With rapid advances in video processing technologies and ever fast increments in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation to retrieve videos of user interests. The video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, high complexity of video content has posed the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search for very large video database. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplying the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one dimensional values of ViTris' positions. Such a transformation enables B+-tree to achieve its optimal performance by quickly filtering a large portion of non-similar ViTris. Our extensive experiments on real large video datasets prove the effectiveness of our proposals that outperform existing methods significantly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the wide diversity of unknown organisms in the environment, 99% of them cannot be grown in traditional culture medium in laboratories. Therefore, metagenomics projects are proposed to study microbial communities present in the environment, from molecular techniques, especially the sequencing. Thereby, for the coming years it is expected an accumulation of sequences produced by these projects. Thus, the sequences produced by genomics and metagenomics projects present several challenges for the treatment, storing and analysis such as: the search for clones containing genes of interest. This work presents the OCI Metagenomics, which allows defines and manages dynamically the rules of clone selection in metagenomic libraries, thought an algebraic approach based on process algebra. Furthermore, a web interface was developed to allow researchers to easily create and execute their own rules to select clones in genomic sequence database. This software has been tested in metagenomic cosmid library and it was able to select clones containing genes of interest. Copyright 2010 ACM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P.falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P.falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.