963 results for Genomic sequence database


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Rubus yellow net virus (RYNV) was cloned and sequenced from a red raspberry (Rubus idaeus L.) plant exhibiting symptoms of mosaic and mottling in the leaves. Its genomic sequence indicates that it is a distinct member of the genus Badnavirus, with 7932 bp and seven ORFs, the first three corresponding in size and location to the ORFs found in the type member Commelina yellow mottle virus. Bioinformatic analysis of the genomic sequence detected several features including nucleic acid binding motifs, multiple zinc finger-like sequences and domains associated with cellular signaling. Subsequent sequencing of the small RNAs (sRNAs) from RYNV-infected R. idaeus leaf tissue was used to determine any RYNV sequences targeted by RNA silencing and identified abundant virus-derived small RNAs (vsRNAs). The majority of the vsRNAs were 22 nt in length. We observed a highly uneven genome-wide distribution of vsRNAs with strong clustering to small defined regions distributed over both strands of the RYNV genome. Together, our data show that sequences of the aphid-transmitted pararetrovirus RYNV are targeted in red raspberry by the interfering RNA pathway, a predominant antiviral defense mechanism in plants. © 2013.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The genomic sequences of several RNA plant viruses including cucumber mosaic virus, brome mosaic virus, alfalfa mosaic virus and tobacco mosaic virus have become available recently. The former two viruses are icosahedral while the latter two are bullet and rod shaped, respectively, in particle morphology. The non-structural 3a proteins of cucumber mosaic virus and brome mosaic virus have an amino acid sequence homology of 35% and hence are evolutionarily related. In contrast, the coat proteins exhibit little homology, although the circular dichroism spectra of these viruses are similar. The non-coding regions of the genome also exhibit variable but extensive homology. Comparison of the brome mosaic virus and alfalfa mosaic virus sequences reveals that they are probably related, although with a much larger evolutionary distance. The polypeptide folds of the coat proteins of three biologically distinct isometric plant viruses, tomato bushy stunt virus, southern bean mosaic virus and satellite tobacco necrosis virus, have been shown to display a striking resemblance. All of them consist of a topologically similar 8-stranded β-barrel. The implications of these studies for the understanding of the evolution of plant viruses will be discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The complete genome of mandarin fish Siniperca chuatsi rhabdovirus (SCRV) was cloned and sequenced. It comprises 11,545 nucleotides and contains five genes encoding the nucleoprotein N, the phosphoprotein P, the matrix protein M, the glycoprotein G, and the RNA-dependent RNA polymerase protein L. At the 3' and 5' termini of the SCRV genome, leader and trailer sequences show inverse complementarity. The N, P, M and G proteins share the highest sequence identities (ranging from 14.8 to 41.5%) with the respective proteins of rhabdovirus 903/87; the L protein has the highest identity with those of vesiculoviruses, especially with Chandipura virus (44.7%). Phylogenetic analysis of L proteins showed that SCRV clustered with spring viremia of carp virus (SVCV) and was most closely related to viruses in the genus Vesiculovirus. In addition, an overlapping open reading frame (ORF) predicted to encode a protein similar to vesicular stomatitis virus C protein is present within the P gene of SCRV. Furthermore, a small non-overlapping ORF downstream of the M ORF within the M gene is predicted (tentatively called orf4). Therefore, the genomic organization of SCRV can be proposed as 3' leader-N-P/C-M-(orf4)-G-L-trailer 5'. Orf4 transcription or translation products could not be detected by northern or Western blot, respectively, though one mRNA band similar to M mRNA was found. This is the first report of a small non-overlapping ORF in the M gene of a fish rhabdovirus. (c) 2007 Elsevier B.V. All rights reserved.
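
The inverse complementarity of leader and trailer termini mentioned in this abstract can be illustrated with a short check. This is only a sketch: the function names are hypothetical, the sequence is made-up toy RNA rather than actual SCRV data, and the abstract does not say how the authors performed the comparison.

```python
# Toy illustration only: names and sequences are assumptions, not SCRV data.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def terminal_complementarity(genome: str, n: int) -> float:
    """Fraction of the first n bases that would pair with the last n bases
    if the two ends of the strand were folded back onto each other."""
    head = genome[:n].upper()
    tail_rc = reverse_complement(genome[-n:])
    matches = sum(a == b for a, b in zip(head, tail_rc))
    return matches / n


# A toy genome whose 6-base termini are perfectly inverse complementary.
toy = "ACGAAG" + "GGGG" + "CUUCGU"
print(terminal_complementarity(toy, 6))
```

A value close to 1 over the first and last few bases would indicate termini that can pair with each other; a real analysis would also have to settle strand orientation and the window length, which are left open here.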

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Coccolithoviruses are giant dsDNA viruses that infect Emiliania huxleyi, the most ubiquitous marine microalga. Here, we present the genome of the latest coccolithovirus strain to be sequenced, EhV-99B1, and compare it with two other coccolithovirus genomes (EhV-86 and EhV-163). EhV-99B1 shares a pairwise nucleotide identity of 98% with EhV-163 (the two strains were isolated from the same Norwegian fjord but in different years), and just 96.5% with EhV-86 (isolated in the same spring as EhV-99B1 but in the English Channel). We confirmed and extended the list of relevant genomic differences between these EhVs from the Norwegian fjord and EhVs from the English Channel, namely the removal/insertions of: a phosphate permease, an endonuclease, a transposase, and two specific tRNAs. As a whole, this study provided new clues and insights into the diversity and mechanisms driving the evolution of these large oceanic viruses, in particular those processes involving selfish genetic elements.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The GenTHREADER annotation jobs are distributed across multiple clusters of processors using grid technology and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Genomic Threading Database currently contains structural annotations for the genomes of over 100 recently sequenced organisms. Annotations are carried out by using our modified GenTHREADER software and through implementing grid technology.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The IMGT/HLA Database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). The HLA complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools and detailed descriptions of the source cells. The online IMGT/HLA submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.7.0 July 2000) contains 1220 HLA alleles derived from over 2700 component sequences from the EMBL/GenBank/DDBJ databases. The HLA database provides a model which will be extended to provide specialist databases for polymorphic MHC genes of other species.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged ≈270 million years ago and the γ subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for ≈1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With rapid advances in video processing technologies and ever-faster increases in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation for retrieving videos of interest to users. Video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, the high complexity of video content poses the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search for very large video databases. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplied by the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of the video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one-dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one-dimensional values of the ViTris' positions. Such a transformation enables the B+-tree to achieve its optimal performance by quickly filtering out a large portion of non-similar ViTris. Our extensive experiments on large real video datasets demonstrate the effectiveness of our proposals, which outperform existing methods significantly.
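
The idea of filtering candidates through a one-dimensional key before computing exact distances can be sketched as follows. This is not the paper's transformation: it assumes a plain projection onto the first principal axis (approximated by power iteration), uses a sorted list with binary search as a stand-in for the B+-tree, and indexes bare points rather than ViTri hyperspheres. All names are hypothetical.

```python
import bisect
import math


def first_principal_axis(points, iterations=100):
    """Approximate the mean and first principal axis by power iteration."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    axis = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit-length start vector
    for _ in range(iterations):
        # Apply X^T X to the axis without forming the covariance matrix.
        proj = [sum(c[i] * axis[i] for i in range(dim)) for c in centered]
        new = [sum(proj[k] * centered[k][i] for k in range(len(centered)))
               for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break  # degenerate data: keep the current unit vector
        axis = [v / norm for v in new]
    return mean, axis


def project(point, mean, axis):
    """Map a high-dimensional point to its one-dimensional key."""
    return sum((x - m) * a for x, m, a in zip(point, mean, axis))


class OneDimIndex:
    """Sorted list of 1-D keys, standing in for a B+-tree."""

    def __init__(self, points):
        self.points = points
        self.mean, self.axis = first_principal_axis(points)
        self.keys = sorted(
            (project(p, self.mean, self.axis), i) for i, p in enumerate(points)
        )

    def range_query(self, query, radius, slack=1e-9):
        """Return indices of all points within `radius` of `query`."""
        q = project(query, self.mean, self.axis)
        # Projection onto a unit vector never increases distance, so any
        # point whose key differs from q by more than radius can be skipped.
        lo = bisect.bisect_left(self.keys, (q - radius - slack, -1))
        hi = bisect.bisect_right(self.keys, (q + radius + slack, len(self.points)))
        candidates = [i for _, i in self.keys[lo:hi]]
        # Exact distances are computed only for the surviving candidates.
        return sorted(i for i in candidates
                      if math.dist(query, self.points[i]) <= radius)
```

Because the key difference is a lower bound on the true distance, the filter never discards a true match; choosing the axis of greatest variance simply makes the bound tighter, so fewer candidates survive to the exact check.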

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Due to the wide diversity of unknown organisms in the environment, 99% of them cannot be grown in traditional culture media in laboratories. Metagenomics projects have therefore been proposed to study the microbial communities present in the environment using molecular techniques, especially sequencing. An accumulation of sequences produced by these projects is thus expected in the coming years. The sequences produced by genomics and metagenomics projects present several challenges for processing, storage and analysis, such as the search for clones containing genes of interest. This work presents OCI Metagenomics, which allows the rules of clone selection in metagenomic libraries to be defined and managed dynamically, through an algebraic approach based on process algebra. Furthermore, a web interface was developed to allow researchers to easily create and execute their own rules to select clones in a genomic sequence database. This software has been tested on a metagenomic cosmid library and was able to select clones containing genes of interest. Copyright 2010 ACM.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P. falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P. falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
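
The first stage described in this abstract, the optimal ungapped alignment score on every diagonal, can be sketched in scalar form. This is a plain Python illustration, not ParAlign itself: it uses no SIMD, assumes a simple match/mismatch scoring scheme in place of a substitution matrix, and the function name and default scores are hypothetical.

```python
def diagonal_scores(query, target, match=2, mismatch=-1):
    """Best ungapped local alignment score on each diagonal.

    Diagonal d holds the cells (i, j) with j - i == d. Along each diagonal
    the best-scoring contiguous run is found with a running maximum that is
    reset to zero whenever it would go negative.
    """
    m, n = len(query), len(target)
    scores = {}
    for d in range(-(m - 1), n):
        i = max(0, -d)   # first query position on this diagonal
        j = i + d        # matching target position
        best = running = 0
        while i < m and j < n:
            step = match if query[i] == target[j] else mismatch
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
        scores[d] = best
    return scores


scores = diagonal_scores("ACGT", "TTACGTTT")
best_diagonal = max(scores, key=scores.get)
print(best_diagonal, scores[best_diagonal])
```

The per-diagonal scores are independent of one another, which is what makes this stage a natural fit for data-parallel hardware; how ParAlign lays the computation out across SIMD lanes, and how it combines several diagonals into an approximate gapped score, is not specified in the abstract and is not attempted here.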

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Growth is a fundamental aspect of the life cycle of all organisms. Body size varies highly in most animal groups, such as mammals. Moreover, growth of a multicellular organism is not a uniform enlargement of size; different body parts and organs grow to their characteristic sizes at different times. Currently very little is known about the molecular mechanisms governing this organ-specific growth. The genome sequencing projects have provided complete genomic DNA sequences of several species over the past decade. The amount of genomic sequence information, including sequence variants within species, is constantly increasing. Based on the universal genetic code, we can make sense of this sequence information as far as it codes for proteins. However, less is known about the molecular mechanisms that control expression of genes, and about the variations in gene expression that underlie many pathological states in humans. This is caused in part by a lack of information about the second genetic code, which consists of the binding specificities of transcription factors and the combinatorial code by which transcription factor binding sites are assembled to form tissue-specific and/or ligand-regulated enhancer elements. This thesis presents a high-throughput assay for identification of transcription factor binding specificities, which was then used to measure the DNA binding profiles of transcription factors involved in growth control. We developed ‘enhancer element locator’, a computational tool that can be used to predict functional enhancer elements. A genome-wide prediction of human and mouse enhancer elements generated a large database of enhancer elements. This database can be used to identify target genes of signaling pathways, and to predict activated transcription factors based on changes in gene expression. Predictions validated in transgenic mouse embryos revealed the presence of multiple tissue-specific enhancers in mouse c- and N-Myc genes, which has implications for organ-specific growth control and tumor type specificity of oncogenes. Furthermore, we were able to map a single-nucleotide variation that confers susceptibility to colorectal cancer to an enhancer element, and we propose a mechanism by which this SNP might be involved in the generation of colorectal cancer.