22 results for Annotation de génomes


Relevance:

10.00% 10.00%

Publisher:

Abstract:

One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.
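As an illustration of a property-based retrieval such as the GC-content filter mentioned above, here is a minimal sketch. It is not the CMR's implementation; the function names and the `records` mapping of sequence names to nucleotide strings are assumptions made for the example.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_by_gc(records, low, high):
    """Return the names of sequences whose GC fraction lies in [low, high].

    `records` is a hypothetical dict mapping a sequence name to its bases.
    """
    return [name for name, seq in records.items()
            if low <= gc_content(seq) <= high]


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    toy = {"seq_a": "ATATAT", "seq_b": "GCGCAT", "seq_c": "GGGCCC"}
    print(filter_by_gc(toy, 0.5, 1.0))
```

A real resource would run such filters against precomputed values in a database rather than recomputing them per query.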

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The PlantsP database is a curated database that combines information derived from sequences with experimental functional genomics information. PlantsP focuses on plant protein kinases and protein phosphatases. The database will specifically provide a resource for information on a collection of T-DNA insertion mutants (knockouts) in each protein kinase and phosphatase in Arabidopsis thaliana. PlantsP also provides a curated view of each protein that includes a comprehensive annotation of functionally related sequence motifs, sequence family definitions, alignments and phylogenetic trees, and descriptive information drawn directly from the literature. PlantsP is available at http://PlantsP.sdsc.edu.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing ‘global views’ of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein–protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein–protein interaction data with structural information is a particularly novel feature of our system. 
We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many pre-selected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to $V^{-b}$, for attribute value $V$ and constant exponent $b$), with a few folds having large values and most having small values.
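The idea of comparing very different attributes through ranks can be sketched as follows. This is only an illustrative example, not PartsList's code: it converts two attribute columns to ranks and computes a Spearman-style rank correlation, and the fold labels and attribute values are invented placeholders.

```python
from statistics import mean


def ranks(values):
    """Rank values from largest (rank 1) downward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def rank_correlation(x, y):
    """Spearman-style correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical attribute values for four placeholder folds.
    folds = ["fold_A", "fold_B", "fold_C", "fold_D"]
    genome_occurrence = [120, 45, 300, 10]
    expression_level = [2.5, 1.1, 9.0, 0.3]
    print(dict(zip(folds, ranks(genome_occurrence))))
    print(rank_correlation(genome_occurrence, expression_level))
```

Working with ranks rather than raw values puts attributes with different units and scales on a common footing, which matters when the raw values are heavily skewed, as in the power-law behavior described above.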

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 more than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications exemplified by a predicted pleckstrin homology domain in a Candida albicans protein, previously described as an integrin.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Functional annotation of novel genes can be achieved by detection of interactions of their encoded proteins with known proteins followed by assays to validate that the gene participates in a specific cellular function. We report an experimental strategy that allows for detection of protein interactions and functional assays with a single reporter system. Interactions among biochemical network component proteins are detected and probed with stimulators and inhibitors of the network. In addition, the cellular location of the interacting proteins is determined. We used this strategy to map a signal transduction network that controls initiation of translation in eukaryotes. We analyzed 35 different pairs of full-length proteins and identified 14 interactions, of which five have not been observed previously, suggesting that the organization of the pathway is more ramified and integrated than previously shown. Our results demonstrate the feasibility of using this strategy in genome-wide functional annotation efforts.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Various types of physical mapping data were assembled by developing a set of computer programs (Integrated Mapping Package) to derive a detailed, annotated map of a 4-Mb region of human chromosome 13 that includes the BRCA2 locus. The final assembly consists of a yeast artificial chromosome (YAC) contig with 42 members spanning the 13q12-13 region and aligned contigs of 399 cosmids established by cross-hybridization between the cosmids, which were selected from a chromosome 13-specific cosmid library using inter-Alu PCR probes from the YACs. The end sequences of 60 cosmids spaced nearly evenly across the map were used to generate sequence-tagged sites (STSs), which were mapped to the YACs by PCR. A contig framework was generated by STS content mapping, and the map was assembled on this scaffold. Additional annotation was provided by 72 expressed sequences and 10 genetic markers that were positioned on the map by hybridization to cosmids.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.