10 resultados para Annotations sémantiques
em CentAUR: Central Archive University of Reading - UK
Resumo:
Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The Gen THREADER annotation jobs are distributed across multiple clusters of processors using grid technology and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures.
Resumo:
The past years have shown an enormous advancement in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and different data scope pose a challenge to jointly visualize and integrate diverse data types. We present AmalgamScope a new interactive software tool focusing on assisting scientists with the annotation of the human genome and particularly the integration of the annotation files from multiple data types, using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to integration of the data based on the annotational information and visualization of the merged files within chromosomal regions or the whole genome. Additionally, users can define custom transcriptome library files for any species and use the file exchanging distant server options of the tool.
Resumo:
In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
Resumo:
Motivation: There is a frequent need to apply a large range of local or remote prediction and annotation tools to one or more sequences. We have created a tool able to dispatch one or more sequences to assorted services by defining a consistent XML format for data and annotations. Results: By analyzing annotation tools, we have determined that annotations can be described using one or more of the six forms of data: numeric or textual annotation of residues, domains (residue ranges) or whole sequences. With this in mind, XML DTDs have been designed to store the input and output of any server. Plug-in wrappers to a number of services have been written which are called from a master script. The resulting APATML is then formatted for display in HTML. Alternatively further tools may be written to perform post-analysis.
Resumo:
BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power.In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
Resumo:
An automatic method for recognizing natively disordered regions from amino acid sequence is described and benchmarked against predictors that were assessed at the latest critical assessment of techniques for protein structure prediction (CASP) experiment. The method attains a Wilcoxon score of 90.0, which represents a statistically significant improvement on the methods evaluated on the same targets at CASP. The classifier, DISOPRED2, was used to estimate the frequency of native disorder in several representative genomes from the three kingdoms of life. Putative, long (>30 residue) disordered segments are found to occur in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins. The function of proteins with long predicted regions of disorder was investigated using the gene ontology annotations supplied with the Saccharomyces genome database. The analysis of the yeast proteome suggests that proteins containing disorder are often located in the cell nucleus and are involved in the regulation of transcription and cell signalling. The results also indicate that native disorder is associated with the molecular functions of kinase activity and nucleic acid binding.
Resumo:
The Genomic Threading Database currently contains structural annotations for the genomes of over 100 recently sequenced organisms. Annotations are carried out by using our modified GenTHREADER software and through implementing grid technology.
Resumo:
Abstract Background: The analysis of the Auditory Brainstem Response (ABR) is of fundamental importance to the investigation of the auditory system behaviour, though its interpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analysing the ABR, clinicians are often interested in the identification of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. Significant differences in inter-examiner results may lead to completely distinct clinical interpretations of the state of the auditory system. In this context, the aim of this research was to evaluate the inter-examiner agreement and variability in the manual classification of ABR. Methods: A total of 160 ABR data samples were collected, for four different stimulus intensity (80dBHL, 60dBHL, 40dBHL and 20dBHL), from 10 normal-hearing subjects (5 men and 5 women, from 20 to 52 years). Four examiners with expertise in the manual classification of ABR components participated in the study. The Bland-Altman statistical method was employed for the assessment of inter-examiner agreement and variability. The mean, standard deviation and error for the bias, which is the difference between examiners’ annotations, were estimated for each pair of examiners. Scatter plots and histograms were employed for data visualization and analysis. Results: In most comparisons the differences between examiner’s annotations were below 0.1 ms, which is clinically acceptable. In four cases, it was found a large error and standard deviation (>0.1 ms) that indicate the presence of outliers and thus, discrepancies between examiners. Conclusions: Our results quantify the inter-examiner agreement and variability of the manual analysis of ABR data, and they also allows for the determination of different patterns of manual ABR analysis.
Resumo:
For users of climate services, the ability to quickly determine the datasets that best fit one's needs would be invaluable. The volume, variety and complexity of climate data makes this judgment difficult. The ambition of CHARMe ("Characterization of metadata to enable high-quality climate services") is to give a wider interdisciplinary community access to a range of supporting information, such as journal articles, technical reports or feedback on previous applications of the data. The capture and discovery of this "commentary" information, often created by data users rather than data providers, and currently not linked to the data themselves, has not been significantly addressed previously. CHARMe applies the principles of Linked Data and open web standards to associate, record, search and publish user-derived annotations in a way that can be read both by users and automated systems. Tools have been developed within the CHARMe project that enable annotation capability for data delivery systems already in wide use for discovering climate data. In addition, the project has developed advanced tools for exploring data and commentary in innovative ways, including an interactive data explorer and comparator ("CHARMe Maps") and a tool for correlating climate time series with external "significant events" (e.g. instrument failures or large volcanic eruptions) that affect the data quality. Although the project focuses on climate science, the concepts are general and could be applied to other fields. All CHARMe system software is open-source, released under a liberal licence, permitting future projects to re-use the source code as they wish.
Resumo:
The slow-growing genus Bradyrhizobium is biologically important in soils, with different representatives found to perform a range of biochemical functions including photosynthesis, induction of root nodules and symbiotic nitrogen fixation and denitrification. Consequently, the role of the genus in soil ecology and biogeochemical transformations is of agricultural and environmental significance. Some isolates of Bradyrhizobium have been shown to be non-symbiotic and do not possess the ability to form nodules. Here we present the genome and gene annotations of two such free-living Bradyrhizobium isolates, named G22 and BF49, from soils with differing long-term management regimes (grassland and bare fallow respectively) in addition to carbon metabolism analysis. These Bradyrhizobium isolates are the first to be isolated and sequenced from European soil and are the first free-living Bradyrhizobium isolates, lacking both nodulation and nitrogen fixation genes, to have their genomes sequenced and assembled from cultured samples. The G22 and BF49 genomes are distinctly different with respect to size and number of genes; the grassland isolate also contains a plasmid. There are also a number of functional differences between these isolates and other published genomes, suggesting that this ubiquitous genus is extremely heterogeneous and has roles within the community not including symbiotic nitrogen fixation.