931 resultados para Bioinformatics


Relevância:

10.00% 10.00%

Publicador:

Resumo:

PopABC is a computer package for inferring the pattern of demographic divergence of closely related populations and species. The software performs coalescent simulation in the framework of approximate Bayesian computation (ABC). PopABC can also be used to perform Bayesian model choice to discriminate between different demographic scenarios. The program can be used either for research or for education and teaching purposes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

MOTIVATION: The accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering or consensus based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however they are often CPU intensive as they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ - a novel MQAP that compares 3D models of proteins without the need for CPU intensive structural alignments by utilising the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, that aims to provide increased prediction accuracy with negligible computational overhead. RESULTS: The ModFOLDclustQ method is competitive with leading clustering based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over 5 times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy can be gained over the previous clustering based MQAPs by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction. AVAILABILITY: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from: http://www.reading.ac.uk/bioinf/downloads/ CONTACT: l.j.mcguffin@reading.ac.uk.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) at predicting the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server-ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model machine learning based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0-an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of the global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at: http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from: http://www.reading.ac.uk/bioinf/downloads/.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Motivation: Intrinsic protein disorder is functionally implicated in numerous biological roles and is, therefore, ubiquitous in proteins from all three kingdoms of life. Determining the disordered regions in proteins presents a challenge for experimental methods and so recently there has been much focus on the development of improved predictive methods. In this article, a novel technique for disorder prediction, called DISOclust, is described, which is based on the analysis of multiple protein fold recognition models. The DISOclust method is rigorously benchmarked against the top.ve methods from the CASP7 experiment. In addition, the optimal consensus of the tested methods is determined and the added value from each method is quantified. Results: The DISOclust method is shown to add the most value to a simple consensus of methods, even in the absence of target sequence homology to known structures. A simple consensus of methods that includes DISOclust can significantly outperform all of the previous individual methods tested.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The reliable assessment of the quality of protein structural models is fundamental to the progress of structural bioinformatics. The ModFOLD server provides access to two accurate techniques for the global and local prediction of the quality of 3D models of proteins. Firstly ModFOLD, which is a fast Model Quality Assessment Program (MQAP) used for the global assessment of either single or multiple models. Secondly ModFOLDclust, which is a more intensive method that carries out clustering of multiple models and provides per-residue local quality assessment.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Severe acute respiratory syndrome (SARS) coronavirus infection and growth are dependent on initiating signaling and enzyme actions upon viral entry into the host cell. Proteins packaged during virus assembly may subsequently form the first line of attack and host manipulation upon infection. A complete characterization of virion components is therefore important to understanding the dynamics of early stages of infection. Mass spectrometry and kinase profiling techniques identified nearly 200 incorporated host and viral proteins. We used published interaction data to identify hubs of connectivity with potential significance for virion formation. Surprisingly, the hub with the most potential connections was not the viral M protein but the nonstructurall protein 3 (nsp3), which is one of the novel virion components identified by mass spectrometry. Based on new experimental data and a bioinformatics analysis across the Coronaviridae, we propose a higher-resolution functional domain architecture for nsp3 that determines the interaction capacity of this protein. Using recombinant protein domains expressed in Escherichia coli, we identified two additional RNA-binding domains of nsp3. One of these domains is located within the previously described SARS-unique domain, and there is a nucleic acid chaperone-like domain located immediately downstream of the papain-like proteinase domain. We also identified a novel cysteine-coordinated metal ion-binding domain. Analyses of interdomain interactions and provisional functional annotation of the remaining, so-far-uncharacterized domains are presented. Overall, the ensemble of data surveyed here paint a more complete picture of nsp3 as a conserved component of the viral protein processing machinery, which is intimately associated with viral RNA in its role as a virion component.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Motivation: We compare phylogenetic approaches for inferring functional gene links. The approaches detect independent instances of the correlated gain and loss of pairs of genes from species' genomes. We investigate the effect on results of basing evidence of correlations on two phylogenetic approaches, Dollo parsminony and maximum likelihood (ML). We further examine the effect of constraining the ML model by fixing the rate of gene gain at a low value, rather than estimating it from the data. Results: We detect correlated evolution among a test set of pairs of yeast (Saccharomyces cerevisiae) genes, with a case study of 21 eukaryotic genomes and test data derived from known yeast protein complexes. If the rate at which genes are gained is constrained to be low, ML achieves by far the best results at detecting known functional links. The model then has fewer parameters but it is more realistic by preventing genes from being gained more than once. Availability: BayesTraits by M. Pagel and A. Meade, and a script to configure and repeatedly launch it by D. Barker and M. Pagel, are available at http://www.evolution.reading.ac.uk .

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background: Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which based on the alignment of predicted secondary structure elements and ModFOLD which combines several true MQAP methods using an artificial neural network. Results: The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers, however further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion: MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available. Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers and further improvements can be achieved using a consensus of these methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

ANeCA is a fully automated implementation of Nested Clade Phylogeographic Analysis. This was originally developed by Templeton and colleagues, and has been used to infer, from the pattern of gene sequence polymorphisms in a geographically structured population, the historical demographic processes that have shaped its evolution. Until now it has been necessary to perform large parts of the procedure manually. We provide a program that will take data in Nexus sequential format, and directly output a set of inferences. The software also includes TCS v1.18 and GeoDis v2.2 as part of automation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Motivation: Hydrogen bonds are one of the most important inter-atomic interactions in biology. Previous experimental, theoretical and bioinformatics analyses have shown that the hydrogen bonding potential of amino acids is generally satisfied and that buried unsatisfied hydrogen-bond-capable residues are destabilizing. When studying mutant proteins, or introducing mutations to residues involved in hydrogen bonding, one needs to know whether a hydrogen bond can be maintained. Our aim, therefore, was to develop a rapid method to evaluate whether a sidechain can form a hydrogen-bond. Results: A novel knowledge-based approach was developed in which the conformations accessible to the residues involved are taken into account. Residues involved in hydrogen bonds in a set of high resolution crystal structures were analyzed and this analysis is then applied to a given protein. The program was applied to assess mutations in the tumour-suppressor protein, p53. This raised the number of distinct mutations identified as disrupting sidechain-sidechain hydrogen bonding from 181 in our previous analysis to 202 in this analysis.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Motivation: There is a frequent need to apply a large range of local or remote prediction and annotation tools to one or more sequences. We have created a tool able to dispatch one or more sequences to assorted services by defining a consistent XML format for data and annotations. Results: By analyzing annotation tools, we have determined that annotations can be described using one or more of the six forms of data: numeric or textual annotation of residues, domains (residue ranges) or whole sequences. With this in mind, XML DTDs have been designed to store the input and output of any server. Plug-in wrappers to a number of services have been written which are called from a master script. The resulting APATML is then formatted for display in HTML. Alternatively further tools may be written to perform post-analysis.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power.In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon its interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.

Relevância:

10.00% 10.00%

Publicador: