38 results for structure prediction
Abstract:
The accurate prediction of the biochemical function of a protein is becoming increasingly important, given the unprecedented growth of both structural and sequence databanks. Consequently, computational methods are required to analyse such data in an automated manner to ensure genomes are annotated accurately. Protein structure prediction methods, for example, are capable of generating approximate structural models on a genome-wide scale. However, the detection of functionally important regions in such crude models, as well as structural genomics targets, remains an extremely important problem. The method described in the current study, MetSite, represents a fully automatic approach for the detection of metal-binding residue clusters applicable to protein models of moderate quality. The method involves using sequence profile information in combination with approximate structural data. Several neural network classifiers are shown to be able to distinguish metal sites from non-sites with a mean accuracy of 94.5%. The method was demonstrated to identify metal-binding sites correctly in LiveBench targets where no obvious metal-binding sequence motifs were detectable using InterPro. Accurate detection of metal sites was shown to be feasible for low-resolution predicted structures generated using mGenTHREADER where no side-chain information was available. High-scoring predictions were observed for a recently solved hypothetical protein from Haemophilus influenzae, indicating a putative metal-binding site.
Abstract:
The results of applying a fragment-based protein tertiary structure prediction method to the prediction of 14 CASP5 target domains are described. The method is based on the assembly of supersecondary structural fragments taken from highly resolved protein structures using a simulated annealing algorithm. A number of good predictions for proteins with novel folds were produced, although not always as the first model. For two fold recognition targets, FRAGFOLD produced the most accurate model in both cases, despite the fact that the predictions were not based on a template structure. Although clear progress has been made in improving FRAGFOLD since CASP4, the ranking of final models still seems to be the main problem that needs to be addressed before the next CASP experiment.
Abstract:
Motivation: Modelling the 3D structures of proteins can often be enhanced if more than one fold template is used during the modelling process. However, in many cases, this may also result in poorer model quality for a given target or alignment method. There is a need for modelling protocols that can both consistently and significantly improve 3D models and provide an indication of when models might not benefit from the use of multiple target-template alignments. Here, we investigate the use of both global and local model quality prediction scores produced by ModFOLDclust2, to improve the selection of target-template alignments for the construction of multiple-template models. Additionally, we evaluate clustering the resulting population of multi- and single-template models for the improvement of our IntFOLD-TS tertiary structure prediction method. Results: We find that using accurate local model quality scores to guide alignment selection is the most consistent way to significantly improve models for each of the sequence to structure alignment methods tested. In addition, using accurate global model quality for re-ranking alignments, prior to selection, further improves the majority of multi-template modelling methods tested. Furthermore, subsequent clustering of the resulting population of multiple-template models significantly improves the quality of selected models compared with the previous version of our tertiary structure prediction method, IntFOLD-TS.
Abstract:
The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall’s τ, Spearman’s ρ and Pearson’s r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve score (AUC) when Receiver Operator Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset.
To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.
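As an illustration of the MCC metric mentioned above, the following is a minimal sketch of how it could be computed for a binding-site residue prediction. The function name and the representation of sites as sets of residue numbers are assumptions made for this example; they are not taken from FunFOLDQA or from the CASP assessment code.

```python
import math


def binding_site_mcc(observed, predicted, n_residues):
    """Matthews Correlation Coefficient for a binding-site prediction.

    observed   -- set of residue numbers that bind the ligand (hypothetical input)
    predicted  -- set of residue numbers predicted to bind the ligand
    n_residues -- total number of residues in the protein chain
    """
    tp = len(observed & predicted)      # correctly predicted site residues
    fp = len(predicted - observed)      # predicted, but not in the observed site
    fn = len(observed - predicted)      # observed, but missed by the prediction
    tn = n_residues - tp - fp - fn      # everything else
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: the MCC is reported as 0 when it is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

A perfect prediction gives an MCC of 1, and a prediction no better than chance gives a value near 0. Computing it needs the observed site, which is why the metric cannot be used before the bound structure has been solved.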
Abstract:
Protein structure prediction methods aim to predict the structures of proteins from their amino acid sequences, utilizing various computational algorithms. Structural genome annotation is the process of attaching biological information to every protein encoded within a genome via the production of three-dimensional protein models.
Abstract:
Model quality assessment programs (MQAPs) aim to assess the quality of modelled 3D protein structures. The provision of quality scores, describing both global and local (per-residue) accuracy, is extremely important, as without quality scores we are unable to determine the usefulness of a 3D model for further computational and experimental wet lab studies. Here, we briefly discuss protein tertiary structure prediction, along with the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition and their key role in driving the field of protein model quality assessment methods (MQAPs). We also briefly discuss the top MQAPs from the previous CASP competitions. Additionally, we describe our downloadable and webserver-based model quality assessment methods: ModFOLD3, ModFOLDclust, ModFOLDclustQ, ModFOLDclust2, and IntFOLD-QA. We provide a practical step-by-step guide on using our downloadable and webserver-based tools and include examples of their application for improving tertiary structure prediction, ligand binding site residue prediction, and oligomer predictions.
Abstract:
IntFOLD is an independent web server that integrates our leading methods for structure and function prediction. The server provides a simple unified interface that aims to make complex protein modelling data more accessible to life scientists. The server web interface is designed to be intuitive and integrates a complex set of quantitative data, so that 3D modelling results can be viewed on a single page and interpreted by non-expert modellers at a glance. The only required input to the server is an amino acid sequence for the target protein. Here we describe major performance and user interface updates to the server, which comprises an integrated pipeline of methods for: tertiary structure prediction, global and local 3D model quality assessment, disorder prediction, structural domain prediction, function prediction and modelling of protein-ligand interactions. The server has been independently validated during numerous CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiments, as well as being continuously evaluated by the CAMEO (Continuous Automated Model Evaluation) project. The IntFOLD server is available at: http://www.reading.ac.uk/bioinf/IntFOLD/
Abstract:
Protein–ligand binding site prediction methods aim to predict, from amino acid sequence, protein–ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein–ligand interactions has become extremely important to help determine a protein’s functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analysis, such as drug discovery. Thus, in silico prediction of protein–ligand interactions must be utilized to aid in functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein–ligand interactions, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein–ligand interactions. Furthermore, we provide a step-by-step guide on using the FunFOLD web server and FunFOLD3 downloadable application, along with some real-world examples where the FunFOLD methods have been used to aid functional elucidation.
Abstract:
The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) is a World Weather Research Programme project. One of its main objectives is to enhance collaboration on the development of ensemble prediction between operational centers and universities by increasing the availability of ensemble prediction system (EPS) data for research. This study analyzes the prediction of Northern Hemisphere extratropical cyclones by nine different EPSs archived as part of the TIGGE project for the 6-month time period of 1 February 2008–31 July 2008, which included a sample of 774 cyclones. An objective feature tracking method has been used to identify and track the cyclones along the forecast trajectories. Forecast verification statistics have then been produced [using the European Centre for Medium-Range Weather Forecasts (ECMWF) operational analysis as the truth] for cyclone position, intensity, and propagation speed, showing large differences between the different EPSs. The results show that the ECMWF ensemble mean and control have the highest level of skill for all cyclone properties. The Japanese Meteorological Administration (JMA), the National Centers for Environmental Prediction (NCEP), the Met Office (UKMO), and the Canadian Meteorological Centre (CMC) have 1 day less skill for the position of cyclones throughout the forecast range. The relative performance of the different EPSs remains the same for cyclone intensity except for NCEP, which has larger errors than for position. NCEP, the Centro de Previsão de Tempo e Estudos Climáticos (CPTEC), and the Australian Bureau of Meteorology (BoM) all have faster intensity error growth in the earlier part of the forecast. They are also very underdispersive and significantly underpredict intensities, perhaps due to the comparatively low spatial resolutions of these EPSs not being able to accurately model the tilted structure essential to cyclone growth and decay. 
There is very little difference between the levels of skill of the ensemble mean and control for cyclone position, but the ensemble mean provides an advantage over the control for all EPSs except CPTEC in cyclone intensity and there is an advantage for propagation speed for all EPSs. ECMWF and JMA have an excellent spread–skill relationship for cyclone position. The EPSs are all much more underdispersive for cyclone intensity and propagation speed than for position, with ECMWF and CMC performing best for intensity and CMC performing best for propagation speed. ECMWF is the only EPS to consistently overpredict cyclone intensity, although the bias is small. BoM, NCEP, UKMO, and CPTEC significantly underpredict intensity and, interestingly, all the EPSs underpredict the propagation speed, that is, the cyclones move too slowly on average in all EPSs.
Abstract:
Declining biodiversity in agro-ecosystems, caused by intensification of production or expansion of monocultures, is associated with the emergence of agricultural pests. Understanding how land-use and management control crop-associated biodiversity is, therefore, one of the key steps towards the prediction and maintenance of natural pest-control. Here we report on relationships between land-use variables and arthropod community attributes (for example, species diversity, abundance and guild structure) across a diversification gradient in a rice-dominated landscape in the Mekong delta, Vietnam. We show that rice habitats contained the most diverse arthropod communities, compared with other uncultivated and cultivated land-use types. In addition, arthropod species density and Simpson's diversity in flower, vegetable and fruit habitats were positively related to rice cover in the local landscape. However, across the landscape as a whole, reduction in heterogeneity and the amount of uncultivated cover was associated, generally, with a loss of diversity. Furthermore, arthropod species density in tillering and flowering stages of rice was positively related to crop and vegetation richness, respectively, in the local landscape. Differential effects on feeding guilds were also observed in rice-associated communities, with the proportional abundance of predators increasing and the proportional abundance of detritivores decreasing with increased landscape rice cover. Thus, we identify a range of rather complex, sometimes contradictory patterns concerning the impact of rice cover and landscape heterogeneity on arthropod community attributes. Importantly, we conclude that land-use change associated with expansion of monoculture rice need not automatically impact diversity and functioning of the arthropod community.
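For readers unfamiliar with Simpson's diversity, the sketch below computes one common form of the index, 1 - sum(p_i^2), from per-species abundance counts. The abstract does not state which variant of the index the study used, so the choice of this form, and the function name, are assumptions made for illustration.

```python
def simpson_diversity(counts):
    """Gini-Simpson diversity index, 1 - sum(p_i ** 2).

    counts -- number of individuals recorded for each species in one sample
    (assumed form of the index; the study may have used another variant)
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("the sample contains no individuals")
    # p_i is the proportional abundance of species i in the sample.
    return 1.0 - sum((count / total) ** 2 for count in counts)
```

The value is 0 for a sample containing a single species and approaches 1 as individuals are spread evenly over many species.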
Abstract:
The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) at predicting the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server: ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model, machine learning-based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering-based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0, an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of the global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at: http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from: http://www.reading.ac.uk/bioinf/downloads/.
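The general idea behind clustering-based quality assessment can be sketched as follows: each model is scored by its mean structural similarity to every other model in the set, so models that agree with the consensus score highly. This is a generic illustration only, not the ModFOLDclust implementation; the function name and the assumption of a precomputed symmetric similarity matrix with values between 0 and 1 are hypothetical.

```python
def consensus_scores(similarity):
    """Score each model by its mean similarity to all other models.

    similarity -- square matrix (list of lists); similarity[i][j] is a
    precomputed structural similarity between models i and j, assumed to
    lie between 0 and 1 (hypothetical input for this sketch)
    """
    n_models = len(similarity)
    if n_models < 2:
        raise ValueError("at least two models are needed for a consensus")
    scores = []
    for i in range(n_models):
        # Leave out the diagonal entry: a model's similarity to itself.
        others = [similarity[i][j] for j in range(n_models) if j != i]
        scores.append(sum(others) / (n_models - 1))
    return scores
```

A single-model method, in contrast, scores one structure without reference to any others, which is why the two kinds of method are compared separately.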
Abstract:
Structure activity relationships (SARs) are presented for the gas-phase reactions of RO2 with HO2, and the self- and cross-reactions of RO2. For RO2+HO2 the SAR is based upon a correlation between the logarithm of the measured rate coefficient and a calculated ionisation potential for the molecule R-CH=CH2, R being the same group in both the radical and molecular analogue. The correlation observed is strong, and for only one RO2 species does the measured rate coefficient deviate by more than a factor of two from the linear least-squares regression line. For the self- and cross-reactions of RO2 radicals, the SAR is based upon a correlation between the logarithm of the measured rate coefficient and the calculated electrostatic potential (ESP) at the equivalent carbon atom in the RH molecule to which oxygen is attached in RO2, again R being the same group in the molecule and the radical. For cases where R is a simple alkyl group, a strong linear correlation is observed. For RO2 radicals which contain lone pair-bearing substituents and for which the calculated ESP < -0.05, self-reaction rate coefficients appear to be insensitive to the value of the ESP. For RO2 of this type with ESP > -0.05, a linear relationship between log k and the ESP is again observed. Using the relationships, 84 out of the 85 rate coefficients used to develop the SARs are predicted to within a factor of three of their measured values. A relationship is also presented that allows the prediction of the Arrhenius parameters for the self-reactions of simple alkyl RO2 radicals. On the basis of the correlations, predictions of room-temperature rate coefficients are made for a number of atmospherically important peroxyl-peroxyl radical reactions. © 2003 Elsevier Ltd. All rights reserved.
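The kind of analysis described above, a linear least-squares fit of log k against a calculated descriptor followed by a check of how far each measurement lies from the line, can be sketched as below. The function names are hypothetical, base-10 logarithms are an assumption, and no data from the paper are reproduced here.

```python
import math


def fit_log_linear(descriptors, rate_coefficients):
    """Least-squares fit of log10(k) = intercept + slope * descriptor."""
    log_k = [math.log10(k) for k in rate_coefficients]
    n = len(descriptors)
    mean_x = sum(descriptors) / n
    mean_y = sum(log_k) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptors, log_k))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def deviation_factors(descriptors, rate_coefficients, slope, intercept):
    """Factor (>= 1) by which each measured k differs from the fitted line."""
    factors = []
    for x, k in zip(descriptors, rate_coefficients):
        predicted_log_k = intercept + slope * x
        # A residual of log10(2) in log space is a factor of two in k.
        factors.append(10 ** abs(math.log10(k) - predicted_log_k))
    return factors
```

A statement such as "within a factor of two of the regression line" then corresponds to a deviation factor of at most 2 for that species.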
Abstract:
A new structure of Radial Basis Function (RBF) neural network called the Dual-orthogonal RBF Network (DRBF) is introduced for nonlinear time series prediction. The hidden nodes of a conventional RBF network compare the Euclidean distance between the network input vector and the centres, and the node responses are radially symmetrical. But in time series prediction where the system input vectors are lagged system outputs, which are usually highly correlated, the Euclidean distance measure may not be appropriate. The DRBF network modifies the distance metric by introducing a classification function which is based on the estimation data set. Training the DRBF networks consists of two stages: learning the classification-related basis functions and the important input nodes, followed by selecting the regressors and learning the weights of the hidden nodes. In both cases, a forward Orthogonal Least Squares (OLS) selection procedure is applied, initially to select the important input nodes and then to select the important centres. Simulation results of single-step and multi-step ahead predictions over a test data set are included to demonstrate the effectiveness of the new approach.
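To make the description of the conventional RBF network concrete, the sketch below evaluates a network whose hidden nodes respond to the Euclidean distance between the input vector and their centres. The Gaussian basis function, the function name and the parameter layout are assumptions for illustration; the sketch shows the conventional network only, not the DRBF modification.

```python
import math


def rbf_output(x, centres, widths, weights, bias=0.0):
    """Output of a conventional RBF network with Gaussian hidden nodes.

    x       -- input vector (for time series, typically lagged outputs)
    centres -- one centre vector per hidden node
    widths  -- one Gaussian width per hidden node (assumed basis function)
    weights -- one output weight per hidden node
    """
    total = bias
    for centre, width, weight in zip(centres, widths, weights):
        # Squared Euclidean distance between the input and this centre.
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        # The response depends only on the distance, so it is radially symmetric.
        total += weight * math.exp(-dist_sq / (2.0 * width ** 2))
    return total
```

Because every input component contributes equally to the distance, highly correlated lagged inputs are treated as if they were independent, which is the limitation the abstract attributes to the Euclidean measure.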
Abstract:
A number of new and newly improved methods for predicting protein structure developed by the Jones–University College London group were used to make predictions for the CASP6 experiment. Structures were predicted with a combination of fold recognition methods (mGenTHREADER, nFOLD, and THREADER) and a substantially enhanced version of FRAGFOLD, our fragment assembly method. Attempts at automatic domain parsing were made using DomPred and DomSSEA, which are based on a secondary structure parsing algorithm and additionally for DomPred, a simple local sequence alignment scoring function. Disorder prediction was carried out using a new SVM-based version of DISOPRED. Attempts were also made at domain docking and “microdomain” folding in order to build complete chain models for some targets.
Abstract:
Dynamically disordered regions appear to be relatively abundant in eukaryotic proteomes. The DISOPRED server allows users to submit a protein sequence, and returns a probability estimate of each residue in the sequence being disordered. The results are sent in both plain text and graphical formats, and the server can also supply predictions of secondary structure to provide further structural information.