10 resultados para clustering quality metrics
em CentAUR: Central Archive University of Reading - UK
Resumo:
The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for data represented in multidimensional input spaces. In this paper, we describe Fast Learning SOM (FLSOM) which adopts a learning algorithm that improves the performance of the standard SOM with respect to the convergence time in the training phase. We show that FLSOM also improves the quality of the map by providing better clustering quality and topology preservation of multidimensional input data. Several tests have been carried out on different multidimensional datasets, which demonstrate better performances of the algorithm in comparison with the original SOM.
Resumo:
The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for multidimensional input datasets. In this paper, we present an application of the simulated annealing procedure to the SOM learning algorithm with the aim to obtain a fast learning and better performances in terms of quantization error. The proposed learning algorithm is called Fast Learning Self-Organized Map, and it does not affect the easiness of the basic learning algorithm of the standard SOM. The proposed learning algorithm also improves the quality of resulting maps by providing better clustering quality and topology preservation of input multi-dimensional data. Several experiments are used to compare the proposed approach with the original algorithm and some of its modification and speed-up techniques.
The capability-affordance model: a method for analysis and modelling of capabilities and affordances
Resumo:
Existing capability models lack qualitative and quantitative means to compare business capabilities. This paper extends previous work and uses affordance theories to consistently model and analyse capabilities. We use the concept of objective and subjective affordances to model capability as a tuple of a set of resource affordance system mechanisms and action paths, dependent on one or more critical affordance factors. We identify an affordance chain of subjective affordances by which affordances work together to enable an action and an affordance path that links action affordances to create a capability system. We define the mechanism and path underlying capability. We show how affordance modelling notation, AMN, can represent affordances comprising a capability. We propose a method to quantitatively and qualitatively compare capabilities using efficiency, effectiveness and quality metrics. The method is demonstrated by a medical example comparing the capability of syringe and needless anaesthetic systems.
Resumo:
MOTIVATION: The accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering or consensus based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however they are often CPU intensive as they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ - a novel MQAP that compares 3D models of proteins without the need for CPU intensive structural alignments by utilising the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, that aims to provide increased prediction accuracy with negligible computational overhead. RESULTS: The ModFOLDclustQ method is competitive with leading clustering based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over 5 times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy can be gained over the previous clustering based MQAPs by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction. AVAILABILITY: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from: http://www.reading.ac.uk/bioinf/downloads/ CONTACT: l.j.mcguffin@reading.ac.uk.
Resumo:
The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) at predicting the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server-ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model machine learning based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0-an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of the global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at: http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from: http://www.reading.ac.uk/bioinf/downloads/.
Resumo:
The reliable assessment of the quality of protein structural models is fundamental to the progress of structural bioinformatics. The ModFOLD server provides access to two accurate techniques for the global and local prediction of the quality of 3D models of proteins. Firstly ModFOLD, which is a fast Model Quality Assessment Program (MQAP) used for the global assessment of either single or multiple models. Secondly ModFOLDclust, which is a more intensive method that carries out clustering of multiple models and provides per-residue local quality assessment.
Resumo:
Background: Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which based on the alignment of predicted secondary structure elements and ModFOLD which combines several true MQAP methods using an artificial neural network. Results: The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers, however further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion: MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available. Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers and further improvements can be achieved using a consensus of these methods.
Resumo:
Motivation: Modelling the 3D structures of proteins can often be enhanced if more than one fold template is used during the modelling process. However, in many cases, this may also result in poorer model quality for a given target or alignment method. There is a need for modelling protocols that can both consistently and significantly improve 3D models and provide an indication of when models might not benefit from the use of multiple target-template alignments. Here, we investigate the use of both global and local model quality prediction scores produced by ModFOLDclust2, to improve the selection of target-template alignments for the construction of multiple-template models. Additionally, we evaluate clustering the resulting population of multi- and single-template models for the improvement of our IntFOLD-TS tertiary structure prediction method. Results: We find that using accurate local model quality scores to guide alignment selection is the most consistent way to significantly improve models for each of the sequence to structure alignment methods tested. In addition, using accurate global model quality for re-ranking alignments, prior to selection, further improves the majority of multi-template modelling methods tested. Furthermore, subsequent clustering of the resulting population of multiple-template models significantly improves the quality of selected models compared with the previous version of our tertiary structure prediction method, IntFOLD-TS.
Resumo:
The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using: simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall’s τ, Spearman’s ρ and Pearson’s r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve score (AUC) when Receiver Operator Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network, is shown to add value to FunFOLD, when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset. To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.
Resumo:
This paper presents a summary of the work done within the European Union's Seventh Framework Programme project ECLIPSE (Evaluating the Climate and Air Quality Impacts of Short-Lived Pollutants). ECLIPSE had a unique systematic concept for designing a realistic and effective mitigation scenario for short-lived climate pollutants (SLCPs; methane, aerosols and ozone, and their precursor species) and quantifying its climate and air quality impacts, and this paper presents the results in the context of this overarching strategy. The first step in ECLIPSE was to create a new emission inventory based on current legislation (CLE) for the recent past and until 2050. Substantial progress compared to previous work was made by including previously unaccounted types of sources such as flaring of gas associated with oil production, and wick lamps. These emission data were used for present-day reference simulations with four advanced Earth system models (ESMs) and six chemistry transport models (CTMs). The model simulations were compared with a variety of ground-based and satellite observational data sets from Asia, Europe and the Arctic. It was found that the models still underestimate the measured seasonality of aerosols in the Arctic but to a lesser extent than in previous studies. Problems likely related to the emissions were identified for northern Russia and India, in particular. To estimate the climate impacts of SLCPs, ECLIPSE followed two paths of research: the first path calculated radiative forcing (RF) values for a large matrix of SLCP species emissions, for different seasons and regions independently. Based on these RF calculations, the Global Temperature change Potential metric for a time horizon of 20 years (GTP20) was calculated for each SLCP emission type. This climate metric was then used in an integrated assessment model to identify all emission mitigation measures with a beneficial air quality and short-term (20-year) climate impact. These measures together defined a SLCP mitigation (MIT) scenario. Compared to CLE, the MIT scenario would reduce global methane (CH4) and black carbon (BC) emissions by about 50 and 80 %, respectively. For CH4, measures on shale gas production, waste management and coal mines were most important. For non-CH4 SLCPs, elimination of high-emitting vehicles and wick lamps, as well as reducing emissions from gas flaring, coal and biomass stoves, agricultural waste, solvents and diesel engines were most important. These measures lead to large reductions in calculated surface concentrations of ozone and particulate matter. We estimate that in the EU, the loss of statistical life expectancy due to air pollution was 7.5 months in 2010, which will be reduced to 5.2 months by 2030 in the CLE scenario. The MIT scenario would reduce this value by another 0.9 to 4.3 months. Substantially larger reductions due to the mitigation are found for China (1.8 months) and India (11–12 months). The climate metrics cannot fully quantify the climate response. Therefore, a second research path was taken. Transient climate ensemble simulations with the four ESMs were run for the CLE and MIT scenarios, to determine the climate impacts of the mitigation. In these simulations, the CLE scenario resulted in a surface temperature increase of 0.70 ± 0.14 K between the years 2006 and 2050. For the decade 2041–2050, the warming was reduced by 0.22 ± 0.07 K in the MIT scenario, and this result was in almost exact agreement with the response calculated based on the emission metrics (reduced warming of 0.22 ± 0.09 K). The metrics calculations suggest that non-CH4 SLCPs contribute ~ 22 % to this response and CH4 78 %. This could not be fully confirmed by the transient simulations, which attributed about 90 % of the temperature response to CH4 reductions. Attribution of the observed temperature response to non-CH4 SLCP emission reductions and BC specifically is hampered in the transient simulations by small forcing and co-emitted species of the emission basket chosen. Nevertheless, an important conclusion is that our mitigation basket as a whole would lead to clear benefits for both air quality and climate. The climate response from BC reductions in our study is smaller than reported previously, possibly because our study is one of the first to use fully coupled climate models, where unforced variability and sea ice responses cause relatively strong temperature fluctuations that may counteract (and, thus, mask) the impacts of small emission reductions. The temperature responses to the mitigation were generally stronger over the continents than over the oceans, and with a warming reduction of 0.44 K (0.39–0.49) K the largest over the Arctic. Our calculations suggest particularly beneficial climate responses in southern Europe, where surface warming was reduced by about 0.3 K and precipitation rates were increased by about 15 (6–21) mm yr−1 (more than 4 % of total precipitation) from spring to autumn. Thus, the mitigation could help to alleviate expected future drought and water shortages in the Mediterranean area. We also report other important results of the ECLIPSE project.