891 results for Hierarchical clustering model
Abstract:
MOTIVATION: The accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering- or consensus-based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however, they are often CPU intensive as they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ, a novel MQAP that compares 3D models of proteins without the need for CPU-intensive structural alignments, by utilising the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, which aims to provide increased prediction accuracy with negligible computational overhead. RESULTS: The ModFOLDclustQ method is competitive with the leading clustering-based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over 5 times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy can be gained over the previous clustering-based MQAPs by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction. AVAILABILITY: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from: http://www.reading.ac.uk/bioinf/downloads/ CONTACT: l.j.mcguffin@reading.ac.uk.
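The alignment-free comparison rests on a Q-style measure of agreement between inter-residue distances. The following is a minimal, illustrative sketch of such a score for two models of the same sequence, using CA coordinates; the exact functional form and weighting used by ModFOLDclustQ may differ.

```python
import numpy as np

def q_similarity(ca_a, ca_b, sigma=2.0, min_separation=3):
    """Illustrative Q-style similarity between two models of the same sequence.

    ca_a, ca_b: (N, 3) arrays of CA coordinates for the two models.
    Returns a value in (0, 1]; 1 means identical inter-residue distances.
    """
    n = len(ca_a)
    # Pairwise CA-CA distance matrices for each model.
    d_a = np.linalg.norm(ca_a[:, None, :] - ca_a[None, :, :], axis=-1)
    d_b = np.linalg.norm(ca_b[:, None, :] - ca_b[None, :, :], axis=-1)
    # Only residue pairs separated in sequence, to avoid trivial agreement.
    i, j = np.triu_indices(n, k=min_separation)
    # Gaussian agreement of corresponding distances, averaged over pairs.
    return float(np.mean(np.exp(-((d_a[i, j] - d_b[i, j]) ** 2) / (2.0 * sigma ** 2))))
```

Because no superposition is performed, scores of this kind can be computed much faster than alignment-based comparisons, which is the speed advantage claimed above.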
Abstract:
The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) to predict the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server: ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model machine learning based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0, an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at: http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from: http://www.reading.ac.uk/bioinf/downloads/.
Abstract:
Mathematical modeling of bacterial chemotaxis systems has been influential and insightful in helping to understand experimental observations. We provide here a comprehensive overview of the range of mathematical approaches used for modeling, within a single bacterium, chemotactic processes caused by changes to external gradients in its environment. Specific areas of the bacterial system which have been studied and modeled are discussed in detail, including the modeling of adaptation in response to attractant gradients, the intracellular phosphorylation cascade, membrane receptor clustering, and spatial modeling of intracellular protein signal transduction. The importance of producing robust models that address adaptation, gain, and sensitivity is also discussed. This review highlights that while mathematical modeling has aided in understanding bacterial chemotaxis on the individual cell scale and in guiding experimental design, no single model succeeds in robustly describing all of the basic elements of the cell. We conclude by discussing the importance of this and the future of modeling in this area.
Abstract:
Background: Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so-called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can consistently separate the highest accuracy models from the lowest. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network. Results: The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers; however, further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion: MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available. Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers, and further improvements can be achieved using a consensus of these methods.
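The consensus idea, feeding the outputs of several single-model MQAPs into a small neural network trained against observed model quality, can be sketched as follows. The feature set, network size and synthetic training data below are illustrative placeholders, not the configuration used by ModFOLD.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# X: one row per candidate model, columns are scores from individual MQAPs;
# y: observed quality of each model (e.g. similarity to the native structure).
rng = np.random.default_rng(0)
X = rng.random((500, 4))                                   # placeholder MQAP scores
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + 0.05 * rng.standard_normal(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
consensus = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
consensus.fit(X_tr, y_tr)                                  # learn the score combination
print("held-out R^2:", consensus.score(X_te, y_te))
```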
Abstract:
In this work, a method for building multiple-model structures is presented. A clustering algorithm that uses data from the system is employed to define the architecture of the multiple-model, including the size of the region covered by each model and the number of models. A heating, ventilation and air conditioning (HVAC) system is used as a testbed for the proposed method.
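A minimal sketch of the general idea, assuming recorded operating-point data from the plant: cluster the data to define the regions, then fit one local linear model per region. The clustering algorithm, region definition and local model class used in the paper may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_multiple_model(X, y, n_models=3):
    """Partition the operating space and fit a local linear model per region."""
    km = KMeans(n_clusters=n_models, n_init=10, random_state=0).fit(X)
    models = []
    for k in range(n_models):
        mask = km.labels_ == k
        # Least-squares fit of y = X w + b within the k-th region.
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])
        w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models.append(w)
    return km, models

def predict(km, models, X_new):
    # Route each new operating point to its region's local model.
    labels = km.predict(X_new)
    A = np.hstack([X_new, np.ones((len(X_new), 1))])
    return np.array([A[i] @ models[lab] for i, lab in enumerate(labels)])
```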
Abstract:
The identification and visualization of clusters formed by motor unit action potentials (MUAPs) is an essential step in investigations seeking to explain the control of the neuromuscular system. This work introduces the generative topographic mapping (GTM), a novel machine learning tool, for clustering of MUAPs, and also extends the GTM technique to provide a way of visualizing MUAPs. The performance of GTM was compared to that of three other clustering methods: the self-organizing map (SOM), a Gaussian mixture model (GMM), and the neural-gas network (NGN). The results, based on the study of experimental MUAPs, showed that the rate of success of both GTM and SOM outperformed that of GMM and NGN, and also that GTM may in practice be used as a principled alternative to the SOM in the study of MUAPs. A visualization tool, which we called the GTM grid, was devised for visualization of MUAPs lying in a high-dimensional space. The visualization provided by the GTM grid was compared to that obtained from principal component analysis (PCA).
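GTM is not part of mainstream Python libraries, so the sketch below illustrates two of the baselines mentioned above instead: a Gaussian mixture model for clustering MUAP waveforms and PCA for a 2-D view. The waveforms and cluster count are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

# Each row is one detected MUAP waveform (fixed-length window of samples).
rng = np.random.default_rng(1)
muaps = rng.standard_normal((200, 64))             # placeholder waveforms

labels = GaussianMixture(n_components=3, random_state=0).fit_predict(muaps)
coords = PCA(n_components=2).fit_transform(muaps)  # 2-D projection for inspection

for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} MUAPs, "
          f"centroid in PCA plane = {coords[labels == k].mean(axis=0)}")
```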
Abstract:
Based on integrated system optimisation and parameter estimation, a method is described for on-line steady-state optimisation which compensates for model-plant mismatch and solves a non-linear optimisation problem by iterating on a linear-quadratic representation. The method requires real process derivatives, which are estimated using a dynamic identification technique. The utility of the method is demonstrated using a simulation of the Tennessee Eastman benchmark chemical process.
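To illustrate the flavour of such schemes, the sketch below runs a toy iterative steady-state optimisation in which a mismatched model is corrected at each iteration by a gradient modifier built from estimated plant derivatives. The plant, model, filter gain and finite-difference gradient estimate are all placeholders standing in for the paper's Tennessee Eastman simulation and dynamic identification step.

```python
import numpy as np
from scipy.optimize import minimize

def plant_cost(u):            # "true" plant steady-state cost (unknown to the optimiser)
    return (u - 2.0) ** 2 + 0.5 * u

def model_cost(u):            # mismatched steady-state model
    return (u - 1.0) ** 2

def plant_gradient(u, eps=1e-3):
    # Stand-in for the derivative estimation step; here a plain finite difference.
    return (plant_cost(u + eps) - plant_cost(u - eps)) / (2 * eps)

u = 0.0
for _ in range(20):
    # Gradient modifier: difference between estimated plant and model derivatives at u.
    lam = plant_gradient(u) - 2.0 * (u - 1.0)
    # Model-based optimisation with a linear correction term added.
    res = minimize(lambda v: model_cost(v[0]) + lam * (v[0] - u), x0=[u])
    u = 0.5 * u + 0.5 * res.x[0]          # filtered update for stability
print("converged input:", u, "plant cost:", plant_cost(u))
```

Despite the model mismatch, the iterates converge to the plant optimum (u = 1.75 for this toy plant), which is the essential point of compensating with measured process derivatives.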
Abstract:
Motivation: Modelling the 3D structures of proteins can often be enhanced if more than one fold template is used during the modelling process. However, in many cases, this may also result in poorer model quality for a given target or alignment method. There is a need for modelling protocols that can both consistently and significantly improve 3D models and provide an indication of when models might not benefit from the use of multiple target-template alignments. Here, we investigate the use of both global and local model quality prediction scores produced by ModFOLDclust2, to improve the selection of target-template alignments for the construction of multiple-template models. Additionally, we evaluate clustering the resulting population of multi- and single-template models for the improvement of our IntFOLD-TS tertiary structure prediction method. Results: We find that using accurate local model quality scores to guide alignment selection is the most consistent way to significantly improve models for each of the sequence to structure alignment methods tested. In addition, using accurate global model quality for re-ranking alignments, prior to selection, further improves the majority of multi-template modelling methods tested. Furthermore, subsequent clustering of the resulting population of multiple-template models significantly improves the quality of selected models compared with the previous version of our tertiary structure prediction method, IntFOLD-TS.
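A minimal sketch of the selection step described above, assuming each candidate target-template alignment has already been turned into a model with a predicted global score and per-residue local scores (random placeholders here): re-rank alignments by global quality, then keep those whose local scores are acceptable over most of the target. The thresholds are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_alignments, n_residues = 10, 150

# Placeholder quality predictions for models built from each alignment.
global_scores = rng.random(n_alignments)                 # higher is better
local_scores = rng.random((n_alignments, n_residues))    # per-residue quality

# Re-rank alignments by predicted global quality.
order = np.argsort(global_scores)[::-1]

# Keep top-ranked alignments whose local quality is acceptable for
# at least 70% of residues; take up to five for multi-template modelling.
selected = [i for i in order if np.mean(local_scores[i] > 0.5) >= 0.7][:5]
print("alignments selected for multi-template modelling:", selected)
```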
Abstract:
Ensemble clustering (EC) can arise in data assimilation with ensemble square root filters (EnSRFs) using non-linear models: an M-member ensemble splits into a single outlier and a cluster of M−1 members. The stochastic Ensemble Kalman Filter does not present this problem. Modifications to the EnSRFs by a periodic resampling of the ensemble through random rotations have been proposed to address it. We introduce a metric to quantify the presence of EC and present evidence to dispel the notion that EC leads to filter failure. Starting from a univariate model, we show that EC is not a permanent but a transient phenomenon; it occurs intermittently in non-linear models. We perform a series of data assimilation experiments using a standard EnSRF and an EnSRF modified by resampling through random rotations. The modified EnSRF thus alleviates the issues associated with EC, but at the cost of the traceability of individual ensemble trajectories, and it cannot use some of the algorithms that enhance the performance of the standard EnSRF. In the non-linear regimes of low-dimensional models, the analysis root mean square error of the standard EnSRF slowly grows with ensemble size if the size is larger than the dimension of the model state. However, we do not observe this problem in a more complex model that uses an ensemble size much smaller than the dimension of the model state, along with inflation and localisation. Overall, we find that transient EC does not handicap the performance of the standard EnSRF.
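The paper's exact EC metric is not reproduced here; the sketch below illustrates one simple way to flag the outlier-plus-cluster configuration, by comparing each member's distance to the mean of the remaining members.

```python
import numpy as np

def clustering_indicator(ensemble):
    """Crude indicator of ensemble clustering (EC).

    ensemble: (M, n) array of M members of an n-dimensional state.
    Returns the ratio of the largest to the median leave-one-out distance
    to the mean of the other members; a large value suggests a single
    outlier plus a tight cluster of M-1 members.
    """
    M = len(ensemble)
    d = np.empty(M)
    for m in range(M):
        others = np.delete(ensemble, m, axis=0)
        d[m] = np.linalg.norm(ensemble[m] - others.mean(axis=0))
    return d.max() / np.median(d)

# Example: nine members near the origin and one far away -> large indicator.
rng = np.random.default_rng(3)
ens = rng.normal(0.0, 0.1, size=(10, 40))
ens[0] += 3.0
print(clustering_indicator(ens))
```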
Abstract:
We present, pedagogically, the Bayesian approach to composed error models under alternative, hierarchical characterizations; demonstrate, briefly, the Bayesian approach to model comparison using recent advances in Markov Chain Monte Carlo (MCMC) methods; and illustrate, empirically, the value of these techniques to natural resource economics and coastal fisheries management in particular. The Bayesian approach to fisheries efficiency analysis is interesting for at least three reasons. First, it is a robust and highly flexible alternative to the commonly applied frequentist procedures, which dominate the literature. Second, the Bayesian approach is extremely simple to implement, requiring only a modest addition to most natural resource economists' tool-kits. Third, despite its attractions, applications of Bayesian methodology in coastal fisheries management are few.
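The composed error structure at the heart of this kind of efficiency analysis is standard; a minimal statement of it, in stochastic frontier form, is sketched below. The paper's specific hierarchical priors and MCMC scheme are not reproduced.

```latex
% Composed-error (stochastic frontier) specification for observation i:
% symmetric noise v_i plus a one-sided inefficiency term u_i >= 0.
\[
\begin{aligned}
  y_i &= x_i'\beta + v_i - u_i,\\
  v_i &\sim \mathcal{N}(0,\sigma_v^2), \qquad u_i \sim \mathcal{N}^{+}(0,\sigma_u^2),\\
  \mathrm{TE}_i &= \exp(-u_i) \quad \text{(technical efficiency of unit } i\text{)}.
\end{aligned}
\]
```

In a Bayesian treatment, priors are placed on the frontier coefficients and the two variance components (possibly hierarchically across groups of producers), and the joint posterior is explored by MCMC.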
Abstract:
This work proposes a unified neurofuzzy modelling scheme. To begin with, the initial fuzzy base construction method is based on fuzzy clustering utilising a Gaussian mixture model (GMM) combined with analysis of covariance (ANOVA) decomposition in order to obtain more compact univariate and bivariate membership functions over the subspaces of the input features. The mean and covariance of the Gaussian membership functions are found by the expectation maximisation (EM) algorithm, with the merit of revealing the underlying density distribution of the system inputs. The resultant set of membership functions forms the basis of the generalised fuzzy model (GFM) inference engine. The model structure and parameters of this neurofuzzy model are identified via supervised subspace orthogonal least squares (OLS) learning. Finally, instead of providing a deterministic class label as the model output by convention, a logistic regression model is applied to present the classifier's output, in which the sigmoid type of logistic transfer function scales the outputs of the neurofuzzy model to class probabilities. Experimental validation results are presented to demonstrate the effectiveness of the proposed neurofuzzy modelling scheme.
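A much-simplified sketch of this pipeline, omitting the ANOVA decomposition and the OLS structure selection: Gaussian components fitted by EM play the role of membership functions, and their responsibilities feed a logistic output stage that yields class probabilities. Dataset, component count and parameters are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# EM-fitted Gaussian components act as (normalised) membership functions.
gmm = GaussianMixture(n_components=8, covariance_type="full",
                      random_state=0).fit(X_tr)
phi_tr = gmm.predict_proba(X_tr)   # degrees of membership as basis outputs
phi_te = gmm.predict_proba(X_te)

# Sigmoid output stage mapping fuzzy basis outputs to a class probability.
clf = LogisticRegression(max_iter=1000).fit(phi_tr, y_tr)
print("test accuracy:", clf.score(phi_te, y_te))
```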
Abstract:
Under particular large-scale atmospheric conditions, several windstorms may affect Europe within a short time period. The occurrence of such cyclone families leads to large socioeconomic impacts and cumulative losses. The serial clustering of windstorms is analyzed for the North Atlantic/western Europe. Clustering is quantified as the dispersion (ratio of variance to mean) of cyclone passages over a certain area. Dispersion statistics are derived for three reanalysis data sets and a 20-run European Centre Hamburg Version 5/Max Planck Institute Ocean Model Version 1 global climate model (ECHAM5/MPI-OM1 GCM) ensemble. The dependence of the seriality on cyclone intensity is analyzed. Confirming previous studies, serial clustering is identified in reanalysis data sets primarily on both flanks and downstream regions of the North Atlantic storm track. This pattern is a robust feature across the reanalysis data sets. For the whole area, extreme cyclones cluster more than non-extreme cyclones. The ECHAM5/MPI-OM1 GCM is generally able to reproduce the spatial patterns of clustering under recent climate conditions, but some biases are identified. Under future climate conditions (A1B scenario), the GCM ensemble indicates that serial clustering may decrease over the North Atlantic storm track area and parts of western Europe. This decrease is associated with an extension of the polar jet toward Europe, which implies a tendency towards a more regular occurrence of cyclones over parts of the North Atlantic Basin poleward of 50°N and western Europe. An increase in the clustering of cyclones is projected south of Newfoundland. The detected shifts imply a change in the risk of occurrence of cumulative events over Europe under future climate conditions.
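The dispersion statistic is simple to compute from seasonal cyclone counts; the sketch below returns the variance-to-mean ratio per grid point, where values above 1 indicate serial clustering (overdispersion) and values below 1 a more regular occurrence. The synthetic counts are placeholders.

```python
import numpy as np

def dispersion(counts):
    """Variance-to-mean ratio of cyclone passage counts.

    counts: (n_seasons, n_gridpoints) array of passages per season.
    Returns one dispersion value per grid point; > 1 means serial clustering.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.var(axis=0, ddof=1) / counts.mean(axis=0)

# Example: 30 winters at 4 grid points drawn from a Poisson process,
# for which the dispersion should be close to 1.
rng = np.random.default_rng(4)
print(dispersion(rng.poisson(3.0, size=(30, 4))))
```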
Abstract:
During the last decades, several windstorm series have hit Europe, leading to large aggregated losses. Such storm series are examples of serial clustering of extreme cyclones and present a considerable risk for the insurance industry. Clustering of events and return periods of storm series for Germany are quantified based on potential losses using empirical models. Two reanalysis data sets and observations from German weather stations are considered for 30 winters. Histograms of events exceeding selected return levels (1-, 2- and 5-year) are derived. Return periods of historical storm series are estimated based on the Poisson and the negative binomial distributions. Over 4000 years of general circulation model (GCM) simulations forced with current climate conditions are analysed to provide a better assessment of historical return periods. Estimates differ between distributions, for example 40 to 65 years for the 1990 series. For such less frequent series, estimates obtained with the Poisson distribution clearly deviate from empirical data. The negative binomial distribution provides better estimates, even though a sensitivity to the return level and data set is identified. The consideration of GCM data permits a strong reduction of uncertainties. The present results support the importance of explicitly considering the clustering of losses for an adequate risk assessment in economic applications.
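A sketch of this kind of return-period estimation, assuming a record of per-winter counts of events exceeding a chosen return level: fit Poisson and negative binomial distributions to the counts and convert the probability of a winter with at least k events into a return period. The counts and the threshold k = 3 are placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import poisson, nbinom

counts = np.array([0, 1, 0, 2, 0, 0, 3, 1, 0, 0, 1, 0, 4, 0, 1,
                   0, 0, 2, 0, 1, 0, 0, 1, 0, 0, 3, 0, 1, 0, 0])  # placeholder winters

mean, var = counts.mean(), counts.var(ddof=1)

# Poisson fit: a single rate parameter.
p_pois = poisson.sf(2, mu=mean)              # P(count >= 3) in a winter

# Negative binomial fit by method of moments (requires var > mean).
p = mean / var
r = mean ** 2 / (var - mean)
p_nb = nbinom.sf(2, r, p)

print("return period (Poisson):", 1.0 / p_pois, "winters")
print("return period (neg. binomial):", 1.0 / p_nb, "winters")
```

For overdispersed counts, the negative binomial assigns more probability to multi-event winters than the Poisson, which is why the two return-period estimates diverge for rarer series.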
Abstract:
The detection of physiological signals from the motor system (electromyographic signals) is being used in clinical practice to guide the therapist towards a more precise and accurate diagnosis of motor disorders. In this context, the decomposition of EMG (electromyographic) signals, which includes the identification and classification of the MUAPs (Motor Unit Action Potentials) within an EMG signal, is very important in helping the therapist evaluate motor disorders. EMG decomposition is a complex task because EMG features depend on the electrode type (needle or surface), its placement relative to the muscle, the contraction level and the health of the neuromuscular system. To date, most research on EMG decomposition has used EMG signals acquired with needle electrodes, due to their advantages in processing this type of signal. However, relatively little research has been conducted using surface EMG signals. Thus, this article aims to contribute to clinical practice by presenting a technique that permits the decomposition of surface EMG signals using Hidden Markov Models, supported by differential evolution and spectral clustering techniques. The developed system presented coherent results in: (1) identification of the number of motor units active in the EMG signal; (2) presentation of the morphological patterns of the MUAPs in the EMG signal; (3) identification of the firing sequence of the motor units. The model proposed in this work is an advance in the research area of decomposition of surface EMG signals.
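As a small illustration of one ingredient of such an approach, the sketch below applies spectral clustering to segmented MUAP waveforms to group them by motor unit. The HMM-based decomposition and the differential evolution step are not shown, and the waveforms are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Synthetic placeholder: three MUAP-like templates plus noise, as if segmented from sEMG.
rng = np.random.default_rng(5)
t = np.linspace(0, 1, 64)
templates = [np.sin(2 * np.pi * f * t) * np.exp(-5 * t) for f in (3, 5, 8)]
waveforms = np.vstack([templates[i % 3] + 0.1 * rng.standard_normal(64)
                       for i in range(150)])

# Group the segmented waveforms by shape; each cluster approximates one motor unit.
labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(waveforms)
print("MUAPs per estimated motor unit:", np.bincount(labels))
```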