51 resultados para Structural bioinformatics
em CentAUR: Central Archive University of Reading - UK
Resumo:
The reliable assessment of the quality of protein structural models is fundamental to the progress of structural bioinformatics. The ModFOLD server provides access to two accurate techniques for the global and local prediction of the quality of 3D models of proteins. Firstly ModFOLD, which is a fast Model Quality Assessment Program (MQAP) used for the global assessment of either single or multiple models. Secondly ModFOLDclust, which is a more intensive method that carries out clustering of multiple models and provides per-residue local quality assessment.
Resumo:
Background: Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which based on the alignment of predicted secondary structure elements and ModFOLD which combines several true MQAP methods using an artificial neural network. Results: The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers, however further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion: MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available. Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers and further improvements can be achieved using a consensus of these methods.
Resumo:
MOTIVATION: The accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering or consensus based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however they are often CPU intensive as they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ - a novel MQAP that compares 3D models of proteins without the need for CPU intensive structural alignments by utilising the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, that aims to provide increased prediction accuracy with negligible computational overhead. RESULTS: The ModFOLDclustQ method is competitive with leading clustering based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over 5 times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy can be gained over the previous clustering based MQAPs by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction. AVAILABILITY: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from: http://www.reading.ac.uk/bioinf/downloads/ CONTACT: l.j.mcguffin@reading.ac.uk.
An LDA and probability-based classifier for the diagnosis of Alzheimer's Disease from structural MRI
Resumo:
In this paper a custom classification algorithm based on linear discriminant analysis and probability-based weights is implemented and applied to the hippocampus measurements of structural magnetic resonance images from healthy subjects and Alzheimer’s Disease sufferers; and then attempts to diagnose them as accurately as possible. The classifier works by classifying each measurement of a hippocampal volume as healthy controlsized or Alzheimer’s Disease-sized, these new features are then weighted and used to classify the subject as a healthy control or suffering from Alzheimer’s Disease. The preliminary results obtained reach an accuracy of 85.8% and this is a similar accuracy to state-of-the-art methods such as a Naive Bayes classifier and a Support Vector Machine. An advantage of the method proposed in this paper over the aforementioned state of the art classifiers is the descriptive ability of the classifications it produces. The descriptive model can be of great help to aid a doctor in the diagnosis of Alzheimer’s Disease, or even further the understand of how Alzheimer’s Disease affects the hippocampus.
Resumo:
This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant as the accuracy is often worsened when compared to an Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement of the classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5% using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.
Resumo:
Terpene synthases are responsible for the biosynthesis of the complex chemical defense arsenal of plants and microorganisms. How do these enzymes, which all appear to share a common terpene synthase fold, specify the many different products made almost entirely from one of only three substrates? Elucidation of the structure of 1,8-cineole synthase from Salvia fruticosa (Sf-CinS1) combined with analysis of functional and phylogenetic relationships of enzymes within Salvia species identified active-site residues responsible for product specificity. Thus, Sf-CinS1 was successfully converted to a sabinene synthase with a minimum number of rationally predicted substitutions, while identification of the Asn side chain essential for water activation introduced 1,8-cineole and alpha-terpineol activity to Salvia pomifera sabinene synthase. A major contribution to product specificity in Sf-CinS1 appears to come from a local deformation within one of the helices forming the active site. This deformation is observed in all other mono- or sesquiterpene structures available, pointing to a conserved mechanism. Moreover, a single amino acid substitution enlarged the active-site cavity enough to accommodate the larger farnesyl pyrophosphate substrate and led to the efficient synthesis of sesquiterpenes, while alternate single substitutions of this critical amino acid yielded five additional terpene synthases.
Resumo:
The pharmaceutical material chlorothiazide (6-chloro-4H-1,2,4-benzothiadiazine-7-sulfonamide 1,1-dioxide) has been studied under high-pressure conditions using single-crystal and powder X-ray diffraction. An isosymmetric phase transition to a second polymorph occurs at 4.4 GPa. An analysis of the structural changes that occur during this phase transition has been performed.
Resumo:
This paper exploits a structural time series approach to model the time pattern of multiple and resurgent food scares and their direct and cross-product impacts on consumer response. A structural time series Almost Ideal Demand System (STS-AIDS) is embedded in a vector error correction framework to allow for dynamic effects (VEC-STS-AIDS). Italian aggregate household data on meat demand is used to assess the time-varying impact of a resurgent BSE crisis (1996 and 2000) and the 1999 Dioxin crisis. The VEC-STS-AIDS model monitors the short-run impacts and performs satisfactorily in terms of residuals diagnostics, overcoming the major problems encountered by the customary vector error correction approach.
Resumo:
The production of sufficient quantities of protein is an essential prelude to a structure determination, but for many viral and human proteins this cannot be achieved using prokaryotic expression systems. Groups in the Structural Proteomics In Europe ( SPINE) consortium have developed and implemented high- throughput ( HTP) methodologies for cloning, expression screening and protein production in eukaryotic systems. Studies focused on three systems: yeast ( Pichia pastoris and Saccharomyces cerevisiae), baculovirusinfected insect cells and transient expression in mammalian cells. Suitable vectors for HTP cloning are described and results from their use in expression screening and protein-production pipelines are reported. Strategies for coexpression, selenomethionine labelling ( in all three eukaryotic systems) and control of glycosylation ( for secreted proteins in mammalian cells) are assessed.
Resumo:
BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power.In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.