997 results for CONSENSUS PREDICTION
Abstract:
Consistent in silico models for ADME properties are useful tools in early drug discovery. Here, we report the hologram QSAR modeling of human intestinal absorption using a dataset of 638 compounds with associated experimental data. The final validated models are consistent and robust for the consensus prediction of this important pharmacokinetic property and are suitable for virtual screening applications.
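The abstract does not state the combination rule, so here is a minimal sketch of the usual consensus scheme, a plain average over the individual models; the scikit-learn-style predict() interface is an assumption, not the paper's code.

    import numpy as np

    def consensus_predict(models, X):
        # Average the predictions of several fitted QSAR models; each row
        # of X holds the molecular descriptors of one compound.
        preds = np.column_stack([m.predict(X) for m in models])
        return preds.mean(axis=1)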
Abstract:
Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT.
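BPROMPT's Bayesian Belief Network is not described in the abstract; as a simplified stand-in, the sketch below combines binary topology calls under a naive conditional-independence assumption, with each method's accuracy treated as a known (assumed) quantity.

    import math

    def bayes_consensus(calls, accuracies, prior=0.5):
        # Posterior probability of 'membrane' given binary calls from
        # several predictors, assuming symmetric error rates and
        # conditional independence between methods.
        log_odds = math.log(prior / (1.0 - prior))
        for call, acc in zip(calls, accuracies):
            shift = math.log(acc / (1.0 - acc))
            log_odds += shift if call else -shift
        return 1.0 / (1.0 + math.exp(-log_odds))

    # e.g. two methods call membrane, one calls non-membrane:
    print(bayes_consensus([True, True, False], [0.8, 0.7, 0.6]))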
Abstract:
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters (binding sites for regulatory proteins) across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
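The paper's pipeline itself is not reproduced in the abstract; as an illustration of the unifying consensus step in a MapReduce setting, the following is a Hadoop Streaming-style reducer that majority-votes a base for each alignment position (the "<position>\t<base>" key-value format is an assumption):

    #!/usr/bin/env python
    # Hadoop Streaming reducer: consensus base per alignment position.
    # The framework delivers mapper output sorted by key, one per line.
    import sys
    from collections import Counter

    def emit(pos, counts):
        if pos is not None:
            sys.stdout.write("%s\t%s\n" % (pos, counts.most_common(1)[0][0]))

    current, counts = None, Counter()
    for line in sys.stdin:
        pos, base = line.rstrip("\n").split("\t")
        if pos != current:
            emit(current, counts)
            current, counts = pos, Counter()
        counts[base] += 1
    emit(current, counts)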
Abstract:
Sequence-structure correlation studies are important in deciphering the relationships between various structural aspects, which may shed light on the protein-folding problem. The first step of this process is the prediction of secondary structure for a protein sequence of unknown three-dimensional structure. To this end, a web server has been created to predict the consensus secondary structure using well-known algorithms from the literature. Furthermore, the server allows users to see the occurrence of predicted secondary structural elements in other structure and sequence databases and to visualize predicted helices as a helical wheel plot. The web server is accessible at http://bioserver1.physics.iisc.ernet.in/cssp/.
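The server's exact combination rule is not given in the abstract; per-residue majority voting is the common baseline for this task, sketched below (the three-state H/E/C alphabet is assumed):

    from collections import Counter

    def consensus_ss(predictions):
        # Per-residue majority vote over equal-length secondary-structure
        # strings (H = helix, E = strand, C = coil), one per method.
        return "".join(
            Counter(col).most_common(1)[0][0] for col in zip(*predictions)
        )

    # Three hypothetical method outputs for a 10-residue sequence:
    print(consensus_ss(["HHHHCCEEEC", "HHHCCCEEEC", "HHHHCCEECC"]))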
Abstract:
P2Y(1) is an ADP-activated G protein-coupled receptor (GPCR). Its antagonists impede platelet aggregation in vivo and are potential antithrombotic agents. Combining ligand- and structure-based modeling, we generated a consensus model (LIST-CM) correlating antagonist structures with their potencies. We docked 45 antagonists into our rhodopsin-based human P2Y(1) homology model and calculated docking scores and binding free energies with the Linear Interaction Energy (LIE) method in continuum solvent. The resulting alignment was also used to build QSAR models based on CoMFA, CoMSIA, and molecular descriptors. To benefit from the strengths of each technique and compensate for their limitations, we generated our LIST-CM with a PLS regression based on the predictions of each methodology. A test set featuring untested substituents was synthesized and assayed for inhibition of 2-MeSADP-stimulated PLC activity and for radioligand binding. LIST-CM outperformed the internal and external predictivity of any individual model, accurately predicting the potency of 75% of the test set.
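The final PLS consensus amounts to stacking: the individual models' outputs become the input variables of a regression against measured potency. A minimal sketch with scikit-learn follows; the arrays are random placeholders, not the paper's data.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    # Each column of X holds one model's predictions (e.g. docking score,
    # LIE energy, CoMFA, CoMSIA, descriptor-based QSAR) for the training
    # antagonists; y holds the measured potencies.
    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((45, 5)), rng.random(45)

    consensus = PLSRegression(n_components=2)
    consensus.fit(X_train, y_train)

    X_new = rng.random((10, 5))  # individual-model predictions for new compounds
    print(consensus.predict(X_new).ravel())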
Abstract:
Ria de Aveiro is a large, shallow lagoon on the west coast of Portugal (40°38′N, 8°45′W), characterized by a complex geometry. It includes large areas of intertidal flats and a network of narrow channels connected to the Atlantic by an artificial inlet. Tides are the main forcing of the hydrology and physical processes of the lagoon. The deeper areas near the inlet are characterized by strong marine influence through tidal inflow, with high current velocities (>1 m/s) and a tidal range of 2–3 m at spring tides, while in remote shallow areas circulation and seawater inflow are reduced. These remote areas are more influenced by fresh water received from several rivers and small streams. The Aveiro lagoon is a very important ecosystem, but it has been used as a recipient for various kinds of anthropogenic waste resulting from the high population density, urban activities and industrial development; one of the most important Portuguese industrial centres is located on the lagoon margins. Ria de Aveiro is thus a coastal lagoon under heavy direct anthropization, and it also suffers strong diffuse anthropization. This work addresses diffuse anthropization linked to chemical pollution, which may lead to biological stress and collapse.
Abstract:
Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationship (3D-QSAR) techniques, which do not require knowledge of the receptor structure, were historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however, they are highly dependent on their training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent of training sets and have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand- and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand- and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.
Abstract:
Coastal and estuarine landforms provide a physical template that not only accommodates diverse ecosystem functions and human activities, but also mediates flood and erosion risks that are expected to increase with climate change. In this paper, we explore some of the issues associated with the conceptualisation and modelling of coastal morphological change at time and space scales relevant to managers and policy makers. Firstly, we revisit the question of how to define the most appropriate scales at which to seek quantitative predictions of landform change within an age defined by human interference with natural sediment systems and by the prospect of significant changes in climate and ocean forcing. Secondly, we consider the theoretical bases and conceptual frameworks for determining which processes are most important at a given scale of interest and the related problem of how to translate this understanding into models that are computationally feasible, retain a sound physical basis and demonstrate useful predictive skill. In particular, we explore the limitations of a primary scale approach and the extent to which these can be resolved with reference to the concept of the coastal tract and application of systems theory. Thirdly, we consider the importance of different styles of landform change and the need to resolve not only incremental evolution of morphology but also changes in the qualitative dynamics of a system and/or its gross morphological configuration. The extreme complexity and spatially distributed nature of landform systems means that quantitative prediction of future changes must necessarily be approached through mechanistic modelling of some form or another. Geomorphology has increasingly embraced so-called ‘reduced complexity’ models as a means of moving from an essentially reductionist focus on the mechanics of sediment transport towards a more synthesist view of landform evolution. However, there is little consensus on exactly what constitutes a reduced complexity model and the term itself is both misleading and, arguably, unhelpful. Accordingly, we synthesise a set of requirements for what might be termed ‘appropriate complexity modelling’ of quantitative coastal morphological change at scales commensurate with contemporary management and policy-making requirements: 1) The system being studied must be bounded with reference to the time and space scales at which behaviours of interest emerge and/or scientific or management problems arise; 2) model complexity and comprehensiveness must be appropriate to the problem at hand; 3) modellers should seek a priori insights into what kind of behaviours are likely to be evident at the scale of interest and the extent to which the behavioural validity of a model may be constrained by its underlying assumptions and its comprehensiveness; 4) informed by qualitative insights into likely dynamic behaviour, models should then be formulated with a view to resolving critical state changes; and 5) meso-scale modelling of coastal morphological change should reflect critically on the role of modelling and its relation to the observable world.
Abstract:
An emerging consensus in cognitive science views the biological brain as a hierarchically-organized predictive processing system. This is a system in which higher-order regions are continuously attempting to predict the activity of lower-order regions at a variety of (increasingly abstract) spatial and temporal scales. The brain is thus revealed as a hierarchical prediction machine that is constantly engaged in the effort to predict the flow of information originating from the sensory surfaces. Such a view seems to afford a great deal of explanatory leverage when it comes to a broad swathe of seemingly disparate psychological phenomena (e.g., learning, memory, perception, action, emotion, planning, reason, imagination, and conscious experience). In the most positive case, the predictive processing story seems to provide our first glimpse at what a unified (computationally tractable and neurobiologically plausible) account of human psychology might look like. This obviously marks out one reason why such models should be the focus of current empirical and theoretical attention. Another reason, however, is rooted in the potential of such models to advance the current state-of-the-art in machine intelligence and machine learning. Interestingly, the vision of the brain as a hierarchical prediction machine is one that establishes contact with work that goes under the heading of 'deep learning'. Deep learning systems thus often attempt to make use of predictive processing schemes and (increasingly abstract) generative models as a means of supporting the analysis of large data sets. But are such computational systems sufficient (by themselves) to provide a route to general human-level analytic capabilities? I will argue that they are not and that closer attention to a broader range of forces and factors (many of which are not confined to the neural realm) may be required to understand what it is that gives human cognition its distinctive (and largely unique) flavour. The vision that emerges is one of 'homomimetic deep learning systems', systems that situate a hierarchically-organized predictive processing core within a larger nexus of developmental, behavioural, symbolic, technological and social influences. Relative to that vision, I suggest that we should see the Web as a form of 'cognitive ecology', one that is as much involved with the transformation of machine intelligence as it is with the progressive reshaping of our own cognitive capabilities.
Abstract:
Motivation: Intrinsic protein disorder is functionally implicated in numerous biological roles and is, therefore, ubiquitous in proteins from all three kingdoms of life. Determining the disordered regions in proteins presents a challenge for experimental methods and so recently there has been much focus on the development of improved predictive methods. In this article, a novel technique for disorder prediction, called DISOclust, is described, which is based on the analysis of multiple protein fold recognition models. The DISOclust method is rigorously benchmarked against the top five methods from the CASP7 experiment. In addition, the optimal consensus of the tested methods is determined and the added value from each method is quantified. Results: The DISOclust method is shown to add the most value to a simple consensus of methods, even in the absence of target sequence homology to known structures. A simple consensus of methods that includes DISOclust can significantly outperform all of the previous individual methods tested.
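A "simple consensus of methods" in this setting is typically an average of the individual predictors' per-residue disorder probabilities followed by a cutoff; the sketch below assumes equal weights and a 0.5 threshold, which are illustrative choices rather than DISOclust's published settings.

    import numpy as np

    def disorder_consensus(probs, threshold=0.5):
        # probs: array of shape (n_methods, n_residues), one row of
        # per-residue disorder probabilities per predictor.
        return np.asarray(probs).mean(axis=0) >= threshold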
Abstract:
Motivation: A new method that uses support vector machines (SVMs) to predict protein secondary structure is described and evaluated. The study is designed to develop a reliable prediction method using an alternative technique and to investigate the applicability of SVMs to this type of bioinformatics problem. Methods: Binary SVMs are trained to discriminate between two structural classes. The binary classifiers are combined in several ways to predict multi-class secondary structure. Results: The average three-state prediction accuracy per protein (Q3) is estimated by cross-validation to be 77.07 ± 0.26% with a segment overlap (Sov) score of 73.32 ± 0.39%. The SVM performs similarly to the 'state-of-the-art' PSIPRED prediction method on a non-homologous test set of 121 proteins despite being trained on substantially fewer examples. A simple consensus of the SVM, PSIPRED and PROFsec achieves significantly higher prediction accuracy than the individual methods. Availability: The SVM classifier is available from the authors. Work is in progress to make the method available on-line and to integrate the SVM predictions into the PSIPRED server.
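A common way to combine binary SVMs into a three-state predictor is one-vs-one voting; the sketch below shows that combination with scikit-learn on random stand-in features (the real method's profile encoding and training data are not reproduced here).

    import numpy as np
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((300, 260))               # stand-in for profile-window features
    y = rng.choice(list("HEC"), size=300)    # helix / strand / coil labels

    # Train one binary SVM per class pair and combine their votes.
    clf = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale"))
    clf.fit(X, y)
    print(clf.predict(X[:5]))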
Abstract:
We have recently shown that the majority of allergens can be represented by allergen motifs. This observation prompted us to experimentally investigate the synthesized peptides corresponding to the in silico motifs with regard to potential IgE binding and cross-reactions with allergens. Two motifs were selected as examples to conduct in vitro studies. From the first motif, derived from allergenic MnSOD sequences, the motif stretch of the allergen Asp f 6 was selected and synthesized as a peptide (MnSOD Mot). The corresponding full-length MnSOD was also expressed in Escherichia coli and both were compared for IgE reactivity with sera of patients reacting to the MnSOD of Aspergillus fumigatus or Malassezia sympodialis. For the second motif, the invertebrate tropomyosin sequences were aligned and a motif consensus sequence was expressed as a recombinant protein (Trop Mot). The IgE reactivity of Trop Mot was analyzed in ELISA and compared to that of recombinant tropomyosin from the shrimp Penaeus aztecus (rPen a 1) in ImmunoCAP. MnSOD Mot was weakly recognized by some of the tested sera, suggesting that the IgE binding epitopes of a multimeric globular protein such as MnSOD cannot be fully represented by a motif peptide. In contrast, the motif Trop Mot showed the same IgE reactivity as shrimp full-length tropomyosin, indicating that the major allergenic reactivity of a repetitive structure such as tropomyosin can be covered by a motif peptide. Our results suggest that the motif-generating algorithm may be used for identifying major IgE binding structures of coiled-coil proteins.
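A motif consensus sequence of this kind can be derived by taking the most frequent residue in each column of the aligned sequences; below is a minimal sketch with simple gap handling (the paper's alignment and motif-generating algorithm are not reproduced here).

    from collections import Counter

    def motif_consensus(aligned_seqs, gap="-"):
        # Most frequent non-gap residue per alignment column; columns that
        # are entirely gaps stay as gaps.
        out = []
        for col in zip(*aligned_seqs):
            counts = Counter(c for c in col if c != gap)
            out.append(counts.most_common(1)[0][0] if counts else gap)
        return "".join(out)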
Abstract:
In previous work [1], we proposed an approach to forming a consensus of experts' statements in pattern recognition. In this paper, we present a method for aggregating sets of individual statements into a collective one for the case of forecasting a quantitative variable. (The work was supported by the RFBR under Grant N07-01-00331a.)
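The abstract does not specify the aggregation rule; for illustration only, the simplest collective forecast is a weighted linear pool of the experts' point forecasts.

    def aggregate_forecasts(forecasts, weights=None):
        # Weighted arithmetic mean of experts' point forecasts of a
        # quantitative variable; equal weights by default. The linear pool
        # is an assumption, not necessarily the method of [1].
        if weights is None:
            weights = [1.0] * len(forecasts)
        return sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)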
Abstract:
Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.
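Consensus clustering of the kind ArrayMining.net combines is often implemented with a co-association matrix: cluster the samples many times, count how often each pair co-clusters, then cluster that similarity matrix. A sketch under those assumptions (not ArrayMining's internal code):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans

    def consensus_cluster(X, k, n_runs=20, seed=0):
        # Build a co-association matrix from repeated k-means runs, then
        # cut it with average-linkage clustering (sklearn >= 1.2; older
        # versions take affinity= instead of metric=).
        rng = np.random.RandomState(seed)
        n = X.shape[0]
        coassoc = np.zeros((n, n))
        for _ in range(n_runs):
            labels = KMeans(n_clusters=k, n_init=1,
                            random_state=rng.randint(1 << 30)).fit_predict(X)
            coassoc += labels[:, None] == labels[None, :]
        coassoc /= n_runs
        final = AgglomerativeClustering(n_clusters=k, metric="precomputed",
                                        linkage="average")
        return final.fit_predict(1.0 - coassoc)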