903 resultados para 080110 Simulation and Modelling
Resumo:
Motivation: Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging. Results: We contrast the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively.
Resumo:
Motivation: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. Results: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally.
Resumo:
Background: The structure of proteins may change as a result of the inherent flexibility of some protein regions. We develop and explore probabilistic machine learning methods for predicting a continuum secondary structure, i.e. assigning probabilities to the conformational states of a residue. We train our methods using data derived from high-quality NMR models. Results: Several probabilistic models not only successfully estimate the continuum secondary structure, but also provide a categorical output on par with models directly trained on categorical data. Importantly, models trained on the continuum secondary structure are also better than their categorical counterparts at identifying the conformational state for structurally ambivalent residues. Conclusion: Cascaded probabilistic neural networks trained on the continuum secondary structure exhibit better accuracy in structurally ambivalent regions of proteins, while sustaining an overall classification accuracy on par with standard, categorical prediction methods.
Resumo:
Systems biology is based on computational modelling and simulation of large networks of interacting components. Models may be intended to capture processes, mechanisms, components and interactions at different levels of fidelity. Input data are often large and geographically disperse, and may require the computation to be moved to the data, not vice versa. In addition, complex system-level problems require collaboration across institutions and disciplines. Grid computing can offer robust, scaleable solutions for distributed data, compute and expertise. We illustrate some of the range of computational and data requirements in systems biology with three case studies: one requiring large computation but small data (orthologue mapping in comparative genomics), a second involving complex terabyte data (the Visible Cell project) and a third that is both computationally and data-intensive (simulations at multiple temporal and spatial scales). Authentication, authorisation and audit systems are currently not well scalable and may present bottlenecks for distributed collaboration particularly where outcomes may be commercialised. Challenges remain in providing lightweight standards to facilitate the penetration of robust, scalable grid-type computing into diverse user communities to meet the evolving demands of systems biology.
Resumo:
Boolean models of genetic regulatory networks (GRNs) have been shown to exhibit many of the characteristic dynamics of real GRNs, with gene expression patterns settling to point attractors or limit cycles, or displaying chaotic behaviour, depending upon the connectivity of the network and the relative proportions of excitatory and inhibitory interactions. This range of behaviours is only apparent, however, when the nodes of the GRN are updated synchronously, a biologically implausible state of affairs. In this paper we demonstrate that evolution can produce GRNs with interesting dynamics under an asynchronous update scheme. We use an Artificial Genome to generate networks which exhibit limit cycle dynamics when updated synchronously, but collapse to a point attractor when updated asynchronously. Using a hill climbing algorithm the networks are then evolved using a fitness function which rewards patterns of gene expression which revisit as many previously seen states as possible. The final networks exhibit “fuzzy limit cycle” dynamics when updated asynchronously.