980 resultados para Computational Biology


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively detect computing core failures and take action to relocate the computing core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology both at the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which the fault tolerance is studied, centralised and decentralised checkpointing approaches on an average add 90% to the actual time for executing the job. On the other hand, in the same experiment the multi-agent approaches add only 10% to the overall execution time

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Understanding how stem and progenitor cells choose between alternative cell fates is a major challenge in developmental biology. Efforts to tackle this problem have been hampered by the scarcity of markers that can be used to predict cell division outcomes. Here we present a computational method, based on algorithmic information theory, to analyze dynamic features of living cells over time. Using this method, we asked whether rat retinal progenitor cells (RPCs) display characteristic phenotypes before undergoing mitosis that could foretell their fate. We predicted whether RPCs will undergo a self-renewing or terminal division with 99% accuracy, or whether they will produce two photoreceptors or another combination of offspring with 87% accuracy. Our implementation can segment, track and generate predictions for 40 cells simultaneously on a standard computer at 5 min per frame. This method could be used to isolate cell populations with specific developmental potential, enabling previously impossible investigations.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present a data-driven mathematical model of a key initiating step in platelet activation, a central process in the prevention of bleeding following Injury. In vascular disease, this process is activated inappropriately and causes thrombosis, heart attacks and stroke. The collagen receptor GPVI is the primary trigger for platelet activation at sites of injury. Understanding the complex molecular mechanisms initiated by this receptor is important for development of more effective antithrombotic medicines. In this work we developed a series of nonlinear ordinary differential equation models that are direct representations of biological hypotheses surrounding the initial steps in GPVI-stimulated signal transduction. At each stage model simulations were compared to our own quantitative, high-temporal experimental data that guides further experimental design, data collection and model refinement. Much is known about the linear forward reactions within platelet signalling pathways but knowledge of the roles of putative reverse reactions are poorly understood. An initial model, that includes a simple constitutively active phosphatase, was unable to explain experimental data. Model revisions, incorporating a complex pathway of interactions (and specifically the phosphatase TULA-2), provided a good description of the experimental data both based on observations of phosphorylation in samples from one donor and in those of a wider population. Our model was used to investigate the levels of proteins involved in regulating the pathway and the effect of low GPVI levels that have been associated with disease. Results indicate a clear separation in healthy and GPVI deficient states in respect of the signalling cascade dynamics associated with Syk tyrosine phosphorylation and activation. Our approach reveals the central importance of this negative feedback pathway that results in the temporal regulation of a specific class of protein tyrosine phosphatases in controlling the rate, and therefore extent, of GPVI-stimulated platelet activation.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Mathematical models, as instruments for understanding the workings of nature, are a traditional tool of physics, but they also play an ever increasing role in biology - in the description of fundamental processes as well as that of complex systems. In this review, the authors discuss two examples of the application of group theoretical methods, which constitute the mathematical discipline for a quantitative description of the idea of symmetry, to genetics. The first one appears, in the form of a pseudo-orthogonal (Lorentz like) symmetry, in the stochastic modelling of what may be regarded as the simplest possible example of a genetic network and, hopefully, a building block for more complicated ones: a single self-interacting or externally regulated gene with only two possible states: ` on` and ` off`. The second is the algebraic approach to the evolution of the genetic code, according to which the current code results from a dynamical symmetry breaking process, starting out from an initial state of complete symmetry and ending in the presently observed final state of low symmetry. In both cases, symmetry plays a decisive role: in the first, it is a characteristic feature of the dynamics of the gene switch and its decay to equilibrium, whereas in the second, it provides the guidelines for the evolution of the coding rules.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to input primary databases, with a warehouse architecture, of biomedical information. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results being suitably stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows designing and applying the mining functions in parallel machines. The experimental results obtained by the application of the kernel functions are reported. © 2003 Elsevier Ltd. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Motivation An actual issue of great interest, both under a theoretical and an applicative perspective, is the analysis of biological sequences for disclosing the information that they encode. The development of new technologies for genome sequencing in the last years, opened new fundamental problems since huge amounts of biological data still deserve an interpretation. Indeed, the sequencing is only the first step of the genome annotation process that consists in the assignment of biological information to each sequence. Hence given the large amount of available data, in silico methods became useful and necessary in order to extract relevant information from sequences. The availability of data from Genome Projects gave rise to new strategies for tackling the basic problems of computational biology such as the determination of the tridimensional structures of proteins, their biological function and their reciprocal interactions. Results The aim of this work has been the implementation of predictive methods that allow the extraction of information on the properties of genomes and proteins starting from the nucleotide and aminoacidic sequences, by taking advantage of the information provided by the comparison of the genome sequences from different species. In the first part of the work a comprehensive large scale genome comparison of 599 organisms is described. 2,6 million of sequences coming from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover the adopted similarity threshold produced clusters that are homogeneous on the structural point of view and that can be used for structural annotation of uncharacterized sequences. The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis the codon composition of a non redundant database comprising 116 prokaryotic genomes has been analyzed and it has been showed that a cross genomic approach can allow the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy in discriminating thermophilic coding sequences equal to 95%. This result outperform those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts in technological applications. A Support Vector Machine based method has been trained to predict if a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. A commonly adopted procedure for annotating protein sequences relies on the "inheritance through homology" based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validate a system that contributes to sequence annotation by taking advantage of a validated transfer through inheritance procedure of the molecular functions and of the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicity answers to the problem of multi-domain proteins annotation and allows a fine grain division of the whole set of proteomes used, that ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins when present can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences considering information available in the present data bases of molecular functions and structures.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Mechanisms that allow pathogens to colonize the host are not the product of isolated genes, but instead emerge from the concerted operation of regulatory networks. Therefore, identifying components and the systemic behavior of networks is necessary to a better understanding of gene regulation and pathogenesis. To this end, I have developed systems biology approaches to study transcriptional and post-transcriptional gene regulation in bacteria, with an emphasis in the human pathogen Mycobacterium tuberculosis (Mtb). First, I developed a network response method to identify parts of the Mtb global transcriptional regulatory network utilized by the pathogen to counteract phagosomal stresses and survive within resting macrophages. As a result, the method unveiled transcriptional regulators and associated regulons utilized by Mtb to establish a successful infection of macrophages throughout the first 14 days of infection. Additionally, this network-based analysis identified the production of Fe-S proteins coupled to lipid metabolism through the alkane hydroxylase complex as a possible strategy employed by Mtb to survive in the host. Second, I developed a network inference method to infer the small non-coding RNA (sRNA) regulatory network in Mtb. The method identifies sRNA-mRNA interactions by integrating a priori knowledge of possible binding sites with structure-driven identification of binding sites. The reconstructed network was useful to predict functional roles for the multitude of sRNAs recently discovered in the pathogen, being that several sRNAs were postulated to be involved in virulence-related processes. Finally, I applied a combined experimental and computational approach to study post-transcriptional repression mediated by small non-coding RNAs in bacteria. Specifically, a probabilistic ranking methodology termed rank-conciliation was developed to infer sRNA-mRNA interactions based on multiple types of data. The method was shown to improve target prediction in Escherichia coli, and therefore is useful to prioritize candidate targets for experimental validation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Chronic wounds are a significant socioeconomic problem for governments worldwide. Approximately 15% of people who suffer from diabetes will experience a lower-limb ulcer at some stage of their lives, and 24% of these wounds will ultimately result in amputation of the lower limb. Hyperbaric Oxygen Therapy (HBOT) has been shown to aid the healing of chronic wounds; however, the causal reasons for the improved healing remain unclear and hence current HBOT protocols remain empirical. Here we develop a three-species mathematical model of wound healing that is used to simulate the application of hyperbaric oxygen therapy in the treatment of wounds. Based on our modelling, we predict that intermittent HBOT will assist chronic wound healing while normobaric oxygen is ineffective in treating such wounds. Furthermore, treatment should continue until healing is complete, and HBOT will not stimulate healing under all circumstances, leading us to conclude that finding the right protocol for an individual patient is crucial if HBOT is to be effective. We provide constraints that depend on the model parameters for the range of HBOT protocols that will stimulate healing. More specifically, we predict that patients with a poor arterial supply of oxygen, high consumption of oxygen by the wound tissue, chronically hypoxic wounds, and/or a dysfunctional endothelial cell response to oxygen are at risk of nonresponsiveness to HBOT. The work of this paper can, in some way, highlight which patients are most likely to respond well to HBOT (for example, those with a good arterial supply), and thus has the potential to assist in improving both the success rate and hence the costeffectiveness of this therapy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Computational biology increasingly demands the sharing of sophisticated data and annotations between research groups. Web 2.0 style sharing and publication requires that biological systems be described in well-defined, yet flexible and extensible formats which enhance exchange and re-use. In contrast to many of the standards for exchange in the genomic sciences, descriptions of biological sequences show a great diversity in format and function, impeding the definition and exchange of sequence patterns. In this presentation, we introduce BioPatML, an XML-based pattern description language that supports a wide range of patterns and allows the construction of complex, hierarchically structured patterns and pattern libraries. BioPatML unifies the diversity of current pattern description languages and fills a gap in the set of XML-based description languages for biological systems. We discuss the structure and elements of the language, and demonstrate its advantages on a series of applications, showing lightweight integration between the BioPatML parser and search engine, and the SilverGene genome browser. We conclude by describing our site to enable large scale pattern sharing, and our efforts to seed this repository.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

To successfully navigate their habitats, many mammals use a combination of two mechanisms, path integration and calibration using landmarks, which together enable them to estimate their location and orientation, or pose. In large natural environments, both these mechanisms are characterized by uncertainty: the path integration process is subject to the accumulation of error, while landmark calibration is limited by perceptual ambiguity. It remains unclear how animals form coherent spatial representations in the presence of such uncertainty. Navigation research using robots has determined that uncertainty can be effectively addressed by maintaining multiple probabilistic estimates of a robot's pose. Here we show how conjunctive grid cells in dorsocaudal medial entorhinal cortex (dMEC) may maintain multiple estimates of pose using a brain-based robot navigation system known as RatSLAM. Based both on rodent spatially-responsive cells and functional engineering principles, the cells at the core of the RatSLAM computational model have similar characteristics to rodent grid cells, which we demonstrate by replicating the seminal Moser experiments. We apply the RatSLAM model to a new experimental paradigm designed to examine the responses of a robot or animal in the presence of perceptual ambiguity. Our computational approach enables us to observe short-term population coding of multiple location hypotheses, a phenomenon which would not be easily observable in rodent recordings. We present behavioral and neural evidence demonstrating that the conjunctive grid cells maintain and propagate multiple estimates of pose, enabling the correct pose estimate to be resolved over time even without uniquely identifying cues. While recent research has focused on the grid-like firing characteristics, accuracy and representational capacity of grid cells, our results identify a possible critical and unique role for conjunctive grid cells in filtering sensory uncertainty. We anticipate our study to be a starting point for animal experiments that test navigation in perceptually ambiguous environments.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In silico experimental modeling of cancer involves combining findings from biological literature with computer-based models of biological systems in order to conduct investigations of hypotheses entirely in the computer laboratory. In this paper, we discuss the use of in silico modeling as a precursor to traditional clinical and laboratory research, allowing researchers to refine their experimental programs with an aim to reducing costs and increasing research efficiency. We explain the methodology of in silico experimental trials before providing an example of in silico modeling from the biomathematical literature with a view to promoting more widespread use and understanding of this research strategy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Discrete stochastic simulations are a powerful tool for understanding the dynamics of chemical kinetics when there are small-to-moderate numbers of certain molecular species. In this paper we introduce delays into the stochastic simulation algorithm, thus mimicking delays associated with transcription and translation. We then show that this process may well explain more faithfully than continuous deterministic models the observed sustained oscillations in expression levels of hes1 mRNA and Hes1 protein.