973 results for Computational biology
Abstract:
The Cancer Genome Project intends to search every human gene for cancer-related mutations. Its first success is the discovery of such mutations in the BRAF gene.
Abstract:
Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins such as transcription factors and signaling proteins. In recent years stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. One of the challenging problems in this field is reducing the huge computing time required by stochastic simulations. Based on the mitogen-activated protein kinase cascade that is activated by epidermal growth factor, this work gives a parallel implementation using OpenMP and parallelism across the simulation. Special attention is paid to the independence of the random numbers generated in parallel computing, which is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes.
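The idea of parallelism across the simulation, with a separate random stream per replicate, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python rather than OpenMP, a toy birth-death process rather than the MAPK cascade, and hypothetical names and rate constants (`replicate_seeds`, `simulate_birth_death`, `k_prod`, `k_deg`). Giving each replicate a distinct seed avoids shared generator state, but it does not by itself prove statistical independence of the streams; a generator designed for parallel streams would be the safer choice in real work.

```python
import math
import random


def replicate_seeds(master_seed, count):
    """Draw `count` distinct 64-bit seeds from one master generator."""
    master = random.Random(master_seed)
    seeds = []
    while len(seeds) < count:
        candidate = master.getrandbits(64)
        if candidate not in seeds:
            seeds.append(candidate)
    return seeds


def simulate_birth_death(seed, k_prod=10.0, k_deg=0.1, t_end=50.0):
    """One Gillespie (SSA) run of a toy birth-death process.

    Returns the copy number at time t_end. Rates are illustrative only.
    """
    rng = random.Random(seed)  # private generator: no state shared across runs
    t, n = 0.0, 0
    while True:
        a_prod = k_prod           # propensity of production
        a_deg = k_deg * n         # propensity of degradation
        a_total = a_prod + a_deg
        # Exponentially distributed waiting time until the next reaction.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            return n
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1


def run_replicates(master_seed, count):
    """Run independent replicates; each call depends only on its own seed."""
    return [simulate_birth_death(s) for s in replicate_seeds(master_seed, count)]
```

Because each replicate depends only on its seed, the list comprehension in `run_replicates` could be replaced by a parallel map (for example over a process pool) without changing the results.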
Abstract:
Spatial navigation requires the processing of complex, disparate and often ambiguous sensory data. The neurocomputations underpinning this vital ability remain poorly understood. Controversy remains as to whether multimodal sensory information must be combined into a unified representation, consistent with Tolman's "cognitive map", or whether differential activation of independent navigation modules suffices to explain observed navigation behaviour. Here we demonstrate that key neural correlates of spatial navigation in darkness cannot be explained if the path integration system acts independently of boundary (landmark) information. In vivo recordings demonstrate that the rodent head direction (HD) system becomes unstable within three minutes without vision. In contrast, rodents maintain stable place fields and grid fields for over half an hour without vision. Using a simple HD error model, we show analytically that idiothetic path integration (iPI) alone cannot be used to maintain any stable place representation beyond two to three minutes. We then use a measure of place stability based on information theoretic principles to prove that featureless boundaries alone cannot be used to improve localization above chance level. Having shown that neither iPI nor boundaries alone are sufficient, we then address the question of whether their combination is sufficient and - we conjecture - necessary to maintain place stability for prolonged periods without vision. We address this question in simulations and robot experiments using a navigation model comprising a particle filter and a boundary map. The model replicates published experimental results on place field and grid field stability without vision, and makes testable predictions including place field splitting and grid field rescaling if the true arena geometry differs from the acquired boundary map.
We discuss our findings in light of current theories of animal navigation and neuronal computation, and elaborate on their implications and significance for the design, analysis and interpretation of experiments.
Abstract:
This paper gives a review of recent progress in the design of numerical methods for computing the trajectories (sample paths) of solutions to stochastic differential equations. We give a brief survey of the area focusing on a number of application areas where approximations to strong solutions are important, with a particular focus on computational biology applications, and give the necessary analytical tools for understanding some of the important concepts associated with stochastic processes. We present the stochastic Taylor series expansion as the fundamental mechanism for constructing effective numerical methods, give general results that relate local and global order of convergence and mention the Magnus expansion as a mechanism for designing methods that preserve the underlying structure of the problem. We also present various classes of explicit and implicit methods for strong solutions, based on the underlying structure of the problem. Finally, we discuss implementation issues relating to maintaining the Brownian path, efficient simulation of stochastic integrals and variable-step-size implementations based on various types of control.
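As a concrete illustration of a strong (pathwise) approximation, the sketch below applies the Euler-Maruyama scheme, the lowest-order truncation of the stochastic Taylor expansion, to geometric Brownian motion and compares each trajectory endpoint with the exact solution driven by the same Brownian increments. The test equation, parameter values and function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math
import random


def euler_maruyama_gbm(x0, mu, sigma, t_end, n_steps, rng):
    """Integrate dX = mu*X dt + sigma*X dW with fixed-step Euler-Maruyama.

    Returns (numerical endpoint, exact endpoint on the same Brownian path).
    """
    dt = t_end / n_steps
    x = x0
    w = 0.0  # running value of the Brownian path W(t)
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x += mu * x * dt + sigma * x * dw
        w += dw
    exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * t_end + sigma * w)
    return x, exact


def mean_strong_error(n_steps, n_paths=200, seed=0):
    """Average |X_numerical(T) - X_exact(T)| over n_paths sample paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        approx, exact = euler_maruyama_gbm(1.0, 0.1, 0.5, 1.0, n_steps, rng)
        total += abs(approx - exact)
    return total / n_paths
```

Comparing `mean_strong_error(16)` with `mean_strong_error(1024)` gives a rough feel for how the pathwise error shrinks as the step size decreases; Euler-Maruyama has strong order 0.5 in general, which is why higher-order schemes are of interest.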
Abstract:
The generation of a correlation matrix from a large set of long gene sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. The generation is not only computationally intensive but also requires significant memory resources as, typically, few gene sequences can be simultaneously stored in primary memory. The standard practice in such computation is to use frequent input/output (I/O) operations. Therefore, minimizing the number of these operations will yield much faster run-times. This paper develops an approach for the faster and scalable computing of large-size correlation matrices through the full use of available memory and a reduced number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different problems with different correlation matrix sizes. The significant performance improvement of the approach over the existing approaches is demonstrated through benchmark examples.
Abstract:
In Arabidopsis thaliana (Arabidopsis), DICER-LIKE1 (DCL1) functions together with the double-stranded RNA binding protein (dsRBP), DRB1, to process microRNAs (miRNAs) from their precursor transcripts prior to their transfer to the RNA-induced silencing complex (RISC). miRNA-loaded RISC directs RNA silencing of cognate mRNAs via ARGONAUTE1 (AGO1)-catalyzed cleavage. Short interfering RNAs (siRNAs) are processed from viral-derived or transgene-encoded molecules of double-stranded RNA (dsRNA) by the DCL/dsRBP partnership, DCL4/DRB4, and are also loaded onto AGO1-catalyzed RISC for cleavage of complementary mRNAs. Here, we use an artificial miRNA (amiRNA) technology, transiently expressed in Nicotiana benthamiana, to produce a series of amiRNA duplexes with differing intermolecular thermostabilities at the 5′ end of duplex strands. Analyses of amiRNA duplex strand accumulation and target transcript expression revealed that strand selection (amiRNA and amiRNA*) is directed by asymmetric thermostability of the duplex termini. The duplex strand possessing a lower 5′ thermostability was preferentially retained by RISC to guide mRNA cleavage of the corresponding target transgene. In addition, analysis of endogenous miRNA duplex strand accumulation in Arabidopsis drb1 and drb2345 mutant plants revealed that DRB1 dictates strand selection, presumably by directional loading of the miRNA duplex onto RISC for passenger strand degradation. Bioinformatic and Northern blot analyses of DCL4/DRB4-dependent small RNAs (miRNAs and siRNAs) revealed that small RNAs produced by this DCL/dsRBP combination do not conform to the same terminal thermostability rules as those governing DCL1/DRB1-processed miRNAs. This suggests that small RNA processing in the DCL1/DRB1-directed miRNA and DCL4/DRB4-directed sRNA biogenesis pathways operates via different mechanisms.
Abstract:
tRNA-derived RNA fragments (tRFs) are 19mer small RNAs that associate with Argonaute (AGO) proteins in humans. However, in plants, it is unknown if tRFs bind with AGO proteins. Here, using public deep sequencing libraries of immunoprecipitated Argonaute proteins (AGO-IP) and bioinformatics approaches, we identified the Arabidopsis thaliana AGO-IP tRFs. Moreover, using three degradome deep sequencing libraries, we identified four putative tRF targets. The expression pattern of tRFs, based on deep sequencing data, was also analyzed under abiotic and biotic stresses. The results obtained here represent a useful starting point for future studies on tRFs in plants. © 2013 Loss-Morais et al.; licensee BioMed Central Ltd.
Abstract:
Post-transcriptional control of gene expression has gone from a curiosity involving a few special genes to a highly diverse and widespread set of processes that is truly pervasive in plant gene expression. Thus, Plant Cell readers interested in almost any aspect of plant gene expression in response to any environmental influence, or in development, are advised to read on. In May 2001, what has become the de facto third biennial Symposium on Post-Transcriptional Control of Gene Expression in Plants was held in Ames, Iowa. The meeting was hosted by the new Plant Sciences Institute of Iowa State University with additional funding from the National Science Foundation and the United States Department of Agriculture. In 1997, the annual University of California-Riverside Plant Physiology Symposium was devoted to this topic. This provided a wake-up call to the plant world, summarized in this journal (Gallie and Bailey-Serres, 1997), that not all gene expression is controlled at the level of transcription. This was expanded upon at a European Molecular Biology Organization Workshop in Leysin, Switzerland, in 1999 (Bailey-Serres et al., 1999). The 3-day meeting in Ames brought together a strong and diverse contingent of plant biologists from four continents. The participants represented an unusually heterogeneous group of disciplines ranging from virology to stress response to computational biology. The research approaches and techniques represented were similarly diverse. Here we discuss a sample of the many fascinating aspects of post-transcriptional control that were presented at this meeting; we apologize to those whose work is not described here.
Abstract:
We present a machine learning model that predicts a structural disruption score from a protein's primary structure. SCHEMA was introduced by Frances Arnold and colleagues as a method for determining putative recombination sites of a protein on the basis of the full (PDB) description of its structure. The present method provides an alternative to SCHEMA that is able to determine the same score from sequence data only. Circumventing the need for resolving the full structure enables the exploration of yet unresolved and even hypothetical sequences for protein design efforts. Deriving the SCHEMA score from a primary structure is achieved using a two-step approach: first predicting a secondary structure from the sequence and then predicting the SCHEMA score from the predicted secondary structure. The correlation coefficient for the prediction is 0.88 and indicates the feasibility of replacing SCHEMA with little loss of precision.
Abstract:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies, so-called Next Generation Sequencing (NGS) approaches, have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues that will lead to substantial performance improvements over BLAST.
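One common way to build binary signatures for locality-sensitive hashing is a SimHash-style scheme over k-mers, sketched below. This is an assumption for illustration and not necessarily the signature construction used in the work: the k-mer length, the 64-bit signature width, the use of BLAKE2b as the term hash and the function names are all hypothetical choices.

```python
import hashlib


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (the 'terms' of the document)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def signature(seq, k=3, bits=64):
    """SimHash-style binary signature: each k-mer votes +1/-1 on every bit."""
    counts = [0] * bits
    for term in kmers(seq, k):
        digest = hashlib.blake2b(term.encode(), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    sig = 0
    for b in range(bits):
        if counts[b] > 0:  # majority vote decides each signature bit
            sig |= 1 << b
    return sig


def hamming(a, b):
    """Number of differing bits; smaller means more similar sequences."""
    return bin(a ^ b).count("1")
```

Sequences that share most of their k-mers tend to receive signatures at a small Hamming distance, so a query protein can be assigned to the family of its nearest signatures without computing an alignment.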
Abstract:
Bayesian networks (BNs) are graphical probabilistic models used for reasoning under uncertainty. These models are becoming increasingly popular in a range of fields including ecology, computational biology, medical diagnosis, and forensics. In most of these cases, the BNs are quantified using information from experts, or from user opinions. An interest therefore lies in the way in which multiple opinions can be represented and used in a BN. This paper proposes the use of a measurement error model to combine opinions for use in the quantification of a BN. The multiple opinions are treated as a realisation of measurement error and the model uses the posterior probabilities ascribed to each node in the BN which are computed from the prior information given by each expert. The proposed model addresses the issues associated with current methods of combining opinions such as the absence of a coherent probability model, the lack of the conditional independence structure of the BN being maintained, and the provision of only a point estimate for the consensus. The proposed model is applied to an existing Bayesian network and performs well when compared to existing methods of combining opinions.