225 results for computational biology


Relevance:

60.00%

Abstract:

The generation of a correlation matrix from a large set of long gene sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. The generation is not only computationally intensive but also requires significant memory resources as, typically, few gene sequences can be simultaneously stored in primary memory. The standard practice in such computation is to use frequent input/output (I/O) operations. Therefore, minimizing the number of these operations will yield much faster run-times. This paper develops an approach for the faster and scalable computing of large-size correlation matrices through the full use of available memory and a reduced number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different problems with different correlation matrix sizes. The significant performance improvement of the approach over the existing approaches is demonstrated through benchmark examples.
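The memory-versus-I/O trade-off described in this abstract can be illustrated with a blocked computation. The following is a minimal sketch, not the paper's algorithm: it assumes each sequence has already been encoded as a numeric vector, and that a hypothetical `load(start, stop)` callback reads a contiguous block of vectors from disk. The names `blocked_correlation` and `block_size` are illustrative. Larger blocks mean fewer calls to `load`.

```python
import math
from typing import Callable, List

Vector = List[float]


def pearson(x: Vector, y: Vector) -> float:
    """Pearson correlation of two equal-length vectors with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def blocked_correlation(
    n_items: int,
    load: Callable[[int, int], List[Vector]],  # hypothetical disk reader
    block_size: int,
) -> List[List[float]]:
    """Fill a symmetric correlation matrix while holding at most two blocks.

    For brevity the result matrix is kept in memory; a real out-of-core
    implementation would also write finished tiles back to disk.
    """
    corr = [[0.0] * n_items for _ in range(n_items)]
    for i0 in range(0, n_items, block_size):
        block_i = load(i0, min(i0 + block_size, n_items))
        # Only the upper triangle of block pairs is visited.
        for j0 in range(i0, n_items, block_size):
            if j0 == i0:
                block_j = block_i  # reuse the block that is already loaded
            else:
                block_j = load(j0, min(j0 + block_size, n_items))
            for a, x in enumerate(block_i):
                for b, y in enumerate(block_j):
                    r = pearson(x, y)
                    corr[i0 + a][j0 + b] = r
                    corr[j0 + b][i0 + a] = r
    return corr
```

Choosing `block_size` as large as available memory allows is the design choice that reduces the number of `load` calls, which is the quantity the abstract identifies as the bottleneck.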

Relevance:

60.00%

Abstract:

In Arabidopsis thaliana (Arabidopsis), DICER-LIKE1 (DCL1) functions together with the double-stranded RNA binding protein (dsRBP), DRB1, to process microRNAs (miRNAs) from their precursor transcripts prior to their transfer to the RNA-induced silencing complex (RISC). miRNA-loaded RISC directs RNA silencing of cognate mRNAs via ARGONAUTE1 (AGO1)-catalyzed cleavage. Short interfering RNAs (siRNAs) are processed from viral-derived or transgene-encoded molecules of double-stranded RNA (dsRNA) by the DCL/dsRBP partnership, DCL4/DRB4, and are also loaded to AGO1-catalyzed RISC for cleavage of complementary mRNAs. Here, we use an artificial miRNA (amiRNA) technology, transiently expressed in Nicotiana benthamiana, to produce a series of amiRNA duplexes with differing intermolecular thermostabilities at the 5′ end of duplex strands. Analyses of amiRNA duplex strand accumulation and target transcript expression revealed that strand selection (amiRNA and amiRNA*) is directed by asymmetric thermostability of the duplex termini. The duplex strand possessing a lower 5′ thermostability was preferentially retained by RISC to guide mRNA cleavage of the corresponding target transgene. In addition, analysis of endogenous miRNA duplex strand accumulation in Arabidopsis drb1 and drb2345 mutant plants revealed that DRB1 dictates strand selection, presumably by directional loading of the miRNA duplex onto RISC for passenger strand degradation. Bioinformatic and Northern blot analyses of DCL4/DRB4-dependent small RNAs (miRNAs and siRNAs) revealed that small RNAs produced by this DCL/dsRBP combination do not conform to the same terminal thermostability rules as those governing DCL1/DRB1-processed miRNAs. This suggests that small RNA processing in the DCL1/DRB1-directed miRNA and DCL4/DRB4-directed sRNA biogenesis pathways operates via different mechanisms.

Relevance:

60.00%

Abstract:

tRNA-derived RNA fragments (tRFs) are 19mer small RNAs that associate with Argonaute (AGO) proteins in humans. However, in plants, it is unknown if tRFs bind with AGO proteins. Here, using public deep sequencing libraries of immunoprecipitated Argonaute proteins (AGO-IP) and bioinformatics approaches, we identified the Arabidopsis thaliana AGO-IP tRFs. Moreover, using three degradome deep sequencing libraries, we identified four putative tRF targets. The expression pattern of tRFs, based on deep sequencing data, was also analyzed under abiotic and biotic stresses. The results obtained here represent a useful starting point for future studies on tRFs in plants. © 2013 Loss-Morais et al.; licensee BioMed Central Ltd.

Relevance:

60.00%

Abstract:

Post-transcriptional control of gene expression has gone from a curiosity involving a few special genes to a highly diverse and widespread set of processes that is truly pervasive in plant gene expression. Thus, Plant Cell readers interested in almost any aspect of plant gene expression in response to any environmental influence, or in development, are advised to read on. In May 2001, what has become the de facto third biennial Symposium on Post-Transcriptional Control of Gene Expression in Plants was held in Ames, Iowa. The meeting was hosted by the new Plant Sciences Institute of Iowa State University with additional funding from the National Science Foundation and the United States Department of Agriculture. In 1997, the annual University of California-Riverside Plant Physiology Symposium was devoted to this topic. This provided a wake-up call to the plant world, summarized in this journal (Gallie and Bailey-Serres, 1997), that not all gene expression is controlled at the level of transcription. This was expanded upon at a European Molecular Biology Organization Workshop in Leysin, Switzerland, in 1999 (Bailey-Serres et al., 1999). The 3-day meeting in Ames brought together a strong and diverse contingent of plant biologists from four continents. The participants represented an unusually heterogeneous group of disciplines ranging from virology to stress response to computational biology. The research approaches and techniques represented were similarly diverse. Here we discuss a sample of the many fascinating aspects of post-transcriptional control that were presented at this meeting; we apologize to those whose work is not described here.

Relevance:

60.00%

Abstract:

We present a machine learning model that predicts a structural disruption score from a protein's primary structure. SCHEMA was introduced by Frances Arnold and colleagues as a method for determining putative recombination sites of a protein on the basis of the full (PDB) description of its structure. The present method provides an alternative to SCHEMA that is able to determine the same score from sequence data only. Circumventing the need for resolving the full structure enables the exploration of yet unresolved and even hypothetical sequences for protein design efforts. Deriving the SCHEMA score from a primary structure is achieved using a two-step approach: first predicting a secondary structure from the sequence and then predicting the SCHEMA score from the predicted secondary structure. The correlation coefficient for the prediction is 0.88 and indicates the feasibility of replacing SCHEMA with little loss of precision.

Relevance:

60.00%

Abstract:

Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies, so-called Next Generation Sequencing (NGS) approaches, have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST.
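The idea of comparing sequences through binary signatures rather than alignments can be sketched as follows. This is an illustrative simhash-style scheme over k-mers, chosen as an assumption; the abstract does not specify the signature construction, and the function names (`signature`, `classify`), the k-mer length and the signature width are all hypothetical. Similar sequences share many k-mers and therefore tend to produce signatures with a small Hamming distance.

```python
import hashlib
from collections import Counter
from typing import List, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def term_signs(term: str, n_bits: int) -> List[int]:
    """Deterministic pseudo-random vector of +1/-1 values for one k-mer."""
    if n_bits > 256:
        raise ValueError("SHA-256 provides at most 256 bits")
    value = int.from_bytes(hashlib.sha256(term.encode()).digest(), "big")
    return [1 if (value >> i) & 1 else -1 for i in range(n_bits)]


def signature(seq: str, k: int = 3, n_bits: int = 64) -> int:
    """Binary signature: the sign pattern of the summed k-mer vectors."""
    totals = [0] * n_bits
    for term, count in Counter(kmers(seq, k)).items():
        for i, sign in enumerate(term_signs(term, n_bits)):
            totals[i] += sign * count
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def classify(query: str, references: List[Tuple[str, str]],
             k: int = 3, n_bits: int = 64) -> str:
    """Return the family label of the reference closest in Hamming distance.

    `references` is a list of (family_label, sequence) pairs.
    """
    q = signature(query, k, n_bits)
    best = min(references, key=lambda item: hamming(q, signature(item[1], k, n_bits)))
    return best[0]
```

In practice the reference signatures would be computed once and stored, so that each query costs one signature computation plus cheap bitwise comparisons.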

Relevance:

60.00%

Abstract:

Bayesian networks (BNs) are graphical probabilistic models used for reasoning under uncertainty. These models are becoming increasingly popular in a range of fields including ecology, computational biology, medical diagnosis, and forensics. In most of these cases, the BNs are quantified using information from experts, or from user opinions. An interest therefore lies in the way in which multiple opinions can be represented and used in a BN. This paper proposes the use of a measurement error model to combine opinions for use in the quantification of a BN. The multiple opinions are treated as a realisation of measurement error and the model uses the posterior probabilities ascribed to each node in the BN which are computed from the prior information given by each expert. The proposed model addresses the issues associated with current methods of combining opinions such as the absence of a coherent probability model, the lack of the conditional independence structure of the BN being maintained, and the provision of only a point estimate for the consensus. The proposed model is applied to an existing Bayesian network and performs well when compared to existing methods of combining opinions.

Relevance:

60.00%

Abstract:

The reliable response to weak biological signals requires that they be amplified with fidelity. In E. coli, the flagellar motors that control swimming can switch direction in response to very small changes in the concentration of the signaling protein CheY-P, but how this works is not well understood. A recently proposed allosteric model based on cooperative conformational spread in a ring of identical protomers seems promising as it is able to qualitatively reproduce switching, locked state behavior and Hill coefficient values measured for the rotary motor. In this paper we undertook a comprehensive simulation study to analyze the behavior of this model in detail and made predictions on three experimentally observable quantities: the switch time distribution, the locked state interval distribution, and the Hill coefficient of the switch response. We parameterized the model using experimental measurements, finding excellent agreement with published data on motor behavior. Analysis of the simulated switching dynamics revealed a mechanism for chemotactic ultrasensitivity, in which cooperativity is indispensable for realizing both coherent switching and effective amplification. These results showed how cells can combine elements of analog and digital control to produce switches that are simultaneously sensitive and reliable. © 2012 Ma et al.

Relevance:

60.00%

Abstract:

E. coli performs chemotaxis by executing a biased random walk composed of alternating periods of swimming (runs) and reorientations (tumbles). Tumbles are typically modelled as complete directional randomisations, but it is known that in wild-type E. coli, successive run directions are actually weakly correlated, with a mean directional difference of ∼63°. We recently presented a model of the evolution of chemotactic swimming strategies in bacteria which is able to quantitatively reproduce the emergence of this correlation. The agreement between model and experiments suggests that directional persistence may serve some function, a hypothesis supported by the results of an earlier model. Here we investigate the effect of persistence on chemotactic efficiency, using a spatial Monte Carlo model of bacterial swimming in a gradient, combined with simulations of natural selection based on chemotactic efficiency. A direct search of the parameter space reveals two attractant gradient regimes: (a) a low-gradient regime, in which efficiency is unaffected by directional persistence, and (b) a high-gradient regime, in which persistence can improve chemotactic efficiency. The value of the persistence parameter that maximises this effect corresponds very closely with the value observed experimentally. This result is matched by independent simulations of the evolution of directional memory in a population of model bacteria, which also predict the emergence of persistence in high-gradient conditions. The relationship between optimality and persistence in different environments may reflect a universal property of random-walk foraging algorithms, which must strike a compromise between two competing aims: exploration and exploitation. We also present a new graphical way to generally illustrate the evolution of a particular trait in a population, in terms of variations in an evolvable parameter.

Relevance:

60.00%

Abstract:

Relative abundance data is common in the life sciences, but appreciation that it needs special analysis and interpretation is scarce. Correlation is popular as a statistical measure of pairwise association but should not be used on data that carry only relative information. Using timecourse yeast gene expression data, we show how correlation of relative abundances can lead to conclusions opposite to those drawn from absolute abundances, and that its value changes when different components are included in the analysis. Once all absolute information has been removed, only a subset of those associations will reliably endure in the remaining relative data, specifically, associations where pairs of values behave proportionally across observations. We propose a new statistic φ to describe the strength of proportionality between two variables and demonstrate how it can be straightforwardly used instead of correlation as the basis of familiar analyses and visualization methods.
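A proportionality statistic of this kind can be sketched in a few lines. The abstract does not give the formula, so the definition used here is an assumption: φ(x, y) = var(log(x/y)) / var(log x), which is zero exactly when y is a constant multiple of x. The function names are illustrative, and the choice of log transform or normalisation applied before computing φ is likewise not taken from the abstract.

```python
import math
from statistics import variance
from typing import Sequence


def log_ratio_variance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample variance of log(x_i / y_i); requires strictly positive values."""
    return variance([math.log(a / b) for a, b in zip(x, y)])


def phi(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed proportionality statistic: var(log(x/y)) / var(log x).

    Values near 0 indicate that x and y are close to proportional.
    Note that phi(x, y) and phi(y, x) generally differ.
    """
    return log_ratio_variance(x, y) / variance([math.log(a) for a in x])
```

The numerator depends only on the ratios x_i / y_i, so dividing every observation by an arbitrary per-sample total leaves it unchanged. This is why a log-ratio quantity remains meaningful for relative data, whereas a correlation computed on the rescaled values need not agree with the one computed on absolute values.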

Relevance:

60.00%

Abstract:

In vitro studies and mathematical models are now being widely used to study the underlying mechanisms driving the expansion of cell colonies. This can improve our understanding of cancer formation and progression. Although much progress has been made in terms of developing and analysing mathematical models, far less progress has been made in terms of understanding how to estimate model parameters using experimental in vitro image-based data. To address this issue, a new approximate Bayesian computation (ABC) algorithm is proposed to estimate key parameters governing the expansion of melanoma cell (MM127) colonies, including cell diffusivity, D, cell proliferation rate, λ, and cell-to-cell adhesion, q, in two experimental scenarios, namely with and without a chemical treatment to suppress cell proliferation. Even when little prior biological knowledge about the parameters is assumed, all parameters are precisely inferred with a small posterior coefficient of variation, approximately 2–12%. The ABC analyses reveal that the posterior distributions of D and q depend on the experimental elapsed time, whereas the posterior distribution of λ does not. The posterior mean values of D are in the ranges 226–268 µm² h⁻¹ and 311–351 µm² h⁻¹, and those of q are in the ranges 0.23–0.39 and 0.32–0.61, for the experimental periods of 0–24 h and 24–48 h, respectively. Furthermore, we found that the posterior distribution of q also depends on the initial cell density, whereas the posterior distributions of D and λ do not. The ABC approach also enables information from the two experiments to be combined, resulting in greater precision for all estimates of D and λ.
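The general shape of an ABC analysis can be shown with plain rejection sampling. This sketch is not the algorithm proposed in the abstract: the simulator below is a hypothetical toy birth process with a single proliferation parameter, standing in for the colony model, and the summary statistic is simply the final cell count. The principle is to draw a parameter from the prior, simulate, and keep the draw only if the simulated data lie close to the observation.

```python
import random
from typing import Callable, List


def simulate_colony(rate: float, n0: int, steps: int, rng: random.Random) -> int:
    """Toy stand-in model: each cell divides with probability `rate` per step."""
    n = n0
    for _ in range(steps):
        n += sum(1 for _ in range(n) if rng.random() < rate)
    return n


def abc_rejection(
    observed: float,
    prior_sampler: Callable[[random.Random], float],
    simulator: Callable[[float, random.Random], float],
    distance: Callable[[float, float], float],
    tolerance: float,
    n_draws: int,
    rng: random.Random,
) -> List[float]:
    """Return the prior draws whose simulated output is within `tolerance`
    of the observation; these approximate a sample from the posterior."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted
```

A call might pass a uniform prior such as `lambda r: r.uniform(0.0, 0.3)` and `lambda a, b: abs(a - b)` as the distance. A smaller tolerance gives a better posterior approximation at the cost of fewer accepted draws, which is the main tuning decision in any rejection-based ABC scheme.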

Relevance:

60.00%

Abstract:

This work addresses fundamental issues in the mathematical modelling of the diffusive motion of particles in biological and physiological settings. New mathematical results are proved and implemented in computer models for the colonisation of the embryonic gut by neural cells and the propagation of electrical waves in the heart, offering new insights into the relationships between structure and function. In particular, the thesis focuses on the use of non-local differential operators of non-integer order to capture the main features of diffusion processes occurring in complex spatial structures characterised by high levels of heterogeneity.

Relevance:

60.00%

Abstract:

Since we still know very little about stem cells in their natural environment, it is useful to explore their dynamics through modelling and simulation, as well as experimentally. Most models of stem cell systems are based on deterministic differential equations that ignore the natural heterogeneity of stem cell populations. This is not appropriate at the level of individual cells and niches, when randomness is more likely to affect dynamics. In this paper, we introduce a fast stochastic method for simulating a metapopulation of stem cell niche lineages, that is, many sub-populations that together form a heterogeneous metapopulation, over time. By selecting the common limiting timestep, our method ensures that the entire metapopulation is simulated synchronously. This is important, as it allows us to introduce interactions between separate niche lineages, which would otherwise be impossible. We expand our method to enable the coupling of many lineages into niche groups, where differentiated cells are pooled within each niche group. Using this method, we explore the dynamics of the haematopoietic system from a demand control system perspective. We find that coupling together niche lineages allows the organism to regulate blood cell numbers as closely as possible to the homeostatic optimum. Furthermore, coupled lineages respond better than uncoupled ones to random perturbations, in this case the loss of some myeloid cells. This could imply that it is advantageous for an organism to connect together its niche lineages into groups. Our results suggest that a potentially fruitful empirical direction will be to understand how stem cell descendants communicate with the niche and how cancer may arise as a result of a failure of such communication.