17 results for computational biology
in University of Queensland eSpace - Australia
Abstract:
Cyclic peptides containing oxazole and thiazole heterocycles have been examined for their capacity to be used as scaffolds in larger, more complex, protein-like structures. Both the macrocyclic scaffolds and the supramolecular structures derived from them have been visualised by molecular modelling techniques. These molecules are too symmetrical to examine structurally by NMR spectroscopy. The cyclic hexapeptide ([Aaa-Thz]₃, [Aaa-Oxz]₃) and cyclic octapeptide ([Aaa-Thz]₄, [Aaa-Oxz]₄) analogues are composed of dipeptide surrogates (Aaa: amino acid, Thz: thiazole, Oxz: oxazole) derived from intramolecular condensation of cysteine or serine/threonine side chains in dipeptides such as Aaa-Cys, Aaa-Ser and Aaa-Thr. Five-membered heterocyclic rings, such as thiazole and oxazole and their reduced analogues thiazoline, thiazolidine and oxazoline, profoundly influence the structures and bioactivities of the cyclic peptides that contain them. This work suggests that such constrained cyclic peptides can be used as scaffolds to create a range of novel protein-like supramolecular structures (e.g. cylinders, troughs, cones, multi-loop structures, helix bundles) that are comparable in size, shape and composition to the bioactive surfaces of proteins. They may therefore represent interesting starting points for the design of novel artificial proteins and artificial enzymes. (C) 2002 Elsevier Science Inc. All rights reserved.
Abstract:
A recent development of the Markov chain Monte Carlo (MCMC) technique is the emergence of MCMC samplers that allow transitions between different models. Such samplers make possible a range of computational tasks involving models, including model selection, model evaluation, model averaging and hypothesis testing. An example of this type of sampler is the reversible jump MCMC sampler, which is a generalization of the Metropolis-Hastings algorithm. Here, we present a new MCMC sampler of this type. The new sampler is a generalization of the Gibbs sampler, but somewhat surprisingly, it also turns out to encompass as particular cases all of the well-known MCMC samplers, including those of Metropolis, Barker, and Hastings. Moreover, the new sampler generalizes the reversible jump MCMC. It therefore appears to be a very general framework for MCMC sampling. This paper describes the new sampler and illustrates its use in three applications in Computational Biology, specifically determination of consensus sequences, phylogenetic inference and delineation of isochores via multiple change-point analysis.
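The abstract notes that the classical samplers of Metropolis and Barker arise as particular cases of the new framework. As a point of reference only, the sketch below is a plain random-walk sampler with a switchable acceptance rule; it is not the paper's generalized Gibbs sampler and does not jump between models. The function name `random_walk_sampler` and its parameters are illustrative assumptions. Because the proposal is symmetric, the Hastings correction cancels and only the target ratio enters the acceptance probability.

```python
import numpy as np


def random_walk_sampler(log_target, x0, n_steps, step=1.0, rule="metropolis", seed=0):
    """Random-walk MCMC on the real line with a Metropolis or Barker acceptance rule.

    log_target: unnormalised log density of the target distribution pi.
    """
    rng = np.random.default_rng(seed)
    x, lx = x0, log_target(x0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        y = x + step * rng.normal()  # symmetric proposal: q(x -> y) = q(y -> x)
        ly = log_target(y)
        if rule == "metropolis":
            # alpha = min(1, pi(y) / pi(x))
            accept = np.exp(min(0.0, ly - lx))
        elif rule == "barker":
            # alpha = pi(y) / (pi(x) + pi(y)), computed in log space for stability
            accept = np.exp(ly - np.logaddexp(lx, ly))
        else:
            raise ValueError(f"unknown rule: {rule}")
        if rng.random() < accept:
            x, lx = y, ly
        samples[i] = x
    return samples
```

Both rules leave the target distribution invariant; Barker's rule never accepts more often than Metropolis's, so its chains typically mix somewhat more slowly.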
Abstract:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
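The two-stage procedure (fit a g-component normal mixture, then assign each observation to the component with the highest estimated posterior probability) can be sketched for univariate data as follows. This is a minimal EM sketch under stated assumptions: a single continuous variable, component-specific variances, quantile-based starting values and a fixed number of iterations. The function names are illustrative, and mixed continuous/discrete data are not handled.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Univariate normal density, vectorised over x, mean and var."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def fit_mixture(x, g, n_iter=200):
    """Fit a g-component univariate normal mixture by EM.

    Returns mixing proportions, means, variances and the n-by-g matrix tau of
    estimated posterior probabilities of component membership (last E-step).
    """
    x = np.asarray(x, dtype=float)
    pi = np.full(g, 1.0 / g)
    mu = np.quantile(x, (np.arange(g) + 0.5) / g)  # spread starting means over the data
    var = np.full(g, x.var())
    for _ in range(n_iter):
        # E-step: tau[j, i] = posterior probability that x[j] came from component i
        dens = pi * normal_pdf(x[:, None], mu, var)
        tau = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of proportions, means and variances
        n_i = tau.sum(axis=0)
        pi = n_i / x.size
        mu = (tau * x[:, None]).sum(axis=0) / n_i
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / n_i
    return pi, mu, var, tau


def hard_cluster(tau):
    """Outright clustering: assign each observation to its most probable component."""
    return tau.argmax(axis=1)
```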
Abstract:
We present two methods of estimating the trend, seasonality and noise in time series of coronary heart disease events. In contrast to previous work, we use a non-linear trend, allow multiple seasonal components, and carefully examine the residuals from the fitted model. We show the importance of estimating these three aspects of the observed data to gain insight into the underlying process, although our major focus is on the seasonal components. For one method, we allow the seasonal effects to vary over time and show how this helps in understanding the association between coronary heart disease and varying temperature patterns. Copyright (C) 2004 John Wiley & Sons, Ltd.
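A common way to separate trend, seasonality and noise is least-squares regression on a smooth trend plus several sinusoidal (harmonic) seasonal terms. The sketch below assumes a quadratic trend, a fixed annual period and time-invariant seasonal amplitudes, so it does not reproduce the paper's time-varying seasonal effects; `fit_trend_season` and its defaults are hypothetical. The residuals are returned so that they can be examined.

```python
import numpy as np


def fit_trend_season(t, y, period=365.25, n_harmonics=2, trend_degree=2):
    """Decompose y(t) into polynomial trend + harmonic seasonality + residual noise."""
    s = t / period  # rescale time to keep the trend columns well conditioned
    cols = [s ** d for d in range(trend_degree + 1)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n_trend = trend_degree + 1
    trend = X[:, :n_trend] @ beta[:n_trend]
    seasonal = X[:, n_trend:] @ beta[n_trend:]
    residual = y - trend - seasonal
    # Amplitude of the k-th harmonic is sqrt(a_k^2 + b_k^2) (cosine, sine coefficients)
    amplitudes = np.hypot(beta[n_trend::2], beta[n_trend + 1::2])
    return trend, seasonal, residual, amplitudes
```

Each extra harmonic adds one cosine/sine pair, which is one simple way to allow several seasonal components with different shapes.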
Abstract:
Bistability and switching are two important aspects of the genetic regulatory network of λ phage. Positive and negative feedbacks are key regulatory mechanisms in this network. By introducing threshold values, the developmental pathway of λ phage is divided into different stages: when a protein level reaches a threshold value, positive or negative feedback becomes effective and regulates the developmental process. Using this regulatory mechanism, we present a quantitative model, based on experimental data, that realizes the bistability and switching of λ phage. The model describes the decisive mechanisms underlying the different pathways during induction. A stochastic model is also introduced to describe the statistical properties of switching during induction, in which a stochastic degradation rate represents the intrinsic noise that switches the system from the lysogenic pathway to the lytic pathway. The approach in this paper represents an attempt to describe the regulatory mechanism of a genetic regulatory network under the influence of intrinsic noise within the framework of continuous models. (C) 2003 Elsevier Ltd. All rights reserved.
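The idea that positive feedback above a threshold can produce two stable states can be illustrated with a toy one-variable model in which a protein activates its own production through a Hill function. This is not the paper's λ phage model: the equation, the parameter values (`alpha=1`, `K=1`, `d=0.4`) and the optional noise on the degradation rate are hypothetical choices. With these values the deterministic model has steady states at x = 0 (stable), x = 0.5 (unstable threshold) and x = 2 (stable), so the final state depends on which side of the threshold the system starts.

```python
import numpy as np


def simulate_self_activation(x0, alpha=1.0, K=1.0, d=0.4, sigma=0.0,
                             dt=0.01, t_end=100.0, seed=0):
    """Euler(-Maruyama) integration of dx = (alpha x^2/(K^2+x^2) - d x) dt - sigma x dW.

    sigma > 0 perturbs the degradation rate with white noise (a crude stand-in
    for intrinsic noise); sigma = 0 gives the deterministic model.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(int(round(t_end / dt))):
        production = alpha * x ** 2 / (K ** 2 + x ** 2)  # positive feedback
        dW = rng.normal(0.0, np.sqrt(dt)) if sigma > 0 else 0.0
        x += (production - d * x) * dt - sigma * x * dW
        x = max(x, 0.0)  # protein levels cannot be negative
    return x
```

Setting `sigma` above zero lets noise occasionally push the level across the threshold, a qualitative analogue of the noise-driven switching that the stochastic model addresses.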
Abstract:
This paper gives a review of recent progress in the design of numerical methods for computing the trajectories (sample paths) of solutions to stochastic differential equations. We give a brief survey of the area focusing on a number of application areas where approximations to strong solutions are important, with a particular focus on computational biology applications, and give the necessary analytical tools for understanding some of the important concepts associated with stochastic processes. We present the stochastic Taylor series expansion as the fundamental mechanism for constructing effective numerical methods, give general results that relate local and global order of convergence and mention the Magnus expansion as a mechanism for designing methods that preserve the underlying structure of the problem. We also present various classes of explicit and implicit methods for strong solutions, based on the underlying structure of the problem. Finally, we discuss implementation issues relating to maintaining the Brownian path, efficient simulation of stochastic integrals and variable-step-size implementations based on various types of control.
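The two simplest strong schemes obtained by truncating the stochastic (Itô-)Taylor expansion are Euler-Maruyama (strong order 0.5) and Milstein (strong order 1), which adds the term 0.5 b b' (dW^2 - dt). The sketch below compares them on geometric Brownian motion dX = mu X dt + sigma X dW, chosen only because its exact solution X_T = X_0 exp((mu - sigma^2/2) T + sigma W_T) is known along the same Brownian path; the function name and parameter values are arbitrary assumptions.

```python
import numpy as np


def strong_errors(mu=2.0, sigma=1.0, x0=1.0, T=1.0, n_steps=2**8, n_paths=1000, seed=0):
    """Mean absolute endpoint error of Euler-Maruyama and Milstein for GBM."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))  # shared Brownian increments
    x_em = np.full(n_paths, x0)
    x_mil = np.full(n_paths, x0)
    for k in range(n_steps):
        dw = dW[:, k]
        x_em = x_em + mu * x_em * dt + sigma * x_em * dw
        # Milstein correction: 0.5 * b * b' * (dW^2 - dt) with b(x) = sigma * x
        x_mil = (x_mil + mu * x_mil * dt + sigma * x_mil * dw
                 + 0.5 * sigma ** 2 * x_mil * (dw ** 2 - dt))
    exact = x0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * dW.sum(axis=1))
    return np.mean(np.abs(x_em - exact)), np.mean(np.abs(x_mil - exact))
```

Doubling `n_steps` should roughly divide the Euler-Maruyama error by sqrt(2) and the Milstein error by 2, reflecting their different strong orders.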
Abstract:
In large epidemiological studies, missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. We then illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case, omitting from the imputation process any variables that will be used in the analysis may introduce bias. We conclude with several recommendations for handling missing data. Copyright (C) 2004 John Wiley & Sons, Ltd.
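After each of the m completed datasets has been analysed, the results are usually combined with Rubin's rules: the pooled estimate is the mean of the m estimates, and its total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The abstract does not state which pooling rules were used, so this is a generic sketch; `pool_rubin` is a hypothetical helper and needs m >= 2.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b    # total variance of q_bar
    # Rubin's degrees of freedom for the reference t distribution
    r = (1 + 1 / m) * b / u_bar
    df = float("inf") if b == 0 else (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df
```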
Abstract:
The use of a fully parametric Bayesian method for analysing single patient trials based on the notion of treatment 'preference' is described. This Bayesian hierarchical modelling approach allows for full parameter uncertainty, use of prior information and the modelling of individual and patient sub-group structures. It provides updated probabilistic results for individual patients, and groups of patients with the same medical condition, as they are sequentially enrolled into individualized trials using the same medication alternatives. Two clinically interpretable criteria for determining a patient's response are detailed and illustrated using data from a previously published paper under two different prior information scenarios. Copyright (C) 2005 John Wiley & Sons, Ltd.
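The paper's model is a full Bayesian hierarchy across patients. As a much simpler, non-hierarchical illustration of how one patient's result can be updated as treatment cycles accumulate, suppose each cycle records whether the patient preferred treatment A, with a Beta(a0, b0) prior on the preference probability p. The posterior is then Beta(a0 + preferences, b0 + non-preferences), and for integer parameters P(p > 1/2) has a closed form via the identity P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1). The function name and the "p > 1/2" criterion are illustrative assumptions, not the paper's two clinical criteria.

```python
from math import comb


def prob_prefers_a(n_prefer_a, n_cycles, a0=1, b0=1):
    """Posterior P(p > 1/2) under a Beta(a0, b0) prior and binomial preference data.

    Requires integer prior parameters a0, b0 for the closed form below.
    """
    a = a0 + n_prefer_a
    b = b0 + n_cycles - n_prefer_a
    n = a + b - 1
    # P(p > 1/2) = P(Binomial(n, 1/2) <= a - 1)
    return sum(comb(n, k) for k in range(a)) / 2 ** n
```

Calling the function again after each new cycle gives the sequentially updated probability for that patient.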
Abstract:
Motivation: An important problem in microarray experiments is the detection of genes that are differentially expressed across a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach are either limited by the minimal assumptions they make or, when they make more specific assumptions, computationally intensive. Results: By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that adequately models the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.
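Once each gene's statistic has been converted to z = Phi^{-1}(F0(t)), where F0 is the null CDF of the test statistic (for a t statistic this could be computed with, for example, `scipy.stats.t.cdf`), the posterior probability that a gene is null follows from Bayes' rule in a two-component mixture pi0 N(0, 1) + (1 - pi0) N(mu1, sigma1^2). In this sketch the mixture parameters are supplied by hand (in practice they would be fitted, e.g. by EM), and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_score(null_cdf_value):
    """Map F0(t), the null CDF of the test statistic, to a standard normal z-score."""
    return STD_NORMAL.inv_cdf(null_cdf_value)


def posterior_null(z, pi0, mu1, sigma1):
    """tau0(z) = pi0 f0(z) / (pi0 f0(z) + (1 - pi0) f1(z)) with f0 = N(0,1), f1 = N(mu1, sigma1^2)."""
    f0 = pi0 * STD_NORMAL.pdf(z)
    f1 = (1 - pi0) * NormalDist(mu1, sigma1).pdf(z)
    return f0 / (f0 + f1)
```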
Abstract:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward, because the gene profiles are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can cause important sources of variability in the experiments to be overlooked in the analysis, with the consequent possibility of misleading inferences. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model that accounts for the correlations between the gene profiles and enables covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication (for example, time-course experiments, by using time as a covariate) and to cross-sectional experiments, by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically, without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets representing typical microarray experimental designs: time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of genes are obtained that are supported by existing gene-function annotation. A synthetic dataset is also considered.
Abstract:
High-quality data about protein structures and their gene sequences are essential to understanding the relationship between protein folding and protein-coding sequences. We first constructed EcoPDB, a high-quality database of Escherichia coli genes and their corresponding PDB structures. Using EcoPDB, we present a novel approach based on information theory to investigate, in the E. coli genome, the correlations between cysteine synonymous codon usage and (i) the local amino acids flanking cysteines, (ii) the synonymous codon usage of those flanking amino acids, and (iii) the disulfide bonding states of cysteines. The results indicate that the nearest neighboring residues on the C-terminal side, together with their synonymous codons, have the greatest influence on the usage of cysteine synonymous codons, and that synonymous codon usage is specifically correlated with disulfide bond formation by cysteines in proteins. These correlations may result from the regulation of protein structure at the gene sequence level and may reflect the functional constraint that cysteines pair to form disulfide bonds. The results may also help to identify residues that are important for the synonymous codon selection of cysteines when introducing disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also serve as a complementary computational method for analysing synonymous codon usage in other model organisms. (c) 2005 Elsevier Ltd. All rights reserved.
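The abstract does not specify its information-theoretic measure. One standard choice for the association between the cysteine codon (TGT or TGC) and the disulfide bonding state is the mutual information of their contingency table, sketched below. In practice the counts would come from a database such as EcoPDB; the example table here is hypothetical.

```python
from math import log2


def mutual_information(table):
    """Mutual information (bits) between the row and column variables of a count table.

    Rows could be cysteine codons (TGT, TGC); columns could be bonding states
    (disulfide-bonded, free). Zero cells contribute nothing.
    """
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(col) / total for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count:
                p_ij = count / total
                mi += p_ij * log2(p_ij / (row_p[i] * col_p[j]))
    return mi


# Hypothetical counts: rows TGT, TGC; columns bonded, free
print(mutual_information([[30, 20], [15, 35]]))
```

A value of 0 bits means codon choice carries no information about bonding state; for a 2 x 2 table the maximum is 1 bit.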
Abstract:
Many populations have a negative impact on their habitat or upon other species in the environment if their numbers become too large. For this reason they are often subjected to some form of control. One common control regime is the reduction regime: when the population reaches a certain threshold it is controlled (for example culled) until it falls below a lower predefined level. The natural model for such a controlled population is a birth-death process with two phases, the phase determining which of two distinct sets of birth and death rates governs the process. We present formulae for the probability of extinction and the expected time to extinction, and discuss several applications. (c) 2006 Elsevier Inc. All rights reserved.
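The two-phase controlled birth-death process can be explored numerically by solving the standard linear equations for expected hitting times, q_s T_s - sum over s' of q(s -> s') T_s' = 1 with T = 0 at extinction. This is a generic numerical sketch, not the paper's closed-form formulae. It assumes hypothetical linear per-capita rates in each phase and a cap K on the controlled phase (no births at K); with that cap the state space is finite and extinction is certain, so only the expected time is computed. Control starts when the population reaches the threshold M and stops when it falls to the lower level m (1 <= m < M <= K).

```python
import numpy as np


def expected_extinction_time(M, m, K, lam1, mu1, lam2, mu2):
    """Expected time to extinction from each transient state (n, phase).

    Phase 1 (uncontrolled): births lam1*n, deaths mu1*n, for n = 1..M-1.
    Phase 2 (controlled):   births lam2*n, deaths mu2*n, for n = m+1..K.
    """
    states = [(n, 1) for n in range(1, M)] + [(n, 2) for n in range(m + 1, K + 1)]
    index = {s: i for i, s in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    b = np.ones(len(states))
    for (n, phase), i in index.items():
        lam, mu = (lam1, mu1) if phase == 1 else (lam2, mu2)
        up = 0.0 if (phase == 2 and n == K) else lam * n  # no births at the cap K
        down = mu * n
        A[i, i] = up + down
        if up > 0:
            # Reaching M while uncontrolled switches the process into phase 2.
            target = (n + 1, 2) if (phase == 1 and n + 1 == M) else (n + 1, phase)
            A[i, index[target]] -= up
        if down > 0 and n - 1 > 0:  # n - 1 == 0 is extinction, where T = 0
            # Falling to m while controlled switches the process back to phase 1.
            target = (n - 1, 1) if (phase == 2 and n - 1 == m) else (n - 1, phase)
            A[i, index[target]] -= down
    T = np.linalg.solve(A, b)
    return {s: T[i] for s, i in index.items()}
```

With all birth rates set to zero the process reduces to a pure death process, whose expected extinction time from n is the sum of 1/(mu k) for k = 1..n, which makes a convenient check.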
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for both the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
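As a minimal sketch of the EM idea behind ZIP fitting, the code below fits an intercept-only ZIP model. It omits covariates, random effects and the REML variance-component step, and the function names are hypothetical. Each zero is treated as coming either from the structural-zero state (probability pi) or from the Poisson(lam) component; EM alternates between estimating which, and updating pi and lam in closed form.

```python
from math import exp, factorial

import numpy as np


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson with zero weight pi and Poisson mean lam."""
    poisson = exp(-lam) * lam ** k / factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson


def fit_zip_em(y, n_iter=500, tol=1e-10):
    """EM for an intercept-only ZIP model (no covariates, no random effects)."""
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, max(y.mean(), 1e-8)
    for _ in range(n_iter):
        # E-step: probability that each observed zero is a structural zero
        p0 = pi / (pi + (1 - pi) * np.exp(-lam))
        z = np.where(y == 0, p0, 0.0)
        # M-step: closed-form updates of the mixing weight and Poisson mean
        new_pi = z.mean()
        new_lam = ((1 - z) * y).sum() / (1 - z).sum()
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```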
Abstract:
Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins, such as transcription factors and signaling proteins. In recent years, stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. A major challenge in this field is reducing the huge computing time required by stochastic simulations. Using the mitogen-activated protein kinase cascade activated by epidermal growth factor as a test system, this work presents a parallel implementation based on OpenMP and parallelism across simulations. Special attention is paid to the independence of the random numbers generated in parallel computing, which is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes.
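The requirement that each parallel replicate draws from its own statistically independent random stream can be sketched in Python with NumPy's `SeedSequence.spawn`, which derives independent child seeds from one master seed. The sketch runs Gillespie's direct method for a hypothetical birth-death gene-expression model (constant production rate k, degradation rate gamma * x), not the MAPK cascade. It uses a thread pool only to keep the example self-contained, whereas the paper's implementation uses OpenMP. Because each replicate owns its generator, the results do not depend on how replicates are scheduled.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ssa_birth_death(rng, k=10.0, gamma=1.0, x0=0, t_end=10.0):
    """Gillespie direct method for 0 -> X (rate k) and X -> 0 (rate gamma * x)."""
    t, x = 0.0, x0
    while True:
        a_birth, a_death = k, gamma * x
        a_total = a_birth + a_death
        t += rng.exponential(1.0 / a_total)  # waiting time to the next reaction
        if t > t_end:
            return x
        if rng.random() * a_total < a_birth:  # choose which reaction fires
            x += 1
        else:
            x -= 1


def run_ensemble(n_runs, master_seed=12345, workers=4):
    """Run independent SSA replicates, each with its own spawned random stream."""
    child_seeds = np.random.SeedSequence(master_seed).spawn(n_runs)
    rngs = [np.random.default_rng(s) for s in child_seeds]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ssa_birth_death, rngs))
```

Sharing one generator across workers would make results depend on thread interleaving; spawning one stream per replicate avoids that and keeps the ensemble reproducible.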