966 resultados para Bayesian point estimate
Resumo:
In vitro studies and mathematical models are now being widely used to study the underlying mechanisms driving the expansion of cell colonies. This can improve our understanding of cancer formation and progression. Although much progress has been made in terms of developing and analysing mathematical models, far less progress has been made in terms of understanding how to estimate model parameters using experimental in vitro image-based data. To address this issue, a new approximate Bayesian computation (ABC) algorithm is proposed to estimate key parameters governing the expansion of melanoma cell (MM127) colonies, including cell diffusivity, D, cell proliferation rate, λ, and cell-to-cell adhesion, q, in two experimental scenarios, namely with and without a chemical treatment to suppress cell proliferation. Even when little prior biological knowledge about the parameters is assumed, all parameters are precisely inferred with a small posterior coefficient of variation, approximately 2–12%. The ABC analyses reveal that the posterior distributions of D and q depend on the experimental elapsed time, whereas the posterior distribution of λ does not. The posterior mean values of D and q are in the ranges 226–268 µm2h−1, 311–351 µm2h−1 and 0.23–0.39, 0.32–0.61 for the experimental periods of 0–24 h and 24–48 h, respectively. Furthermore, we found that the posterior distribution of q also depends on the initial cell density, whereas the posterior distributions of D and λ do not. The ABC approach also enables information from the two experiments to be combined, resulting in greater precision for all estimates of D and λ.
Resumo:
The total entropy utility function is considered for the dual purpose of Bayesian design for model discrimination and parameter estimation. A sequential design setting is proposed where it is shown how to efficiently estimate the total entropy utility for a wide variety of data types. Utility estimation relies on forming particle approximations to a number of intractable integrals which is afforded by the use of the sequential Monte Carlo algorithm for Bayesian inference. A number of motivating examples are considered for demonstrating the performance of total entropy in comparison to utilities for model discrimination and parameter estimation. The results suggest that the total entropy utility selects designs which are efficient under both experimental goals with little compromise in achieving either goal. As such, the total entropy utility is advocated as a general utility for Bayesian design in the presence of model uncertainty.
Resumo:
An important uncertainty when estimating per capita consumption of, for example, illicit drugs by means of wastewater analysis (sometimes referred to as “sewage epidemiology”) relates to the size and variability of the de facto population in the catchment of interest. In the absence of a day-specific direct population count any indirect surrogate model to estimate population size lacks a standard to assess associated uncertainties. Therefore, the objective of this study was to collect wastewater samples at a unique opportunity, that is, on a census day, as a basis for a model to estimate the number of people contributing to a given wastewater sample. Mass loads for a wide range of pharmaceuticals and personal care products were quantified in influents of ten sewage treatment plants (STP) serving populations ranging from approximately 3500 to 500 000 people. Separate linear models for population size were estimated with the mass loads of the different chemical as the explanatory variable: 14 chemicals showed good, linear relationships, with highest correlations for acesulfame and gabapentin. De facto population was then estimated through Bayesian inference, by updating the population size provided by STP staff (prior knowledge) with measured chemical mass loads. Cross validation showed that large populations can be estimated fairly accurately with a few chemical mass loads quantified from 24-h composite samples. In contrast, the prior knowledge for small population sizes cannot be improved substantially despite the information of multiple chemical mass loads. In the future, observations other than chemical mass loads may improve this deficit, since Bayesian inference allows including any kind of information relating to population size.
Resumo:
Gene expression is arguably the most important indicator of biological function. Thus identifying differentially expressed genes is one of the main aims of high throughout studies that use microarray and RNAseq platforms to study deregulated cellular pathways. There are many tools for analysing differentia gene expression from transciptomic datasets. The major challenge of this topic is to estimate gene expression variance due to the high amount of ‘background noise’ that is generated from biological equipment and the lack of biological replicates. Bayesian inference has been widely used in the bioinformatics field. In this work, we reveal that the prior knowledge employed in the Bayesian framework also helps to improve the accuracy of differential gene expression analysis when using a small number of replicates. We have developed a differential analysis tool that uses Bayesian estimation of the variance of gene expression for use with small numbers of biological replicates. Our method is more consistent when compared to the widely used cyber-t tool that successfully introduced the Bayesian framework to differential analysis. We also provide a user-friendly web based Graphic User Interface for biologists to use with microarray and RNAseq data. Bayesian inference can compensate for the instability of variance caused when using a small number of biological replicates by using pseudo replicates as prior knowledge. We also show that our new strategy to select pseudo replicates will improve the performance of the analysis. - See more at: http://www.eurekaselect.com/node/138761/article#sthash.VeK9xl5k.dpuf
Resumo:
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov Chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via a Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Zmix can accurately estimate the number of components, posterior parameter estimates and allocation probabilities given a sufficiently large sample size. The results will reflect uncertainty in the final model and will report the range of possible candidate models and their respective estimated probabilities from a single run. Label switching is resolved with a computationally light-weight method, Zswitch, developed for overfitted mixtures by exploiting the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies are included to illustrate Zmix and Zswitch, as well as three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
Resumo:
Hierarchical Bayesian models can assimilate surveillance and ecological information to estimate both invasion extent and model parameters for invading plant pests spread by people. A reliability analysis framework that can accommodate multiple dispersal modes is developed to estimate human-mediated dispersal parameters for an invasive species. Uncertainty in the observation process is modelled by accounting for local natural spread and population growth within spatial units. Broad scale incursion dynamics are based on a mechanistic gravity model with a Weibull distribution modification to incorporate a local pest build-up phase. The model uses Markov chain Monte Carlo simulations to infer the probability of colonisation times for discrete spatial units and to estimate connectivity parameters between these units. The hierarchical Bayesian model with observational and ecological components is applied to a surveillance dataset for a spiralling whitefly (Aleurodicus dispersus) invasion in Queensland, Australia. The model structure provides a useful application that draws on surveillance data and ecological knowledge that can be used to manage the risk of pest movement.
Resumo:
Disease maps are effective tools for explaining and predicting patterns of disease outcomes across geographical space, identifying areas of potentially elevated risk, and formulating and validating aetiological hypotheses for a disease. Bayesian models have become a standard approach to disease mapping in recent decades. This article aims to provide a basic understanding of the key concepts involved in Bayesian disease mapping methods for areal data. It is anticipated that this will help in interpretation of published maps, and provide a useful starting point for anyone interested in running disease mapping methods for areal data. The article provides detailed motivation and descriptions on disease mapping methods by explaining the concepts, defining the technical terms, and illustrating the utility of disease mapping for epidemiological research by demonstrating various ways of visualising model outputs using a case study. The target audience includes spatial scientists in health and other fields, policy or decision makers, health geographers, spatial analysts, public health professionals, and epidemiologists.
Resumo:
This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
Resumo:
Predicting temporal responses of ecosystems to disturbances associated with industrial activities is critical for their management and conservation. However, prediction of ecosystem responses is challenging due to the complexity and potential non-linearities stemming from interactions between system components and multiple environmental drivers. Prediction is particularly difficult for marine ecosystems due to their often highly variable and complex natures and large uncertainties surrounding their dynamic responses. Consequently, current management of such systems often rely on expert judgement and/or complex quantitative models that consider only a subset of the relevant ecological processes. Hence there exists an urgent need for the development of whole-of-systems predictive models to support decision and policy makers in managing complex marine systems in the context of industry based disturbances. This paper presents Dynamic Bayesian Networks (DBNs) for predicting the temporal response of a marine ecosystem to anthropogenic disturbances. The DBN provides a visual representation of the problem domain in terms of factors (parts of the ecosystem) and their relationships. These relationships are quantified via Conditional Probability Tables (CPTs), which estimate the variability and uncertainty in the distribution of each factor. The combination of qualitative visual and quantitative elements in a DBN facilitates the integration of a wide array of data, published and expert knowledge and other models. Such multiple sources are often essential as one single source of information is rarely sufficient to cover the diverse range of factors relevant to a management task. Here, a DBN model is developed for tropical, annual Halophila and temperate, persistent Amphibolis seagrass meadows to inform dredging management and help meet environmental guidelines. Specifically, the impacts of capital (e.g. new port development) and maintenance (e.g. maintaining channel depths in established ports) dredging is evaluated with respect to the risk of permanent loss, defined as no recovery within 5 years (Environmental Protection Agency guidelines). The model is developed using expert knowledge, existing literature, statistical models of environmental light, and experimental data. The model is then demonstrated in a case study through the analysis of a variety of dredging, environmental and seagrass ecosystem recovery scenarios. In spatial zones significantly affected by dredging, such as the zone of moderate impact, shoot density has a very high probability of being driven to zero by capital dredging due to the duration of such dredging. Here, fast growing Halophila species can recover, however, the probability of recovery depends on the presence of seed banks. On the other hand, slow growing Amphibolis meadows have a high probability of suffering permanent loss. However, in the maintenance dredging scenario, due to the shorter duration of dredging, Amphibolis is better able to resist the impacts of dredging. For both types of seagrass meadows, the probability of loss was strongly dependent on the biological and ecological status of the meadow, as well as environmental conditions post-dredging. The ability to predict the ecosystem response under cumulative, non-linear interactions across a complex ecosystem highlights the utility of DBNs for decision support and environmental management.
Resumo:
The aim of this paper is to provide a Bayesian formulation of the so-called magnitude-based inference approach to quantifying and interpreting effects, and in a case study example provide accurate probabilistic statements that correspond to the intended magnitude-based inferences. The model is described in the context of a published small-scale athlete study which employed a magnitude-based inference approach to compare the effect of two altitude training regimens (live high-train low (LHTL), and intermittent hypoxic exposure (IHE)) on running performance and blood measurements of elite triathletes. The posterior distributions, and corresponding point and interval estimates, for the parameters and associated effects and comparisons of interest, were estimated using Markov chain Monte Carlo simulations. The Bayesian analysis was shown to provide more direct probabilistic comparisons of treatments and able to identify small effects of interest. The approach avoided asymptotic assumptions and overcame issues such as multiple testing. Bayesian analysis of unscaled effects showed a probability of 0.96 that LHTL yields a substantially greater increase in hemoglobin mass than IHE, a 0.93 probability of a substantially greater improvement in running economy and a greater than 0.96 probability that both IHE and LHTL yield a substantially greater improvement in maximum blood lactate concentration compared to a Placebo. The conclusions are consistent with those obtained using a ‘magnitude-based inference’ approach that has been promoted in the field. The paper demonstrates that a fully Bayesian analysis is a simple and effective way of analysing small effects, providing a rich set of results that are straightforward to interpret in terms of probabilistic statements.
Resumo:
Foliage density and leaf area index are important vegetation structure variables. They can be measured by several methods but few have been tested in tropical forests which have high structural heterogeneity. In this study, foliage density estimates by two indirect methods, the point quadrat and photographic methods, were compared with those obtained by direct leaf counts in the understorey of a wet evergreen forest in southern India. The point quadrat method has a tendency to overestimate, whereas the photographic method consistently and ignificantly underestimates foliage density. There was stratification within the understorey, with areas close to the ground having higher foliage densities.
Resumo:
The impulse response of a typical wireless multipath channel can be modeled as a tapped delay line filter whose non-zero components are sparse relative to the channel delay spread. In this paper, a novel method of estimating such sparse multipath fading channels for OFDM systems is explored. In particular, Sparse Bayesian Learning (SBL) techniques are applied to jointly estimate the sparse channel and its second order statistics, and a new Bayesian Cramer-Rao bound is derived for the SBL algorithm. Further, in the context of OFDM channel estimation, an enhancement to the SBL algorithm is proposed, which uses an Expectation Maximization (EM) framework to jointly estimate the sparse channel, unknown data symbols and the second order statistics of the channel. The EM-SBL algorithm is able to recover the support as well as the channel taps more efficiently, and/or using fewer pilot symbols, than the SBL algorithm. To further improve the performance of the EM-SBL, a threshold-based pruning of the estimated second order statistics that are input to the algorithm is proposed, and its mean square error and symbol error rate performance is illustrated through Monte-Carlo simulations. Thus, the algorithms proposed in this paper are capable of obtaining efficient sparse channel estimates even in the presence of a small number of pilots.
Resumo:
The lifetime calculation of large dense sensor networks with fixed energy resources and the remaining residual energy have shown that for a constant energy resource in a sensor network the fault rate at the cluster head is network size invariant when using the network layer with no MAC losses.Even after increasing the battery capacities in the nodes the total lifetime does not increase after a max limit of 8 times. As this is a serious limitation lots of research has been done at the MAC layer which allows to adapt to the specific connectivity, traffic and channel polling needs for sensor networks. There have been lots of MAC protocols which allow to control the channel polling of new radios which are available to sensor nodes to communicate. This further reduces the communication overhead by idling and sleep scheduling thus extending the lifetime of the monitoring application. We address the two issues which effects the distributed characteristics and performance of connected MAC nodes. (1) To determine the theoretical minimum rate based on joint coding for a correlated data source at the singlehop, (2a) to estimate cluster head errors using Bayesian rule for routing using persistence clustering when node densities are the same and stored using prior probability at the network layer, (2b) to estimate the upper bound of routing errors when using passive clustering were the node densities at the multi-hop MACS are unknown and not stored at the multi-hop nodes a priori. In this paper we evaluate many MAC based sensor network protocols and study the effects on sensor network lifetime. A renewable energy MAC routing protocol is designed when the probabilities of active nodes are not known a priori. From theoretical derivations we show that for a Bayesian rule with known class densities of omega1, omega2 with expected error P* is bounded by max error rate of P=2P* for single-hop. We study the effects of energy losses using cross-layer simulation of - large sensor network MACS setup, the error rate which effect finding sufficient node densities to have reliable multi-hop communications due to unknown node densities. The simulation results show that even though the lifetime is comparable the expected Bayesian posterior probability error bound is close or higher than Pges2P*.
Resumo:
In this work, we address the recovery of block sparse vectors with intra-block correlation, i.e., the recovery of vectors in which the correlated nonzero entries are constrained to lie in a few clusters, from noisy underdetermined linear measurements. Among Bayesian sparse recovery techniques, the cluster Sparse Bayesian Learning (SBL) is an efficient tool for block-sparse vector recovery, with intra-block correlation. However, this technique uses a heuristic method to estimate the intra-block correlation. In this paper, we propose the Nested SBL (NSBL) algorithm, which we derive using a novel Bayesian formulation that facilitates the use of the monotonically convergent nested Expectation Maximization (EM) and a Kalman filtering based learning framework. Unlike the cluster-SBL algorithm, this formulation leads to closed-form EMupdates for estimating the correlation coefficient. We demonstrate the efficacy of the proposed NSBL algorithm using Monte Carlo simulations.
Resumo:
A new approach is proposed to estimate the thermal diffusivity of optically transparent solids at ambient temperature based on the velocity of an effective temperature point (ETP), and by using a two-beam interferometer the proposed concept is corroborated. 1D unsteady heat flow via step-temperature excitation is interpreted as a `micro-scale rectilinear translatory motion' of an ETP. The velocity dependent function is extracted by revisiting the Fourier heat diffusion equation. The relationship between the velocity of the ETP with thermal diffusivity is modeled using a standard solution. Under optimized thermal excitation, the product of the `velocity of the ETP' and the distance is a new constitutive equation for the thermal diffusivity of the solid. The experimental approach involves the establishment of a 1D unsteady heat flow inside the sample through step-temperature excitation. In the moving isothermal surfaces, the ETP is identified using a two-beam interferometer. The arrival-time of the ETP to reach a fixed distance away from heat source is measured, and its velocity is calculated. The velocity of the ETP and a given distance is sufficient to estimate the thermal diffusivity of a solid. The proposed method is experimentally verified for BK7 glass samples and the measured results are found to match closely with the reported value.