980 resultados para Bayesian probability
Resumo:
The spatiotemporal dynamics of an alien species invasion across a real landscape are typically complex. While surveillance is an essential part of a management response, planning surveillance in space and time present a difficult challenge due to this complexity. We show here a method for determining the highest probability sites for occupancy across a landscape at an arbitrary point in the future, based on occupancy data from a single slice in time. We apply to the method to the invasion of Giant Hogweed, a serious weed in the Czech republic and throughout Europe.
Resumo:
Approximate Bayesian Computation’ (ABC) represents a powerful methodology for the analysis of complex stochastic systems for which the likelihood of the observed data under an arbitrary set of input parameters may be entirely intractable – the latter condition rendering useless the standard machinery of tractable likelihood-based, Bayesian statistical inference [e.g. conventional Markov chain Monte Carlo (MCMC) simulation]. In this paper, we demonstrate the potential of ABC for astronomical model analysis by application to a case study in the morphological transformation of high-redshift galaxies. To this end, we develop, first, a stochastic model for the competing processes of merging and secular evolution in the early Universe, and secondly, through an ABC-based comparison against the observed demographics of massive (Mgal > 1011 M⊙) galaxies (at 1.5 < z < 3) in the Cosmic Assembly Near-IR Deep Extragalatic Legacy Survey (CANDELS)/Extended Groth Strip (EGS) data set we derive posterior probability densities for the key parameters of this model. The ‘Sequential Monte Carlo’ implementation of ABC exhibited herein, featuring both a self-generating target sequence and self-refining MCMC kernel, is amongst the most efficient of contemporary approaches to this important statistical algorithm. We highlight as well through our chosen case study the value of careful summary statistic selection, and demonstrate two modern strategies for assessment and optimization in this regard. Ultimately, our ABC analysis of the high-redshift morphological mix returns tight constraints on the evolving merger rate in the early Universe and favours major merging (with disc survival or rapid reformation) over secular evolution as the mechanism most responsible for building up the first generation of bulges in early-type discs.
Resumo:
Bayesian networks (BNs) are graphical probabilistic models used for reasoning under uncertainty. These models are becoming increasing popular in a range of fields including ecology, computational biology, medical diagnosis, and forensics. In most of these cases, the BNs are quantified using information from experts, or from user opinions. An interest therefore lies in the way in which multiple opinions can be represented and used in a BN. This paper proposes the use of a measurement error model to combine opinions for use in the quantification of a BN. The multiple opinions are treated as a realisation of measurement error and the model uses the posterior probabilities ascribed to each node in the BN which are computed from the prior information given by each expert. The proposed model addresses the issues associated with current methods of combining opinions such as the absence of a coherent probability model, the lack of the conditional independence structure of the BN being maintained, and the provision of only a point estimate for the consensus. The proposed model is applied an existing Bayesian Network and performed well when compared to existing methods of combining opinions.
Resumo:
The use of expert knowledge to quantify a Bayesian Network (BN) is necessary when data is not available. This however raises questions regarding how opinions from multiple experts can be used in a BN. Linear pooling is a popular method for combining probability assessments from multiple experts. In particular, Prior Linear Pooling (PrLP), which pools opinions then places them into the BN is a common method. This paper firstly proposes an alternative pooling method, Posterior Linear Pooling (PoLP). This method constructs a BN for each expert, then pools the resulting probabilities at the nodes of interest. Secondly, it investigates the advantages and disadvantages of using these pooling methods to combine the opinions of multiple experts. Finally, the methods are applied to an existing BN, the Wayfinding Bayesian Network Model, to investigate the behaviour of different groups of people and how these different methods may be able to capture such differences. The paper focusses on 6 nodes Human Factors, Environmental Factors, Wayfinding, Communication, Visual Elements of Communication and Navigation Pathway, and three subgroups Gender (female, male),Travel Experience (experienced, inexperienced), and Travel Purpose (business, personal) and finds that different behaviors can indeed be captured by the different methods.
Resumo:
To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies. © 2012 Nature America, Inc. All rights reserved.
Resumo:
In this paper the issue of finding uncertainty intervals for queries in a Bayesian Network is reconsidered. The investigation focuses on Bayesian Nets with discrete nodes and finite populations. An earlier asymptotic approach is compared with a simulation-based approach, together with further alternatives, one based on a single sample of the Bayesian Net of a particular finite population size, and another which uses expected population sizes together with exact probabilities. We conclude that a query of a Bayesian Net should be expressed as a probability embedded in an uncertainty interval. Based on an investigation of two Bayesian Net structures, the preferred method is the simulation method. However, both the single sample method and the expected sample size methods may be useful and are simpler to compute. Any method at all is more useful than none, when assessing a Bayesian Net under development, or when drawing conclusions from an ‘expert’ system.
Resumo:
Ship seakeeping operability refers to the quantification of motion performance in waves relative to mission requirements. This is used to make decisions about preferred vessel designs, but it can also be used as comprehensive assessment of the benefits of ship-motion-control systems. Traditionally, operability computation aggregates statistics of motion computed over over the envelope of likely environmental conditions in order to determine a coefficient in the range from 0 to 1 called operability. When used for assessment of motion-control systems, the increase of operability is taken as the key performance indicator. The operability coefficient is often given the interpretation of the percentage of time operable. This paper considers an alternative probabilistic approach to this traditional computation of operability. It characterises operability not as a number to which a frequency interpretation is attached, but as a hypothesis that a vessel will attain the desired performance in one mission considering the envelope of likely operational conditions. This enables the use of Bayesian theory to compute the probability of that this hypothesis is true conditional on data from simulations. Thus, the metric considered is the probability of operability. This formulation not only adheres to recent developments in reliability and risk analysis, but also allows incorporating into the analysis more accurate descriptions of ship-motion-control systems since the analysis is not limited to linear ship responses in the frequency domain. The paper also discusses an extension of the approach to the case of assessment of increased levels of autonomy for unmanned marine craft.
Resumo:
So far, most Phase II trials have been designed and analysed under a frequentist framework. Under this framework, a trial is designed so that the overall Type I and Type II errors of the trial are controlled at some desired levels. Recently, a number of articles have advocated the use of Bavesian designs in practice. Under a Bayesian framework, a trial is designed so that the trial stops when the posterior probability of treatment is within certain prespecified thresholds. In this article, we argue that trials under a Bayesian framework can also be designed to control frequentist error rates. We introduce a Bayesian version of Simon's well-known two-stage design to achieve this goal. We also consider two other errors, which are called Bayesian errors in this article because of their similarities to posterior probabilities. We show that our method can also control these Bayesian-type errors. We compare our method with other recent Bayesian designs in a numerical study and discuss implications of different designs on error rates. An example of a clinical trial for patients with nasopharyngeal carcinoma is used to illustrate differences of the different designs.
Resumo:
Stallard (1998, Biometrics 54, 279-294) recently used Bayesian decision theory for sample-size determination in phase II trials. His design maximizes the expected financial gains in the development of a new treatment. However, it results in a very high probability (0.65) of recommending an ineffective treatment for phase III testing. On the other hand, the expected gain using his design is more than 10 times that of a design that tightly controls the false positive error (Thall and Simon, 1994, Biometrics 50, 337-349). Stallard's design maximizes the expected gain per phase II trial, but it does not maximize the rate of gain or total gain for a fixed length of time because the rate of gain depends on the proportion: of treatments forwarding to the phase III study. We suggest maximizing the rate of gain, and the resulting optimal one-stage design becomes twice as efficient as Stallard's one-stage design. Furthermore, the new design has a probability of only 0.12 of passing an ineffective treatment to phase III study.
Resumo:
Early detection surveillance programs aim to find invasions of exotic plant pests and diseases before they are too widespread to eradicate. However, the value of these programs can be difficult to justify when no positive detections are made. To demonstrate the value of pest absence information provided by these programs, we use a hierarchical Bayesian framework to model estimates of incursion extent with and without surveillance. A model for the latent invasion process provides the baseline against which surveillance data are assessed. Ecological knowledge and pest management criteria are introduced into the model using informative priors for invasion parameters. Observation models assimilate information from spatio-temporal presence/absence data to accommodate imperfect detection and generate posterior estimates of pest extent. When applied to an early detection program operating in Queensland, Australia, the framework demonstrates that this typical surveillance regime provides a modest reduction in the estimate that a surveyed district is infested. More importantly, the model suggests that early detection surveillance programs can provide a dramatic reduction in the putative area of incursion and therefore offer a substantial benefit to incursion management. By mapping spatial estimates of the point probability of infestation, the model identifies where future surveillance resources can be most effectively deployed.
Resumo:
Hierarchical Bayesian models can assimilate surveillance and ecological information to estimate both invasion extent and model parameters for invading plant pests spread by people. A reliability analysis framework that can accommodate multiple dispersal modes is developed to estimate human-mediated dispersal parameters for an invasive species. Uncertainty in the observation process is modelled by accounting for local natural spread and population growth within spatial units. Broad scale incursion dynamics are based on a mechanistic gravity model with a Weibull distribution modification to incorporate a local pest build-up phase. The model uses Markov chain Monte Carlo simulations to infer the probability of colonisation times for discrete spatial units and to estimate connectivity parameters between these units. The hierarchical Bayesian model with observational and ecological components is applied to a surveillance dataset for a spiralling whitefly (Aleurodicus dispersus) invasion in Queensland, Australia. The model structure provides a useful application that draws on surveillance data and ecological knowledge that can be used to manage the risk of pest movement.
Resumo:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
Resumo:
Advancements in the analysis techniques have led to a rapid accumulation of biological data in databases. Such data often are in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data give promises of answering various biologically relevant questions in more detail than what has been possible before. For example, one may wish to identify areas in an amino acid sequence, which are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet with the challenge of deriving meaning from the increasing amounts of data. Our main concern is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets provide also a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
Resumo:
Predicting temporal responses of ecosystems to disturbances associated with industrial activities is critical for their management and conservation. However, prediction of ecosystem responses is challenging due to the complexity and potential non-linearities stemming from interactions between system components and multiple environmental drivers. Prediction is particularly difficult for marine ecosystems due to their often highly variable and complex natures and large uncertainties surrounding their dynamic responses. Consequently, current management of such systems often rely on expert judgement and/or complex quantitative models that consider only a subset of the relevant ecological processes. Hence there exists an urgent need for the development of whole-of-systems predictive models to support decision and policy makers in managing complex marine systems in the context of industry based disturbances. This paper presents Dynamic Bayesian Networks (DBNs) for predicting the temporal response of a marine ecosystem to anthropogenic disturbances. The DBN provides a visual representation of the problem domain in terms of factors (parts of the ecosystem) and their relationships. These relationships are quantified via Conditional Probability Tables (CPTs), which estimate the variability and uncertainty in the distribution of each factor. The combination of qualitative visual and quantitative elements in a DBN facilitates the integration of a wide array of data, published and expert knowledge and other models. Such multiple sources are often essential as one single source of information is rarely sufficient to cover the diverse range of factors relevant to a management task. Here, a DBN model is developed for tropical, annual Halophila and temperate, persistent Amphibolis seagrass meadows to inform dredging management and help meet environmental guidelines. Specifically, the impacts of capital (e.g. new port development) and maintenance (e.g. maintaining channel depths in established ports) dredging is evaluated with respect to the risk of permanent loss, defined as no recovery within 5 years (Environmental Protection Agency guidelines). The model is developed using expert knowledge, existing literature, statistical models of environmental light, and experimental data. The model is then demonstrated in a case study through the analysis of a variety of dredging, environmental and seagrass ecosystem recovery scenarios. In spatial zones significantly affected by dredging, such as the zone of moderate impact, shoot density has a very high probability of being driven to zero by capital dredging due to the duration of such dredging. Here, fast growing Halophila species can recover, however, the probability of recovery depends on the presence of seed banks. On the other hand, slow growing Amphibolis meadows have a high probability of suffering permanent loss. However, in the maintenance dredging scenario, due to the shorter duration of dredging, Amphibolis is better able to resist the impacts of dredging. For both types of seagrass meadows, the probability of loss was strongly dependent on the biological and ecological status of the meadow, as well as environmental conditions post-dredging. The ability to predict the ecosystem response under cumulative, non-linear interactions across a complex ecosystem highlights the utility of DBNs for decision support and environmental management.
Resumo:
Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.