946 results for BAYESIAN-INFERENCE
Abstract:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the aim of inferring the ecological interactions. © 2012 Elsevier B.V. All rights reserved.
Abstract:
Mineral exploration programmes around the world use data from remote sensing, geophysics and direct sampling. On a regional scale, the combination of airborne geophysics and ground-based geochemical sampling can aid geological mapping and economic minerals exploration. Because airborne geophysical and traditional soil-sampling data are generated at different spatial resolutions, they are not immediately comparable. Several geostatistical techniques, including indicator cokriging and collocated cokriging, can be used to integrate different types of data into a geostatistical model. With increasing numbers of variables the inference of the cross-covariance model required for cokriging can be demanding in terms of effort and computational time. In this paper a Gaussian-based Bayesian updating approach is applied to integrate airborne radiometric data and ground-sampled geochemical soil data, to maximise the information generated from the soil survey and to enable more accurate geological interpretation for the exploration and development of natural resources. The Bayesian updating technique decomposes the collocated estimate into a product of two models: prior and likelihood models. The prior model is built from primary information and the likelihood model is built from secondary information. The prior model is then updated with the likelihood model to build the final model. The approach allows multiple secondary variables to be simultaneously integrated into the mapping of the primary variable. The Bayesian updating approach is demonstrated using a case study from Northern Ireland where the history of mineral prospecting for precious and base metals dates from the 18th century. Vein-hosted, strata-bound and volcanogenic occurrences of mineralisation are found.
The geostatistical technique was used to improve the resolution of soil geochemistry, collected at one sample per 2 km², by integrating more closely measured airborne geophysical data from the GSNI Tellus Survey, measured over a footprint of 65 × 200 m. The directly measured geochemistry data were considered as primary data in the Bayesian approach and the airborne radiometric data were used as secondary data. The approach produced more detailed updated maps and in particular maximised information on mapped estimates of zinc, copper and lead. Greater delineation of an elongated northwest/southeast trending zone in the updated maps strengthened the potential to investigate strata-bound base metal deposits.
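The prior-times-likelihood step described above can be illustrated at a single map location. The following is a minimal sketch, assuming that both the prior (from primary data) and the likelihood (from secondary data) are summarised by a Gaussian mean and variance; the function name `gaussian_update` is hypothetical, and the exact update formula used in the paper may differ (for example, by accounting for the global distribution).

```python
def gaussian_update(prior_mean, prior_var, lik_mean, lik_var):
    """Multiply two Gaussian densities over the same quantity.

    Illustrative assumption: the prior and likelihood are each a single
    Gaussian; the product is proportional to a Gaussian whose precision
    is the sum of the two precisions.
    """
    if prior_var <= 0 or lik_var <= 0:
        raise ValueError("variances must be positive")
    precision = 1.0 / prior_var + 1.0 / lik_var
    updated_var = 1.0 / precision
    # Precision-weighted average of the two means.
    updated_mean = updated_var * (prior_mean / prior_var + lik_mean / lik_var)
    return updated_mean, updated_var


if __name__ == "__main__":
    # Hypothetical values in standardised units, not data from the survey.
    print(gaussian_update(0.0, 1.0, 2.0, 1.0))
```

The updated variance is never larger than either input variance, which is one way to see how secondary data can sharpen a sparse primary map.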
Abstract:
Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove an analogous result for inference in Naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and Naive Bayes networks are used in real applications of imprecise probability.
Abstract:
Credal networks generalize Bayesian networks by relaxing the requirement of precision of probabilities. Credal networks are considerably more expressive than Bayesian networks, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal networks. The algorithm is based on an important representation result we prove for general credal networks: that any credal network can be equivalently reformulated as a credal network with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal network is then updated by L2U, a loopy approximate algorithm for binary credal networks. Overall, we generalize L2U to non-binary credal networks, obtaining a scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences with respect to other state-of-the-art algorithms is evaluated by extensive numerical tests.
Abstract:
Credal nets generalize Bayesian nets by relaxing the requirement of precision of probabilities. Credal nets are considerably more expressive than Bayesian nets, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal nets. The algorithm is based on an important representation result we prove for general credal nets: that any credal net can be equivalently reformulated as a credal net with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal net is updated by L2U, a loopy approximate algorithm for binary credal nets. Thus, we generalize L2U to non-binary credal nets, obtaining an accurate and scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences is evaluated by empirical tests.
Abstract:
This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can be also implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
Abstract:
This paper considers inference from multinomial data and addresses the problem of choosing the strength of the Dirichlet prior under a mean-squared error criterion. We compare the Maximum Likelihood Estimator (MLE) and the most commonly used Bayesian estimators obtained by assuming a prior Dirichlet distribution with non-informative prior parameters, that is, the parameters of the Dirichlet are equal and altogether sum up to the so-called strength of the prior. Under this criterion, MLE becomes preferable to the Bayesian estimators as the number of categories k of the multinomial increases, because the region where non-informative Bayesian estimators are dominant shrinks quickly as k increases. This can be avoided if the strength of the prior is not kept constant but decreased with the number of categories. We argue that the strength should decrease at least k times faster than usual estimators do.
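The two estimators being compared can be written down directly. The sketch below assumes a symmetric Dirichlet prior whose k equal parameters sum to the strength s, so that the posterior mean for category i is (n_i + s/k) / (N + s); the function names are hypothetical and the example counts are made up for illustration.

```python
def mle(counts):
    """Maximum likelihood estimate: relative frequencies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one observation")
    return [c / total for c in counts]


def dirichlet_posterior_mean(counts, strength):
    """Posterior mean under a symmetric Dirichlet prior.

    Each of the k prior parameters equals strength / k, so together
    they sum to the prior strength.
    """
    if strength <= 0:
        raise ValueError("strength must be positive")
    k = len(counts)
    total = sum(counts)
    alpha = strength / k
    return [(c + alpha) / (total + strength) for c in counts]


if __name__ == "__main__":
    counts = [3, 1, 0]  # hypothetical counts for k = 3 categories
    print(mle(counts))
    print(dirichlet_posterior_mean(counts, strength=3.0))
```

With the strength held fixed, the prior contribution per category, s/k, shrinks as k grows while the total pull toward uniformity stays the same, which is the quantity the abstract proposes to scale down.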
Abstract:
This paper investigates a representation language with flexibility inspired by probabilistic logic and compactness inspired by relational Bayesian networks. The goal is to handle propositional and first-order constructs together with precise, imprecise, indeterminate and qualitative probabilistic assessments. The paper shows how this can be achieved through the theory of credal networks. New exact and approximate inference algorithms based on multilinear programming and iterated/loopy propagation of interval probabilities are presented; their superior performance, compared to existing ones, is shown empirically.
Abstract:
A credal network is a graphical tool for representation and manipulation of uncertainty, where probability values may be imprecise or indeterminate. A credal network associates a directed acyclic graph with a collection of sets of probability measures; in this context, inference is the computation of tight lower and upper bounds for conditional probabilities. In this paper we present new algorithms for inference in credal networks based on multilinear programming techniques. Experiments indicate that these new algorithms have better performance than existing ones, in the sense that they can produce more accurate results in larger networks.
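To make "tight lower and upper bounds for conditional probabilities" concrete, here is a minimal sketch for the smallest possible case: a two-node network X -> Y with binary variables and interval-valued probabilities. It is not the multilinear programming method of the paper; it simply enumerates the interval endpoints, which suffices here because the posterior is monotone in each parameter. The function name and the numbers are hypothetical.

```python
from itertools import product


def posterior_bounds(p_x, p_y_given_x1, p_y_given_x0):
    """Bounds on P(X=1 | Y=1) in a binary network X -> Y.

    Each argument is a (lower, upper) interval for, respectively,
    P(X=1), P(Y=1 | X=1) and P(Y=1 | X=0). Brute-force enumeration of
    endpoint combinations; only feasible for tiny networks.
    """
    values = []
    for p, a, b in product(p_x, p_y_given_x1, p_y_given_x0):
        evidence = p * a + (1 - p) * b  # P(Y=1) for this combination
        if evidence > 0:
            values.append(p * a / evidence)
    if not values:
        raise ValueError("evidence has zero probability everywhere")
    return min(values), max(values)


if __name__ == "__main__":
    lower, upper = posterior_bounds((0.2, 0.3), (0.8, 0.9), (0.1, 0.2))
    print(lower, upper)
```

The number of endpoint combinations grows exponentially with the number of parameters, which is why dedicated optimisation techniques are needed for larger networks.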
Abstract:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experimental results indicate that our approach is comparable with exact methods, but is much more computationally efficient.
Abstract:
Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative score, a measure of the quality of a k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods can in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.
Abstract:
Thesis (Master's)--University of Washington, 2016-03
Abstract:
This article extends the existing discussion in the literature on probabilistic inference and decision making with respect to continuous hypotheses that are prevalent in forensic toxicology. As a main aim, this research investigates the properties of a widely followed approach for quantifying the level of toxic substances in blood samples, and compares this procedure with a Bayesian probabilistic approach. As an example, attention is confined to the presence of toxic substances, such as THC, in blood from car drivers. In this context, the interpretation of results from laboratory analyses needs to take into account legal requirements for establishing the 'presence' of target substances in blood. In a first part, the performance of the proposed Bayesian model for the estimation of an unknown parameter (here, the amount of a toxic substance) is illustrated and compared with the currently used method. The model is then used in a second part to approach, in a rational way, the decision component of the problem, that is, judicial questions of the kind 'Is the quantity of THC measured in the blood over the legal threshold of 1.5 μg/l?'. This is illustrated through a practical example.
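The threshold question can be turned into a posterior probability once a model is fixed. The sketch below is not the model from the article; it assumes, purely for illustration, independent normal measurement errors with a known standard deviation and a flat prior on the true concentration (ignoring that concentrations cannot be negative), so the posterior is normal with mean equal to the sample mean. The function name and the measurement values are hypothetical.

```python
import math


def prob_above_threshold(measurements, measurement_sd, threshold=1.5):
    """Posterior probability that the true concentration exceeds threshold.

    Assumed model (illustrative only): normal measurement error with a
    known standard deviation and a flat prior, giving a normal posterior
    with mean = sample mean and sd = measurement_sd / sqrt(n).
    """
    n = len(measurements)
    if n == 0 or measurement_sd <= 0:
        raise ValueError("need measurements and a positive standard deviation")
    post_mean = sum(measurements) / n
    post_sd = measurement_sd / math.sqrt(n)
    z = (threshold - post_mean) / post_sd
    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical duplicate measurements in micrograms per litre.
    print(prob_above_threshold([1.6, 1.7], measurement_sd=0.1))
```

A decision rule would then compare this probability with a level chosen from the relative costs of the two possible errors, which is the decision component the abstract refers to.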
Abstract:
We study the problem of measuring the uncertainty of CGE (or RBC)-type model simulations associated with parameter uncertainty. We describe two approaches for building confidence sets on model endogenous variables. The first one uses a standard Wald-type statistic. The second approach assumes that a confidence set (sampling or Bayesian) is available for the free parameters, from which confidence sets are derived by a projection technique. The latter has two advantages: first, confidence set validity is not affected by model nonlinearities; second, we can easily build simultaneous confidence intervals for an unlimited number of variables. We study conditions under which these confidence sets take the form of intervals and show they can be implemented using standard methods for solving CGE models. We present an application to a CGE model of the Moroccan economy to study the effects of policy-induced increases of transfers from Moroccan expatriates.
Abstract:
Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC.
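The core idea of approximate Bayesian computation can be shown with a rejection sampler on a toy problem. This sketch does not use DIY ABC or any population-genetic model; it assumes a binomial simulator with a uniform prior on the success probability, and keeps the prior draws whose simulated data fall within a tolerance of the observed data. The function name and all parameter values are hypothetical.

```python
import random


def abc_rejection(observed_successes, n_trials, n_draws=20000,
                  tolerance=0, seed=0):
    """Rejection ABC for a binomial success probability.

    Toy stand-in for a scenario simulator: draw p from a uniform prior,
    simulate n_trials Bernoulli outcomes, and accept p when the simulated
    count is within `tolerance` of the observed count.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from the uniform prior on [0, 1)
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if abs(simulated - observed_successes) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    sample = abc_rejection(observed_successes=7, n_trials=10)
    print(len(sample), sum(sample) / len(sample))
```

With tolerance 0 and a uniform prior, the accepted draws are samples from the exact Beta(8, 4) posterior for this toy case; in realistic settings the raw data are replaced by summary statistics and the tolerance is positive, which is what makes the method approximate.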