10 resultados para Bayesian methods
Resumo:
The impact of rapid climate change on contemporary human populations is of global concern. To contextualize our understanding of human responses to rapid climate change it is necessary to examine the archeological record during past climate transitions. One episode of abrupt climate change has been correlated with societal collapse at the end of the northwestern European Bronze Age. We apply new methods to interrogate archeological and paleoclimate data for this transition in Ireland at a higher level of precision than has previously been possible. We analyze archeological 14C dates to demonstrate dramatic population collapse and present high-precision proxy climate data, analyzed through Bayesian methods, to provide evidence for a rapid climatic transition at ca. 750 calibrated years B.C. Our results demonstrate that this climatic downturn did not initiate population collapse and highlight the nondeterministic nature of human responses to past climate change.
Resumo:
The sediment sequence from Hasseldala port in southeastern Sweden provides a unique Lateglacial/early Holocene record that contains five different tephra layers. Three of these have been geochemically identified as the Borrobol Tephra, the Hasseldalen Tephra and the 10-ka Askja Tephra. Twenty-eight high-resolution C-14 measurements have been obtained and three different age models based on Bayesian statistics are employed to provide age estimates for the five different tephra layers. The chrono- and pollen stratigraphic framework supports the stratigraphic position of the Borrobol Tephra as found in Sweden at the very end of the Older Dryas pollen zone and provides the first age estimates for the Askja and Hasseldalen tephras. Our results, however, highlight the limitations that arise in attempting to establish a robust, chronologically independent lacustrine sequence that can be correlated in great detail to ice core or marine records. Radiocarbon samples are prone to error and sedimentation rates in lake basins may vary considerably due to a number of factors. Any type of valid and 'realistic' age model, therefore, has to take these limitations into account and needs to include this information in its prior assumptions. As a result, the age ranges for the specific horizons at Hasseldala port are large and calendar year estimates differ according to the assumptions of the age-model. Not only do these results provide a cautionary note for overdependence on one age-model for the derivation of age estimates for specific horizons, but they also demonstrate that precise correlations to other palaeoarchives to detect leads or lags is problematic. Given the uncertainties associated with establishing age-depth models for sedimentary sequences spanning the Lateglacial period, however, this exercise employing Bayesian probability methods represents the best possible approach and provides the most statistically significant age estimates for the pollen zone boundaries and tephra horizons. Copyright (C) 2006 John Wiley & Sons, Ltd.
Resumo:
Aim-To develop an expert system model for the diagnosis of fine needle aspiration cytology (FNAC) of the breast.
Methods-Knowledge and uncertainty were represented in the form of a Bayesian belief network which permitted the combination of diagnostic evidence in a cumulative manner and provided a final probability for the possible diagnostic outcomes. The network comprised 10 cytological features (evidence nodes), each independently linked to the diagnosis (decision node) by a conditional probability matrix. The system was designed to be interactive in that the cytopathologist entered evidence into the network in the form of likelihood ratios for the outcomes at each evidence node.
Results-The efficiency of the network was tested on a series of 40 breast FNAC specimens. The highest diagnostic probability provided by the network agreed with the cytopathologists' diagnosis in 100% of cases for the assessment of discrete, benign, and malignant aspirates. A typical probably benign cases were given probabilities in favour of a benign diagnosis. Suspicious cases tended to have similar probabilities for both diagnostic outcomes and so, correctly, could not be assigned as benign or malignant. A closer examination of cumulative belief graphs for the diagnostic sequence of each case provided insight into the diagnostic process, and quantitative data which improved the identification of suspicious cases.
Conclusion-The further development of such a system will have three important roles in breast cytodiagnosis: (1) to aid the cytologist in making a more consistent and objective diagnosis; (2) to provide a teaching tool on breast cytological diagnosis for the non-expert; and (3) it is the first stage in the development of a system capable of automated diagnosis through the use of expert system machine vision.
Resumo:
We present a Bayesian-odds-ratio-based algorithm for detecting stellar flares in light-curve data. We assume flares are described by a model in which there is a rapid rise with a half-Gaussian profile, followed by an exponential decay. Our signal model also contains a polynomial background model required to fit underlying light-curve variations in the data, which could otherwise partially mimic a flare. We characterize the false alarm probability and efficiency of this method under the assumption that any unmodelled noise in the data is Gaussian, and compare it with a simpler thresholding method based on that used in Walkowicz et al. We find our method has a significant increase in detection efficiency for low signal-to-noise ratio (S/N) flares. For a conservative false alarm probability our method can detect 95 per cent of flares with S/N less than 20, as compared to S/N of 25 for the simpler method. We also test how well the assumption of Gaussian noise holds by applying the method to a selection of 'quiet' Kepler stars. As an example we have applied our method to a selection of stars in Kepler Quarter 1 data. The method finds 687 flaring stars with a total of 1873 flares after vetos have been applied. For these flares we have made preliminary characterizations of their durations and and S/N.
Resumo:
This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.
Resumo:
This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-and-bound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before.
Resumo:
Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way for handling the missingness consists in discarding from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missing allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods to deal with complete data later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied on the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
Resumo:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experiment results indicate that our approach is comparable with exact methods, but is much more computationally efficient.
Resumo:
Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative scores, which is a measure of quality for the k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods can in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.
Resumo:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods \cite{korhonen2exact, nie2014advances} tackle the problem by using $k$-trees to learn the optimal Bayesian network with tree-width up to $k$. Finding the best $k$-tree, however, is computationally intractable. In this paper, we propose a sampling method to efficiently find representative $k$-trees by introducing an informative score function to characterize the quality of a $k$-tree. To further improve the quality of the $k$-trees, we propose a probabilistic hill climbing approach that locally refines the sampled $k$-trees. The proposed algorithm can efficiently learn a quality Bayesian network with tree-width at most $k$. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods.