68 results for Bayesian nonparametric
Abstract:
In this paper, we present a Bayesian approach to estimate a chromosome and a disorder network from the Online Mendelian Inheritance in Man (OMIM) database. In contrast to other approaches, we obtain statistical rather than deterministic networks, enabling parametric control of the uncertainty in the underlying disorder-disease gene associations contained in the OMIM, on which the networks are based. From a structural investigation of the chromosome network, we identify three chromosome subgroups that reflect architectural differences in chromosome-disorder associations that are predictively exploitable for a functional analysis of diseases.
Abstract:
A benefit function transfer obtains estimates of willingness-to-pay (WTP) for the evaluation of a given policy at a site by combining existing information from different study sites. This has the advantage that more efficient estimates are obtained, but it relies on the assumption that the heterogeneity between sites is appropriately captured in the benefit transfer model. A more expensive alternative to estimate WTP is to analyze only data from the policy site in question while ignoring information from other sites. We make use of the fact that these two choices can be viewed as a model selection problem and extend the set of models to allow for the hypothesis that the benefit function is only applicable to a subset of sites. We show how Bayesian model averaging (BMA) techniques can be used to optimally combine information from all models. The Bayesian algorithm searches for the set of sites that can form the basis for estimating a benefit function and reveals whether such information can be transferred to new sites for which only a small data set is available. We illustrate the method with a sample of 42 forests from the U.K. and Ireland. We find that BMA benefit function transfer produces reliable estimates and can increase the information content of a small sample about eightfold when the forest is 'poolable'. © 2008 Elsevier Inc. All rights reserved.
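The averaging step of BMA can be sketched in a few lines. This is a minimal illustration, not the study's algorithm: it assumes equal prior model probabilities unless others are supplied, and the log marginal likelihoods and per-model WTP estimates are hypothetical placeholders. The search over subsets of sites is not shown.

```python
import math

def posterior_model_probs(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities from log marginal likelihoods."""
    n = len(log_marginal_likelihoods)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # assumption: equal prior weight per model
    log_post = [lml + math.log(p)
                for lml, p in zip(log_marginal_likelihoods, prior_probs)]
    m = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def bma_estimate(estimates, probs):
    """Model-averaged estimate: probability-weighted mean over models."""
    return sum(e * p for e, p in zip(estimates, probs))

# Hypothetical values for three candidate pooling models.
log_ml = [-120.4, -118.9, -125.0]
wtp = [3.1, 2.7, 4.0]
probs = posterior_model_probs(log_ml)
wtp_bma = bma_estimate(wtp, probs)
```

The averaged estimate always lies between the smallest and largest single-model estimates, and it is pulled toward the models that the data support most strongly.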
Abstract:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the objective of inferring the ecological interactions. © 2012 Elsevier B.V. All rights reserved.
Abstract:
Recently, Bayesian statistical software has been developed for age-depth modeling (wiggle-match dating) of sequences of densely spaced radiocarbon dates from peat cores. The method is described in non-statistical terms, and is compared with an alternative method of chronological ordering of 14C dates. Case studies include the dating of the start of agriculture in the northeastern part of the Netherlands, and of a possible Hekla-3 tephra layer in the same country. We discuss future enhancements in Bayesian age modeling.
Abstract:
In many applications in applied statistics, researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index, we find that the four variables that are used in its construction have dimension three and the index loses information.
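For orientation, the Takens estimator can be sketched as follows. It is the maximum-likelihood estimate of the correlation dimension computed from all pairwise distances below a cutoff. The cutoff `r0`, the synthetic point sets and the function name are assumptions chosen for illustration. They are not taken from the paper.

```python
import math
import random

def takens_estimator(points, r0):
    """Takens estimate of correlation dimension: -N / sum(log(r / r0)),
    taken over the N pairwise distances r with 0 < r < r0."""
    log_sum = 0.0
    n_pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            if 0.0 < r < r0:
                log_sum += math.log(r / r0)
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no pairs closer than r0")
    return -n_pairs / log_sum

rng = random.Random(0)

# Synthetic data: three variables that all lie on one line, so the data
# are intrinsically one-dimensional even though they have three columns.
line = [(t, 2.0 * t, -t) for t in (rng.random() for _ in range(400))]
dim_line = takens_estimator(line, r0=0.2)

# Synthetic data: two independent variables filling a square.
square = [(rng.random(), rng.random()) for _ in range(400)]
dim_square = takens_estimator(square, r0=0.1)
```

On these synthetic sets the estimate comes out near one for the line and near two for the square, which is the kind of check the abstract proposes running before compressing several variables into one index.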
Abstract:
Prostatic intraepithelial neoplasia (PIN) diagnosis and grading are affected by uncertainties which arise from the fact that almost all knowledge of PIN histopathology is expressed in concepts, descriptive linguistic terms, and words. A Bayesian belief network (BBN) was therefore used to reduce the problem of uncertainty in diagnostic clue assessment, while still considering the dependences between elements in the reasoning sequence. A shallow network was used with an open-tree topology, with eight first-level descendant nodes for the diagnostic clues (evidence nodes), each independently linked by a conditional probability matrix to a root node containing the diagnostic alternatives (decision node). One of the evidence nodes was based on the tissue architecture and the others were based on cell features. The system was designed to be interactive, in that the histopathologist entered evidence into the network in the form of likelihood ratios for outcomes at each evidence node. The efficiency of the network was tested on a series of 110 prostate specimens, subdivided as follows: 22 cases of non-neoplastic prostate or benign prostatic tissue (NP), 22 PINs of low grade (PINlow), 22 PINs of high grade (PINhigh), 22 prostatic adenocarcinomas with cribriform pattern (PACcri), and 22 prostatic adenocarcinomas with large acinar pattern (PAClgac). The results obtained in the benign and malignant categories showed that the belief for the diagnostic alternatives is very high, the values being in general more than 0.8 and often close to 1.0. When considering the PIN lesions, the network classified and graded most of the cases with high certainty. However, there were some cases which showed values less than 0.8 (13 cases out of 44), thus indicating that there are situations in which the feature changes are intermediate between contiguous categories or grades. 
Discrepancy between morphological grading and the BBN results was observed in four out of 44 PIN cases: one PINlow was classified as PINhigh and three PINhigh were classified as PINlow. In conclusion, the network can grade PIN lesions and differentiate them from other prostate lesions with certainty. In particular, it offers a descriptive classifier which is readily implemented and which allows the use of linguistic, fuzzy variables.
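The belief update in a shallow network of this shape, with one decision node and independently linked evidence nodes, can be sketched as below. This is an illustration only: the two evidence nodes, their outcome names, every probability and the uniform prior are hypothetical, and only three of the diagnostic alternatives are shown. The likelihood values entered for each outcome play the role of the evidence supplied by the histopathologist.

```python
def update_belief(prior, cpts, evidence):
    """Combine entered evidence with the prior over diagnoses.

    prior:    {diagnosis: probability}
    cpts:     {node: {diagnosis: {outcome: P(outcome | diagnosis)}}}
    evidence: {node: {outcome: likelihood entered by the observer}}
    """
    belief = dict(prior)
    for node, likelihoods in evidence.items():
        cpt = cpts[node]
        for diagnosis in belief:
            # Message from the evidence node to the decision node.
            belief[diagnosis] *= sum(
                cpt[diagnosis][outcome] * weight
                for outcome, weight in likelihoods.items()
            )
    total = sum(belief.values())
    return {d: b / total for d, b in belief.items()}

# Hypothetical prior and conditional probability matrices.
prior = {"NP": 1 / 3, "PINlow": 1 / 3, "PINhigh": 1 / 3}
cpts = {
    "nuclear_size": {
        "NP": {"normal": 0.8, "enlarged": 0.2},
        "PINlow": {"normal": 0.4, "enlarged": 0.6},
        "PINhigh": {"normal": 0.1, "enlarged": 0.9},
    },
    "nucleoli": {
        "NP": {"absent": 0.9, "prominent": 0.1},
        "PINlow": {"absent": 0.7, "prominent": 0.3},
        "PINhigh": {"absent": 0.2, "prominent": 0.8},
    },
}
# Hypothetical observer input: graded likelihoods, not hard findings.
evidence = {
    "nuclear_size": {"normal": 0.1, "enlarged": 0.9},
    "nucleoli": {"absent": 0.2, "prominent": 0.8},
}
belief = update_belief(prior, cpts, evidence)
```

Because each evidence node multiplies into the belief independently, evidence can be entered one node at a time and the running belief inspected after each step.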
Abstract:
Aim-To develop an expert system model for the diagnosis of fine needle aspiration cytology (FNAC) of the breast.
Methods-Knowledge and uncertainty were represented in the form of a Bayesian belief network which permitted the combination of diagnostic evidence in a cumulative manner and provided a final probability for the possible diagnostic outcomes. The network comprised 10 cytological features (evidence nodes), each independently linked to the diagnosis (decision node) by a conditional probability matrix. The system was designed to be interactive in that the cytopathologist entered evidence into the network in the form of likelihood ratios for the outcomes at each evidence node.
Results-The efficiency of the network was tested on a series of 40 breast FNAC specimens. The highest diagnostic probability provided by the network agreed with the cytopathologists' diagnosis in 100% of cases for the assessment of discrete, benign, and malignant aspirates. Atypical probably benign cases were given probabilities in favour of a benign diagnosis. Suspicious cases tended to have similar probabilities for both diagnostic outcomes and so, correctly, could not be assigned as benign or malignant. A closer examination of cumulative belief graphs for the diagnostic sequence of each case provided insight into the diagnostic process, and quantitative data which improved the identification of suspicious cases.
Conclusion-The further development of such a system will have three important roles in breast cytodiagnosis: (1) to aid the cytologist in making a more consistent and objective diagnosis; (2) to provide a teaching tool on breast cytological diagnosis for the non-expert; and (3) to serve as the first stage in the development of a system capable of automated diagnosis through the use of expert system machine vision.
Abstract:
Mobile malware has been growing in scale and complexity, spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform, resulting in a sharp increase in malware targeting the platform. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite current detection measures in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, the authors develop and analyse proactive machine-learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, which is based on a large malware sample set covering the majority of the existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented, offering useful insight towards the development of effective static-analytic Bayesian classification-based solutions for detecting unknown Android malware.
Abstract:
Mineral exploration programmes around the world use data from remote sensing, geophysics and direct sampling. On a regional scale, the combination of airborne geophysics and ground-based geochemical sampling can aid geological mapping and economic minerals exploration. The fact that airborne geophysical and traditional soil-sampling data are generated at different spatial resolutions means that they are not immediately comparable due to their different sampling density. Several geostatistical techniques, including indicator cokriging and collocated cokriging, can be used to integrate different types of data into a geostatistical model. With increasing numbers of variables the inference of the cross-covariance model required for cokriging can be demanding in terms of effort and computational time. In this paper a Gaussian-based Bayesian updating approach is applied to integrate airborne radiometric data and ground-sampled geochemical soil data to maximise information generated from the soil survey, to enable more accurate geological interpretation for the exploration and development of natural resources. The Bayesian updating technique decomposes the collocated estimate into a product of two models: prior and likelihood models. The prior model is built from primary information and the likelihood model is built from secondary information. The prior model is then updated with the likelihood model to build the final model. The approach allows multiple secondary variables to be simultaneously integrated into the mapping of the primary variable. The Bayesian updating approach is demonstrated using a case study from Northern Ireland where the history of mineral prospecting for precious and base metals dates from the 18th century. Vein-hosted, strata-bound and volcanogenic occurrences of mineralisation are found. 
The geostatistical technique was used to improve the resolution of soil geochemistry, collected at one sample per 2 km², by integrating more closely measured airborne geophysical data from the GSNI Tellus Survey, measured over a footprint of 65 × 200 m. The directly measured geochemistry data were considered as primary data in the Bayesian approach and the airborne radiometric data were used as secondary data. The approach produced more detailed updated maps and in particular maximised information on mapped estimates of zinc, copper and lead. Greater delineation of an elongated northwest/southeast trending zone in the updated maps strengthened the potential to investigate stratabound base metal deposits.
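The idea of updating a prior model with a likelihood model can be illustrated with the textbook product of two Gaussians. This sketch is an assumption about the simplest possible form: it treats a single location, takes both models to be univariate Gaussians, and omits any normal-score transformation or additional terms that a full geostatistical formulation may include. All names and numbers are hypothetical.

```python
def gaussian_update(prior_mean, prior_var, like_mean, like_var):
    """Combine a Gaussian prior and a Gaussian likelihood by multiplying
    their densities. The result is Gaussian with precision-weighted mean."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
    post_mean = post_var * (prior_mean / prior_var + like_mean / like_var)
    return post_mean, post_var

# Hypothetical values at one map location:
# prior from sparse soil samples (primary data), uncertain;
# likelihood from dense airborne data (secondary data), tighter.
updated_mean, updated_var = gaussian_update(
    prior_mean=0.2, prior_var=1.0, like_mean=0.8, like_var=0.25
)
```

The updated variance is always smaller than either input variance, which matches the abstract's point that integrating the secondary data sharpens the mapped estimate.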
Abstract:
We present a Bayesian-odds-ratio-based algorithm for detecting stellar flares in light-curve data. We assume flares are described by a model in which there is a rapid rise with a half-Gaussian profile, followed by an exponential decay. Our signal model also contains a polynomial background model required to fit underlying light-curve variations in the data, which could otherwise partially mimic a flare. We characterize the false alarm probability and efficiency of this method under the assumption that any unmodelled noise in the data is Gaussian, and compare it with a simpler thresholding method based on that used in Walkowicz et al. We find our method has a significant increase in detection efficiency for low signal-to-noise ratio (S/N) flares. For a conservative false alarm probability our method can detect 95 per cent of flares with S/N less than 20, as compared to S/N of 25 for the simpler method. We also test how well the assumption of Gaussian noise holds by applying the method to a selection of 'quiet' Kepler stars. As an example we have applied our method to a selection of stars in Kepler Quarter 1 data. The method finds 687 flaring stars with a total of 1873 flares after vetoes have been applied. For these flares we have made preliminary characterizations of their durations and S/N.
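The flare shape described here, a half-Gaussian rise followed by an exponential decay on top of a polynomial background, can be written down as a small function. This is a sketch of the signal shape only, not of the odds-ratio computation. The parameter names, their values and the way the two pieces are joined at the peak are assumptions made for illustration.

```python
import math

def flare_profile(t, amplitude, t_peak, rise_sigma, decay_tau):
    """Half-Gaussian rise up to t_peak, exponential decay afterwards."""
    dt = t - t_peak
    if dt <= 0.0:
        return amplitude * math.exp(-dt * dt / (2.0 * rise_sigma ** 2))
    return amplitude * math.exp(-dt / decay_tau)

def polynomial_background(t, coeffs):
    """Background sum_k coeffs[k] * t**k for slow light-curve variations."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

def signal_model(t, amplitude, t_peak, rise_sigma, decay_tau, coeffs):
    """Flare plus polynomial background at time t."""
    return (flare_profile(t, amplitude, t_peak, rise_sigma, decay_tau)
            + polynomial_background(t, coeffs))

# Hypothetical parameters, arbitrary time and flux units.
times = [0.5 * i for i in range(21)]
model_curve = [signal_model(t, 2.0, 5.0, 0.5, 1.5, [1.0, 0.1]) for t in times]
```

Both branches equal the amplitude at the peak time, so the profile is continuous there, and a short `rise_sigma` relative to `decay_tau` gives the fast-rise, slow-decay asymmetry the abstract describes.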
Abstract:
We consider the local order estimation of nonlinear autoregressive systems with exogenous inputs (NARX), which may have different local dimensions at different points. By minimizing the kernel-based local information criterion introduced in this paper, strongly consistent estimates of the local orders of the NARX system at points of interest are obtained. A modification of the criterion and a simple procedure for finding the minimum of the criterion are also discussed. The theoretical results derived here are tested by simulation examples.
Abstract:
This work presents two new score functions based on the Bayesian Dirichlet equivalent uniform (BDeu) score for learning Bayesian network structures. They consider the sensitivity of BDeu to varying parameters of the Dirichlet prior. The scores take on the most adversarial and the most beneficial priors among those within a contamination set around the symmetric one. We build these scores in such a way that they are decomposable and can be computed efficiently. Because of that, they can be integrated into any state-of-the-art structure learning method that explores the space of directed acyclic graphs and allows decomposable scores. Empirical results suggest that our scores outperform the standard BDeu score in terms of the likelihood of unseen data and in terms of edge discovery with respect to the true network, at least when the training sample size is small. We discuss the relation between these new scores and the accuracy of inferred models. Moreover, our new criteria can be used to identify the amount of data after which learning is saturated, that is, additional data are of little help to improve the resulting model.
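As background, the standard BDeu local score for a single node can be computed directly from its count table, and evaluating it at several equivalent sample sizes shows the prior sensitivity the abstract refers to. The sketch below covers only the standard score, not the two new scores proposed in the paper. The count table and the equivalent sample sizes are hypothetical.

```python
import math

def bdeu_local_score(counts, ess):
    """Standard BDeu log score of one node given its parents.

    counts[j][k]: number of samples with parent configuration j and
                  node state k.
    ess:          equivalent sample size of the symmetric Dirichlet prior.
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of node states
    a_j = ess / q          # prior mass per parent configuration
    a_jk = ess / (q * r)   # prior mass per (configuration, state) cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n_jk in row:
            score += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return score

# Hypothetical counts: binary node with one binary parent.
counts = [[8, 2], [1, 9]]
scores = {ess: bdeu_local_score(counts, ess) for ess in (0.1, 1.0, 10.0)}
```

Because the score of a whole network is the sum of such local terms, one per node, it is decomposable, which is the property the abstract says the new scores preserve.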
Abstract:
This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.