936 resultados para Bayesian classifier
Resumo:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the objective to infer the ecological interactions. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Recently, Bayesian statistical software has been developed for age-depth modeling (wiggle-match dating) of sequences of densely spaced radiocarbon dates from peat cores. The method is described in non-statistical terms, and is compared with an alternative method of chronological ordering of 14C dates. Case studies include the dating of the start of agriculture in the northeastern part of the Netherlands, and of a possible Hekla-3 tephra layer in the same country. We discuss future enhancements in Bayesian age modeling.
Resumo:
Aim-To develop an expert system model for the diagnosis of fine needle aspiration cytology (FNAC) of the breast.
Methods-Knowledge and uncertainty were represented in the form of a Bayesian belief network which permitted the combination of diagnostic evidence in a cumulative manner and provided a final probability for the possible diagnostic outcomes. The network comprised 10 cytological features (evidence nodes), each independently linked to the diagnosis (decision node) by a conditional probability matrix. The system was designed to be interactive in that the cytopathologist entered evidence into the network in the form of likelihood ratios for the outcomes at each evidence node.
Results-The efficiency of the network was tested on a series of 40 breast FNAC specimens. The highest diagnostic probability provided by the network agreed with the cytopathologists' diagnosis in 100% of cases for the assessment of discrete, benign, and malignant aspirates. A typical probably benign cases were given probabilities in favour of a benign diagnosis. Suspicious cases tended to have similar probabilities for both diagnostic outcomes and so, correctly, could not be assigned as benign or malignant. A closer examination of cumulative belief graphs for the diagnostic sequence of each case provided insight into the diagnostic process, and quantitative data which improved the identification of suspicious cases.
Conclusion-The further development of such a system will have three important roles in breast cytodiagnosis: (1) to aid the cytologist in making a more consistent and objective diagnosis; (2) to provide a teaching tool on breast cytological diagnosis for the non-expert; and (3) it is the first stage in the development of a system capable of automated diagnosis through the use of expert system machine vision.
Resumo:
Mobile malware has been growing in scale and complexity spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform resulting in sharp increase in malware targeting the platform. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite current detection measures in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, the authors develop and analyse proactive machine-learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, which is based on a large malware sample set of majority of the existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented offering useful insight towards development of effective static-analytic Bayesian classification-based solutions for detecting unknown Android malware.
Resumo:
Mineral exploration programmes around the world use data from remote sensing, geophysics and direct sampling. On a regional scale, the combination of airborne geophysics and ground-based geochemical sampling can aid geological mapping and economic minerals exploration. The fact that airborne geophysical and traditional soil-sampling data are generated at different spatial resolutions means that they are not immediately comparable due to their different sampling density. Several geostatistical techniques, including indicator cokriging and collocated cokriging, can be used to integrate different types of data into a geostatistical model. With increasing numbers of variables the inference of the cross-covariance model required for cokriging can be demanding in terms of effort and computational time. In this paper a Gaussian-based Bayesian updating approach is applied to integrate airborne radiometric data and ground-sampled geochemical soil data to maximise information generated from the soil survey, to enable more accurate geological interpretation for the exploration and development of natural resources. The Bayesian updating technique decomposes the collocated estimate into a production of two models: prior and likelihood models. The prior model is built from primary information and the likelihood model is built from secondary information. The prior model is then updated with the likelihood model to build the final model. The approach allows multiple secondary variables to be simultaneously integrated into the mapping of the primary variable. The Bayesian updating approach is demonstrated using a case study from Northern Ireland where the history of mineral prospecting for precious and base metals dates from the 18th century. Vein-hosted, strata-bound and volcanogenic occurrences of mineralisation are found. The geostatistical technique was used to improve the resolution of soil geochemistry, collected one sample per 2 km2, by integrating more closely measured airborne geophysical data from the GSNI Tellus Survey, measured over a footprint of 65 x 200 m. The directly measured geochemistry data were considered as primary data in the Bayesian approach and the airborne radiometric data were used as secondary data. The approach produced more detailed updated maps and in particular maximized information on mapped estimates of zinc, copper and lead. Greater delineation of an elongated northwest/southeast trending zone in the updated maps strengthened the potential to investigate stratabound base metal deposits.
Resumo:
Features analysis is an important task which can significantly affect the performance of automatic bacteria colony picking. Unstructured environments also affect the automatic colony screening. This paper presents a novel approach for adaptive colony segmentation in unstructured environments by treating the detected peaks of intensity histograms as a morphological feature of images. In order to avoid disturbing peaks, an entropy based mean shift filter is introduced to smooth images as a preprocessing step. The relevance and importance of these features can be determined in an improved support vector machine classifier using unascertained least square estimation. Experimental results show that the proposed unascertained least square support vector machine (ULSSVM) has better recognition accuracy than the other state-of-the-art techniques, and its training process takes less time than most of the traditional approaches presented in this paper.
Resumo:
Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes since they allow the analysis of root causes even when the number of input variables is high. In this work, we investigate the performance of three techniques for classification within a Monte Carlo strategy with the aim of root cause analysis. We consider the naive bayes classifier and the logistic regression model with two different implementations for controlling model complexity, namely, a LASSO-like implementation with a L1 norm regularization and a fully Bayesian implementation of the logistic model, the so called relevance vector machine. Several challenges can arise when estimating such models mainly linked to the characteristics of the data: a large number of input variables, high correlation among subsets of variables, the situation where the number of variables is higher than the number of available data points and the case of unbalanced datasets. Using an ecological and a semiconductor manufacturing dataset, we show advantages and drawbacks of each method, highlighting the superior performance in term of classification accuracy for the relevance vector machine with respect to the other classifiers. Moreover, we show how the combination of the proposed techniques and the Monte Carlo approach can be used to get more robust insights into the problem under analysis when faced with challenging modelling conditions.
Resumo:
We present a Bayesian-odds-ratio-based algorithm for detecting stellar flares in light-curve data. We assume flares are described by a model in which there is a rapid rise with a half-Gaussian profile, followed by an exponential decay. Our signal model also contains a polynomial background model required to fit underlying light-curve variations in the data, which could otherwise partially mimic a flare. We characterize the false alarm probability and efficiency of this method under the assumption that any unmodelled noise in the data is Gaussian, and compare it with a simpler thresholding method based on that used in Walkowicz et al. We find our method has a significant increase in detection efficiency for low signal-to-noise ratio (S/N) flares. For a conservative false alarm probability our method can detect 95 per cent of flares with S/N less than 20, as compared to S/N of 25 for the simpler method. We also test how well the assumption of Gaussian noise holds by applying the method to a selection of 'quiet' Kepler stars. As an example we have applied our method to a selection of stars in Kepler Quarter 1 data. The method finds 687 flaring stars with a total of 1873 flares after vetos have been applied. For these flares we have made preliminary characterizations of their durations and and S/N.
Resumo:
This work presents two new score functions based on the Bayesian Dirichlet equivalent uniform (BDeu) score for learning Bayesian network structures. They consider the sensitivity of BDeu to varying parameters of the Dirichlet prior. The scores take on the most adversary and the most beneficial priors among those within a contamination set around the symmetric one. We build these scores in such way that they are decomposable and can be computed efficiently. Because of that, they can be integrated into any state-of-the-art structure learning method that explores the space of directed acyclic graphs and allows decomposable scores. Empirical results suggest that our scores outperform the standard BDeu score in terms of the likelihood of unseen data and in terms of edge discovery with respect to the true network, at least when the training sample size is small. We discuss the relation between these new scores and the accuracy of inferred models. Moreover, our new criteria can be used to identify the amount of data after which learning is saturated, that is, additional data are of little help to improve the resulting model.
Resumo:
This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds’ algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). A range of experiments show that we obtain models with better accuracy than TAN and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator.
Resumo:
This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.