936 results for Bayesian classifier
Abstract:
Bagging is a method of obtaining more robust predictions when the model class under consideration is unstable with respect to the data, i.e., small changes in the data can cause the predicted values to change significantly. In this paper, we introduce a Bayesian version of bagging based on the Bayesian bootstrap. The Bayesian bootstrap resolves a theoretical problem with ordinary bagging and often results in more efficient estimators. We show how model averaging can be combined within the Bayesian bootstrap and illustrate the procedure with several examples.
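A minimal sketch of the idea, not the paper's implementation: ordinary bagging resamples observations with replacement, whereas the Bayesian bootstrap draws a continuous weight vector from a Dirichlet(1, ..., 1) distribution and refits the model with those weights. The function names, the weighted straight-line fit used as a stand-in base model, and the default of 200 replicates are all assumptions made for illustration.

```python
import random


def dirichlet_weights(n, rng):
    """Draw one Dirichlet(1, ..., 1) weight vector by normalizing exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x (weights sum to 1)."""
    xbar = sum(w * x for w, x in zip(ws, xs))
    ybar = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def bayesian_bagging_predict(xs, ys, x_new, n_replicates=200, seed=0):
    """Average the predictions of models fitted under Bayesian bootstrap weights."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_replicates):
        ws = dirichlet_weights(len(xs), rng)       # one Bayesian bootstrap draw
        intercept, slope = weighted_line_fit(xs, ys, ws)
        predictions.append(intercept + slope * x_new)
    return sum(predictions) / len(predictions)
```

A straight-line fit is a stable model and is used here only to keep the sketch short; the averaging step matters most for unstable base models such as trees.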
Abstract:
This chapter presents a model averaging approach in the M-open setting using sample re-use methods to approximate the predictive distribution of future observations. It first reviews the standard M-closed Bayesian Model Averaging approach and decision-theoretic methods for producing inferences and decisions. It then reviews model selection from the M-complete and M-open perspectives, before formulating a Bayesian solution to model averaging in the M-open perspective. It constructs optimal weights for MOMA (M-open Model Averaging) using a decision-theoretic framework, where models are treated as part of the ‘action space’ rather than as unknown states of nature. Using ‘incompatible’ retrospective and prospective models for data from a case-control study, the chapter demonstrates that MOMA gives better predictive accuracy than the proxy models. It concludes with open questions and future directions.
Abstract:
We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate k-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high dimensional parameter space. We assess our results on real and simulated data.
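The bipartite representation can be sketched as a mapping from each record to a latent individual, from which within-file duplicates and cross-file links follow indirectly. This is an illustrative sketch only: the `(file_id, record_id)` keys, the integer individual labels, and the function names are assumptions, and a single mapping stands in for what would be one draw of the linkage structure rather than a posterior over it.

```python
from collections import defaultdict


def clusters_from_linkage(linkage):
    """Group records by the latent individual they point to.

    linkage maps (file_id, record_id) -> latent individual label.
    """
    clusters = defaultdict(list)
    for record, individual in linkage.items():
        clusters[individual].append(record)
    return dict(clusters)


def duplicates_and_links(linkage):
    """Derive record-to-record relations from the record-to-individual edges."""
    duplicates, links = [], []
    for records in clusters_from_linkage(linkage).values():
        records = sorted(records)
        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                a, b = records[i], records[j]
                if a[0] == b[0]:
                    duplicates.append((a, b))   # same file: duplicate records
                else:
                    links.append((a, b))        # different files: linked records
    return duplicates, links
```

Because records connect only through individuals, a match among k records is simply a cluster of size k, so no pairwise transitivity constraints need to be enforced.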
Abstract:
Ecosystems consist of complex dynamic interactions among species and the environment, the understanding of which has implications for predicting the environmental response to changes in climate and biodiversity. However, with the recent adoption of more explorative tools, like Bayesian networks, in predictive ecology, few assumptions can be made about the data, and complex, spatially varying interactions can be recovered from collected field data. In this study, we compare Bayesian network modelling approaches accounting for latent effects to reveal species dynamics for 7 geographically and temporally varied areas within the North Sea. We also apply structure learning techniques to identify functional relationships, such as prey–predator, between trophic groups of species that vary across space and time. We examine whether the use of a general hidden variable can reflect overall changes in the trophic dynamics of each spatial system and whether the inclusion of a specific hidden variable can model an unmeasured group of species. The general hidden variable appears to capture changes in the variance of the biomass of different groups of species. Models that include both general and specific hidden variables identified similarities with the underlying food web dynamics and modelled unmeasured spatial effects. We predict the biomass of the trophic groups and find that predictive accuracy varies with the models' features and across the different spatial areas; we therefore propose a model that allows for spatial autocorrelation and two hidden variables. Our proposed model was able to produce novel insights on this ecosystem's dynamics and ecological interactions, mainly because we account for the heterogeneous nature of the driving factors within each area and their changes over time.
Our findings demonstrate that accounting for additional sources of variation, by combining structure learning from data and experts' knowledge in the model architecture, has the potential for gaining deeper insights into the structure and stability of ecosystems. Finally, we were able to discover meaningful functional networks that were spatially and temporally differentiated with the particular mechanisms varying from trophic associations through interactions with climate and commercial fisheries.
Abstract:
This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct, as well as empirical evidence (from real-world applications and simulation tests) demonstrating that these systems work efficiently and reliably in practice.
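One common information-theoretic form of a CI test compares the empirical conditional mutual information against a small threshold. The sketch below assumes that form; the abstract does not give the test itself, so the function names, the list-of-dicts sample layout, and the threshold value of 0.01 are illustrative assumptions rather than the paper's procedure.

```python
import math
from collections import Counter


def conditional_mutual_information(samples, x, y, z=()):
    """Empirical I(X; Y | Z) in nats for discrete samples given as dicts."""
    n = len(samples)
    c_xyz, c_xz, c_yz, c_z = Counter(), Counter(), Counter(), Counter()
    for s in samples:
        zv = tuple(s[v] for v in z)
        c_xyz[(s[x], s[y], zv)] += 1
        c_xz[(s[x], zv)] += 1
        c_yz[(s[y], zv)] += 1
        c_z[zv] += 1
    cmi = 0.0
    for (xv, yv, zv), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], written with counts
        ratio = n_xyz * c_z[zv] / (c_xz[(xv, zv)] * c_yz[(yv, zv)])
        cmi += (n_xyz / n) * math.log(ratio)
    return cmi


def is_independent(samples, x, y, z=(), threshold=0.01):
    """Treat X and Y as independent given Z when the estimate is below threshold."""
    return conditional_mutual_information(samples, x, y, z) < threshold
```

With an empty conditioning set the same function returns the ordinary mutual information, so one routine covers both marginal and conditional tests.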
Abstract:
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbor (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting for the K-NN rule, based on a Tabu Search (TS) heuristic. The proposed TS heuristic in combination with a K-NN classifier is compared with several classifiers on various available data sets. The results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. Experiments revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
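The relationship between feature weighting and feature selection in a K-NN classifier can be sketched as follows: each feature is scaled by its weight before distances are computed, and a weight of zero removes the feature entirely. This sketch does not implement the Tabu Search itself; it only shows a scoring function that such a search could evaluate. The function names, the Euclidean distance, and the leave-one-out accuracy criterion are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance after multiplying each feature by its weight."""
    return math.sqrt(sum((w * (ai - bi)) ** 2 for w, ai, bi in zip(weights, a, b)))


def knn_predict(train_x, train_y, query, weights, k=3):
    """Majority vote among the k nearest training points under the weights."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: weighted_distance(train_x[i], query, weights),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, weights, k=3):
    """Score a weight vector; a search heuristic could maximize this value."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], weights, k) == ys[i]
    return hits / len(xs)
```

A weight vector such as `[1.0, 0.0]` keeps the first feature and drops the second, which is why selection and weighting can be searched over in a single representation.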