823 results for learning networks
Abstract:
This work presents two new score functions based on the Bayesian Dirichlet equivalent uniform (BDeu) score for learning Bayesian network structures. They consider the sensitivity of BDeu to varying parameters of the Dirichlet prior. The scores take on the most adversarial and the most beneficial priors among those within a contamination set around the symmetric one. We build these scores in such a way that they are decomposable and can be computed efficiently. Because of that, they can be integrated into any state-of-the-art structure learning method that explores the space of directed acyclic graphs and allows decomposable scores. Empirical results suggest that our scores outperform the standard BDeu score in terms of the likelihood of unseen data and in terms of edge discovery with respect to the true network, at least when the training sample size is small. We discuss the relation between these new scores and the accuracy of inferred models. Moreover, our new criteria can be used to identify the amount of data after which learning is saturated, that is, the point beyond which additional data do little to improve the resulting model.
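For reference, the standard BDeu local score that these new scores build on can be sketched as follows. This is a minimal illustration of the textbook formula, not the paper's new scores. The function name `bdeu_local_score`, the data layout (rows of integer-coded states) and the `arity` list are assumptions made for the example. The `ess` argument is the equivalent sample size of the symmetric Dirichlet prior, the parameter whose influence the abstract refers to.

```python
from collections import Counter
from math import lgamma, prod


def bdeu_local_score(data, child, parents, arity, ess=1.0):
    """Log BDeu score of `child` given `parents` (hypothetical helper).

    data    : list of rows, each row a tuple of integer-coded states
    child   : column index of the node being scored
    parents : sequence of column indices of its parents
    arity   : arity[i] is the number of states of column i
    ess     : equivalent sample size of the symmetric Dirichlet prior
    """
    r = arity[child]                       # states of the child
    q = prod(arity[p] for p in parents)    # parent configurations (1 if no parents)
    a_j = ess / q                          # prior mass per parent configuration
    a_jk = ess / (q * r)                   # prior mass per (configuration, state)

    # Count N_jk: occurrences of each (parent configuration, child state) pair.
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # Count N_j: occurrences of each parent configuration.
    n_j = Counter()
    for (cfg, _), n in n_jk.items():
        n_j[cfg] += n

    # Unobserved configurations and states contribute zero, so only
    # observed counts need to be visited.
    score = 0.0
    for n in n_j.values():
        score += lgamma(a_j) - lgamma(a_j + n)
    for n in n_jk.values():
        score += lgamma(a_jk + n) - lgamma(a_jk)
    return score


if __name__ == "__main__":
    rows = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]
    for ess in (0.1, 1.0, 10.0):
        print(ess, bdeu_local_score(rows, child=1, parents=[0], arity=[2, 2], ess=ess))
```

Running the loop at the bottom with different `ess` values gives a rough feel for how strongly the score of one and the same parent set depends on the prior.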
Abstract:
This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists of sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.
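To make the notion of a k-tree concrete, the sketch below grows a random one by the usual recursive definition: start from a (k+1)-clique, then repeatedly connect a new node to all members of an existing k-clique. This is only an illustration. It does not sample k-trees uniformly and is not claimed to be the sampling scheme used in the paper. The function name `random_k_tree` is an assumption.

```python
import random
from itertools import combinations


def random_k_tree(n, k, seed=None):
    """Return the edge set of a randomly grown k-tree on nodes 0..n-1.

    Illustrative only: the distribution over k-trees is NOT uniform.
    """
    if k < 1 or n < k + 1:
        raise ValueError("need k >= 1 and n >= k + 1")
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)

    # Start from a clique on k + 1 nodes.
    root = nodes[: k + 1]
    edges = {frozenset(e) for e in combinations(root, 2)}
    k_cliques = [frozenset(c) for c in combinations(root, k)]

    for v in nodes[k + 1:]:
        base = rng.choice(k_cliques)             # existing k-clique to attach to
        edges.update(frozenset((v, u)) for u in base)
        # Every (k-1)-subset of the base plus v forms a new k-clique.
        k_cliques.extend(frozenset(c) | {v} for c in combinations(base, k - 1))
    return edges


if __name__ == "__main__":
    tree = random_k_tree(n=8, k=2, seed=0)
    print(len(tree), "edges")
```

A k-tree on n nodes always has k(k+1)/2 + (n-k-1)k edges, which gives a quick sanity check on the construction.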
Abstract:
This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-and-bound algorithm is presented that integrates structural constraints with data in a way that guarantees global optimality. As an example, structural constraints are used to map the problem of structure learning in dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before.
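Decomposability means that the score of a network is a sum of local terms, one per node, each depending only on that node and its parents. A minimal sketch using the BIC criterion is given below. The function names, the data layout (rows of integer-coded states) and the `parent_sets` dictionary are assumptions made for the example, and the pruning properties and branch-and-bound algorithm of the paper are not reproduced here.

```python
from collections import Counter
from math import log, prod


def bic_local_score(data, child, parents, arity):
    """BIC contribution of one node: log-likelihood minus complexity penalty."""
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_j = Counter()
    for (cfg, _), n in n_jk.items():
        n_j[cfg] += n

    # Maximised log-likelihood of the child's conditional distribution.
    loglik = sum(n * log(n / n_j[cfg]) for (cfg, _), n in n_jk.items())
    # Free parameters: (child states - 1) per parent configuration.
    n_params = prod(arity[p] for p in parents) * (arity[child] - 1)
    return loglik - 0.5 * log(len(data)) * n_params


def bic_score(data, parent_sets, arity):
    """Score of a whole DAG, given as {node: tuple_of_parents}."""
    return sum(
        bic_local_score(data, child, parents, arity)
        for child, parents in parent_sets.items()
    )


if __name__ == "__main__":
    rows = [(0, 0), (0, 0), (1, 1), (1, 1)]
    arity = [2, 2]
    print("empty graph:", bic_score(rows, {0: (), 1: ()}, arity))
    print("edge 0 -> 1:", bic_score(rows, {0: (), 1: (0,)}, arity))
```

Because each local term can be computed and cached independently, a search procedure only needs to re-evaluate the nodes whose parent sets change.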
Abstract:
This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NP^PP-complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt with effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given an SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates.
Abstract:
We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints. Our approach consists of two parts. The first is a novel algorithm that effectively explores the space of possible parent sets of a node. It guides the exploration towards the most promising parent sets on the basis of an approximated score function that is computed in constant time. The second part is an improvement of an existing ordering-based algorithm for structure optimization. The new algorithm provably achieves a higher score compared to its original formulation. Our novel approach consistently outperforms the state of the art on very large data sets.
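The basic step behind ordering-based structure optimization can be sketched as follows: given a fixed variable ordering and a list of pre-scored candidate parent sets per node, each node independently takes its best-scoring parent set drawn from its predecessors, which always yields an acyclic graph. This is the plain baseline step only, not the improved algorithm of the paper. The function name `best_dag_for_ordering` and the `candidates` layout are assumptions.

```python
def best_dag_for_ordering(order, candidates):
    """Pick, for each node, the best parent set consistent with `order`.

    order      : list of nodes, earlier nodes may be parents of later ones
    candidates : {node: [(score, frozenset_of_parents), ...]}
                 Assumed to contain the empty parent set for every node,
                 so at least one option is always consistent.
    Returns (dag, total_score) where dag maps node -> chosen parent set.
    """
    position = {node: i for i, node in enumerate(order)}
    dag, total = {}, 0.0
    for node in order:
        # Keep only parent sets made entirely of predecessors of `node`.
        allowed = [
            (score, parents)
            for score, parents in candidates[node]
            if all(position[p] < position[node] for p in parents)
        ]
        score, parents = max(allowed, key=lambda item: item[0])
        dag[node] = parents
        total += score
    return dag, total


if __name__ == "__main__":
    cands = {
        "A": [(-10.0, frozenset()), (-5.0, frozenset({"B"}))],
        "B": [(-8.0, frozenset()), (-4.0, frozenset({"A"}))],
    }
    print(best_dag_for_ordering(["A", "B"], cands))
    print(best_dag_for_ordering(["B", "A"], cands))
```

A search over orderings then compares the totals returned for different orderings and keeps the best one.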
Abstract:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experimental results indicate that our approach is comparable with exact methods, but is much more computationally efficient.
Abstract:
Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative score, a measure of how well a k-tree yields good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.
Abstract:
Locating new wind farms is of crucial importance for energy policies of the next decade. To select the new location, an accurate picture of the wind fields is necessary. However, characterizing wind fields is a difficult task, since the phenomenon is highly nonlinear and related to complex topographical features. In this paper, we propose both a nonparametric model to estimate wind speed at different time instants and a procedure to discover underrepresented topographic conditions, where new measuring stations could be added. Compared to space filling techniques, this last approach privileges optimization of the output space, thus locating new potential measuring sites through the uncertainty of the model itself.