969 resultados para Bayesian Learning


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bayesian techniques have been developed over many years in a range of different fields, but have only recently been applied to the problem of learning in neural networks. As well as providing a consistent framework for statistical pattern recognition, the Bayesian approach offers a number of practical advantages including a potential solution to the problem of over-fitting. This chapter aims to provide an introductory overview of the application of Bayesian methods to neural networks. It assumes the reader is familiar with standard feed-forward network models and how to train them using conventional techniques.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bayesian techniques have been developed over many years in a range of different fields, but have only recently been applied to the problem of learning in neural networks. As well as providing a consistent framework for statistical pattern recognition, the Bayesian approach offers a number of practical advantages including a potential solution to the problem of over-fitting. This chapter aims to provide an introductory overview of the application of Bayesian methods to neural networks. It assumes the reader is familiar with standard feed-forward network models and how to train them using conventional techniques.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We are concerned with the problem of image segmentation in which each pixel is assigned to one of a predefined finite number of classes. In Bayesian image analysis, this requires fusing together local predictions for the class labels with a prior model of segmentations. Markov Random Fields (MRFs) have been used to incorporate some of this prior knowledge, but this not entirely satisfactory as inference in MRFs is NP-hard. The multiscale quadtree model of Bouman and Shapiro (1994) is an attractive alternative, as this is a tree-structured belief network in which inference can be carried out in linear time (Pearl 1988). It is an hierarchical model where the bottom-level nodes are pixels, and higher levels correspond to downsampled versions of the image. The conditional-probability tables (CPTs) in the belief network encode the knowledge of how the levels interact. In this paper we discuss two methods of learning the CPTs given training data, using (a) maximum likelihood and the EM algorithm and (b) emphconditional maximum likelihood (CML). Segmentations obtained using networks trained by CML show a statistically-significant improvement in performance on synthetic images. We also demonstrate the methods on a real-world outdoor-scene segmentation task.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

On-line learning is one of the most powerful and commonly used techniques for training large layered networks and has been used successfully in many real-world applications. Traditional analytical methods have been recently complemented by ones from statistical physics and Bayesian statistics. This powerful combination of analytical methods provides more insight and deeper understanding of existing algorithms and leads to novel and principled proposals for their improvement. This book presents a coherent picture of the state-of-the-art in the theoretical analysis of on-line learning. An introduction relates the subject to other developments in neural networks and explains the overall picture. Surveys by leading experts in the field combine new and established material and enable non-experts to learn more about the techniques and methods used. This book, the first in the area, provides a comprehensive view of the subject and will be welcomed by mathematicians, scientists and engineers, whether in industry or academia.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The assessment of the reliability of systems which learn from data is a key issue to investigate thoroughly before the actual application of information processing techniques to real-world problems. Over the recent years Gaussian processes and Bayesian neural networks have come to the fore and in this thesis their generalisation capabilities are analysed from theoretical and empirical perspectives. Upper and lower bounds on the learning curve of Gaussian processes are investigated in order to estimate the amount of data required to guarantee a certain level of generalisation performance. In this thesis we analyse the effects on the bounds and the learning curve induced by the smoothness of stochastic processes described by four different covariance functions. We also explain the early, linearly-decreasing behaviour of the curves and we investigate the asymptotic behaviour of the upper bounds. The effect of the noise and the characteristic lengthscale of the stochastic process on the tightness of the bounds are also discussed. The analysis is supported by several numerical simulations. The generalisation error of a Gaussian process is affected by the dimension of the input vector and may be decreased by input-variable reduction techniques. In conventional approaches to Gaussian process regression, the positive definite matrix estimating the distance between input points is often taken diagonal. In this thesis we show that a general distance matrix is able to estimate the effective dimensionality of the regression problem as well as to discover the linear transformation from the manifest variables to the hidden-feature space, with a significant reduction of the input dimension. Numerical simulations confirm the significant superiority of the general distance matrix with respect to the diagonal one.In the thesis we also present an empirical investigation of the generalisation errors of neural networks trained by two Bayesian algorithms, the Markov Chain Monte Carlo method and the evidence framework; the neural networks have been trained on the task of labelling segmented outdoor images.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Control design for stochastic uncertain nonlinear systems is traditionally based on minimizing the expected value of a suitably chosen loss function. Moreover, most control methods usually assume the certainty equivalence principle to simplify the problem and make it computationally tractable. We offer an improved probabilistic framework which is not constrained by these previous assumptions, and provides a more natural framework for incorporating and dealing with uncertainty. The focus of this paper is on developing this framework to obtain an optimal control law strategy using a fully probabilistic approach for information extraction from process data, which does not require detailed knowledge of system dynamics. Moreover, the proposed control method framework allows handling the problem of input-dependent noise. A basic paradigm is proposed and the resulting algorithm is discussed. The proposed probabilistic control method is for the general nonlinear class of discrete-time systems. It is demonstrated theoretically on the affine class. A nonlinear simulation example is also provided to validate theoretical development.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We developed a parallel strategy for learning optimally specific realizable rules by perceptrons, in an online learning scenario. Our result is a generalization of the Caticha–Kinouchi (CK) algorithm developed for learning a perceptron with a synaptic vector drawn from a uniform distribution over the N-dimensional sphere, so called the typical case. Our method outperforms the CK algorithm in almost all possible situations, failing only in a denumerable set of cases. The algorithm is optimal in the sense that it saturates Bayesian bounds when it succeeds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been few work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multidimensional sentiment distributions prediction. It proposes effective mechanisms to incorporate supervised information such as labeled feature constraints and document-level sentiment distributions derived from the training data into model learning. We have evaluated our approach on the datasets collected from the confession section of the Experience Project website where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distributions prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a comparative study of three closely related Bayesian models for unsupervised document level sentiment classification, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all the three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST suggesting that the JST model is more appropriate for joint sentiment topic detection.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present and analyze three different online algorithms for learning in discrete Hidden Markov Models (HMMs) and compare their performance with the Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of the generalization error we draw learning curves in simplified situations and compare the results. The performance for learning drifting concepts of one of the presented algorithms is analyzed and compared with the Baldi-Chauvin algorithm in the same situations. A brief discussion about learning and symmetry breaking based on our results is also presented. © 2006 American Institute of Physics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bayesian algorithms pose a limit to the performance learning algorithms can achieve. Natural selection should guide the evolution of information processing systems towards those limits. What can we learn from this evolution and what properties do the intermediate stages have? While this question is too general to permit any answer, progress can be made by restricting the class of information processing systems under study. We present analytical and numerical results for the evolution of on-line algorithms for learning from examples for neural network classifiers, which might include or not a hidden layer. The analytical results are obtained by solving a variational problem to determine the learning algorithm that leads to maximum generalization ability. Simulations using evolutionary programming, for programs that implement learning algorithms, confirm and expand the results. The principal result is not just that the evolution is towards a Bayesian limit. Indeed it is essentially reached. In addition we find that evolution is driven by the discovery of useful structures or combinations of variables and operators. In different runs the temporal order of the discovery of such combinations is unique. The main result is that combinations that signal the surprise brought by an example arise always before combinations that serve to gauge the performance of the learning algorithm. This latter structures can be used to implement annealing schedules. The temporal ordering can be understood analytically as well by doing the functional optimization in restricted functional spaces. We also show that there is data suggesting that the appearance of these traits also follows the same temporal ordering in biological systems. © 2006 American Institute of Physics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this letter, we derive continuum equations for the generalization error of the Bayesian online algorithm (BOnA) for the one-layer perceptron with a spherical covariance matrix using the Rosenblatt potential and show, by numerical calculations, that the asymptotic performance of the algorithm is the same as the one for the optimal algorithm found by means of variational methods with the added advantage that the BOnA does not use any inaccessible information during learning. © 2007 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The inverse controller is traditionally assumed to be a deterministic function. This paper presents a pedagogical methodology for estimating the stochastic model of the inverse controller. The proposed method is based on Bayes' theorem. Using Bayes' rule to obtain the stochastic model of the inverse controller allows the use of knowledge of uncertainty from both the inverse and the forward model in estimating the optimal control signal. The paper presents the methodology for general nonlinear systems and is demonstrated on nonlinear single-input-single-output (SISO) and multiple-input-multiple-output (MIMO) examples. © 2006 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Graph-based representations have been used with considerable success in computer vision in the abstraction and recognition of object shape and scene structure. Despite this, the methodology available for learning structural representations from sets of training examples is relatively limited. In this paper we take a simple yet effective Bayesian approach to attributed graph learning. We present a naïve node-observation model, where we make the important assumption that the observation of each node and each edge is independent of the others, then we propose an EM-like approach to learn a mixture of these models and a Minimum Message Length criterion for components selection. Moreover, in order to avoid the bias that could arise with a single estimation of the node correspondences, we decide to estimate the sampling probability over all the possible matches. Finally we show the utility of the proposed approach on popular computer vision tasks such as 2D and 3D shape recognition. © 2011 Springer-Verlag.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The real purpose of collecting big data is to identify causality in the hope that this will facilitate credible predictivity . But the search for causality can trap one into infinite regress, and thus one takes refuge in seeking associations between variables in data sets. Regrettably, the mere knowledge of associations does not enable predictivity. Associations need to be embedded within the framework of probability calculus to make coherent predictions. This is so because associations are a feature of probability models, and hence they do not exist outside the framework of a model. Measures of association, like correlation, regression, and mutual information merely refute a preconceived model. Estimated measures of associations do not lead to a probability model; a model is the product of pure thought. This paper discusses these and other fundamentals that are germane to seeking associations in particular, and machine learning in general. ACM Computing Classification System (1998): H.1.2, H.2.4., G.3.