78 results for Non-gaussian statistical mechanics
Abstract:
Mixture Density Networks are a principled method to model conditional probability density functions which are non-Gaussian. This is achieved by modelling the conditional distribution for each pattern with a Gaussian Mixture Model for which the parameters are generated by a neural network. This thesis presents a novel method to introduce regularisation in this context for the special case where the mean and variance of the spherical Gaussian kernels in the mixtures are fixed to predetermined values. Guidelines for how these parameters can be initialised are given, and it is shown how to apply the evidence framework to mixture density networks to achieve regularisation. This also provides an objective stopping criterion that can replace the 'early stopping' methods that have previously been used. If the neural network used is an RBF network with fixed centres, this opens up new opportunities for improved initialisation of the network weights, which are exploited to start training relatively close to the optimum. The new method is demonstrated on two data sets. The first is a simple synthetic data set, while the second is a real-life data set, namely satellite scatterometer data used to infer the wind speed and wind direction near the ocean surface. For both data sets the regularisation method performs well in comparison with earlier published results. Ideas on how the constraint on the kernels may be relaxed to allow fully adaptable kernels are presented.
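As an illustration of the fixed-kernel case described in this abstract, the following is a minimal sketch, not the thesis's implementation. It assumes the network output for one pattern is already available as a list of activations, that mixing coefficients come from a softmax, and that all kernels share one fixed width. The names `conditional_density`, `centres` and `sigma` are hypothetical.

```python
import math


def softmax(activations):
    """Turn raw network outputs into mixing coefficients that sum to 1."""
    shift = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_kernel(target, centre, sigma):
    """Spherical Gaussian density with fixed centre and fixed width sigma."""
    dim = len(target)
    sq_dist = sum((t - c) ** 2 for t, c in zip(target, centre))
    norm = (2.0 * math.pi * sigma ** 2) ** (dim / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def conditional_density(target, activations, centres, sigma):
    """p(t | x) as a mixture whose only input-dependent part is the mixing coefficients."""
    mixing = softmax(activations)
    return sum(p * gaussian_kernel(target, c, sigma)
               for p, c in zip(mixing, centres))


def negative_log_likelihood(targets, activations_per_pattern, centres, sigma):
    """Data error term: sum of -log p(t_n | x_n) over all patterns."""
    return -sum(math.log(conditional_density(t, a, centres, sigma))
                for t, a in zip(targets, activations_per_pattern))
```

Because the kernel centres and width are fixed, only the activations depend on the network weights, which is what makes this special case simpler than a fully adaptable mixture.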
Abstract:
The dynamics of on-line learning is investigated for structurally unrealizable tasks in the context of two-layer neural networks with an arbitrary number of hidden neurons. Within a statistical mechanics framework, a closed set of differential equations describing the learning dynamics can be derived for the general case of unrealizable isotropic tasks. In the asymptotic regime one can solve the dynamics analytically in the limit of a large number of hidden neurons, providing an analytical expression for the residual generalization error, the optimal and critical asymptotic training parameters, and the corresponding prefactor of the generalization error decay.
Abstract:
Natural gradient learning is an efficient and principled method for improving on-line learning. In practical applications there will be an increased cost required in estimating and inverting the Fisher information matrix. We propose to use the matrix momentum algorithm in order to carry out efficient inversion and study the efficacy of a single step estimation of the Fisher information matrix. We analyse the proposed algorithm in a two-layer network, using a statistical mechanics framework which allows us to describe analytically the learning dynamics, and compare performance with true natural gradient learning and standard gradient descent.
Abstract:
We apply methods of statistical mechanics to study the generalization performance of Support Vector Machines in large data spaces.
Abstract:
This paper presents a general methodology for estimating and incorporating uncertainty in the controller and forward models for noisy nonlinear control problems. Conditional distribution modeling in a neural network context is used to estimate uncertainty around the prediction of neural network outputs. The developed methodology circumvents the dynamic programming problem by using the predicted neural network uncertainty to localize the possible control solutions to consider. A nonlinear multivariable system with different delays between the input-output pairs is used to demonstrate the successful application of the developed control algorithm. The proposed method is suitable for redundant control systems and allows us to model strongly non-Gaussian distributions of control signals as well as processes with hysteresis.
Abstract:
We consider the direct adaptive inverse control of nonlinear multivariable systems with different delays between every input-output pair. In direct adaptive inverse control, the inverse mapping is learned from examples of input-output pairs. This makes the obtained controller suboptimal, since the network may have to learn the response of the plant over a larger operational range than necessary. Moreover, in certain applications, the control problem can be redundant, implying that the inverse problem is ill-posed. In this paper we propose a new algorithm which allows estimating and exploiting uncertainty in nonlinear multivariable control systems. This approach allows us to model strongly non-Gaussian distributions of control signals as well as processes with hysteresis. The proposed algorithm circumvents the dynamic programming problem by using the predicted neural network uncertainty to localise the possible control solutions to consider.
Abstract:
We describe a template model for perception of edge blur and identify a crucial early nonlinearity in this process. The main principle is to spatially filter the edge image to produce a 'signature', and then find which of a set of templates best fits that signature. Psychophysical blur-matching data strongly support the use of a second-derivative signature, coupled to Gaussian first-derivative templates. The spatial scale of the best-fitting template signals the edge blur. This model predicts blur-matching data accurately for a wide variety of Gaussian and non-Gaussian edges, but it suffers a bias when edges of opposite sign come close together in sine-wave gratings and other periodic images. This anomaly suggests a second general principle: the region of an image that 'belongs' to a given edge should have a consistent sign or direction of luminance gradient. Segmentation of the gradient profile into regions of common sign is achieved by implementing the second-derivative 'signature' operator as two first-derivative operators separated by a half-wave rectifier. This multiscale system of nonlinear filters predicts perceived blur accurately for periodic and aperiodic waveforms. We also outline its extension to 2-D images and infer the 2-D shape of the receptive fields.
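The rectified 'signature' operator mentioned in this abstract can be sketched in one dimension as follows. This is a simplified illustration only: it assumes plain central differences in place of the Gaussian derivative filters of the model, a single scale, and positive-polarity edges. The function names are hypothetical.

```python
def first_derivative(signal):
    """Central-difference derivative; the two end samples are left at zero."""
    out = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        out[i] = (signal[i + 1] - signal[i - 1]) / 2.0
    return out


def half_wave_rectify(signal):
    """Keep positive values and set negative values to zero."""
    return [max(v, 0.0) for v in signal]


def rectified_signature(luminance):
    """Two first-derivative stages separated by a half-wave rectifier.

    The rectifier discards gradient of the opposite sign, so a nearby edge
    of opposite polarity does not contribute to this edge's signature.
    """
    gradient = first_derivative(luminance)
    return first_derivative(half_wave_rectify(gradient))


# A rising edge followed closely by a falling edge.
profile = [0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0]
signature = rectified_signature(profile)
```

In this example the falling edge produces only negative gradient, which the rectifier removes, so the signature is non-zero only around the rising edge.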
Abstract:
Edge blur is an important perceptual cue, but how does the visual system encode the degree of blur at edges? Blur could be measured by the width of the luminance gradient profile, peak-trough separation in the 2nd derivative profile, or the ratio of 1st-to-3rd derivative magnitudes. In template models, the system would store a set of templates of different sizes and find which one best fits the 'signature' of the edge. The signature could be the luminance profile itself, or one of its spatial derivatives. I tested these possibilities in blur-matching experiments. In a 2AFC staircase procedure, observers adjusted the blur of Gaussian edges (30% contrast) to match the perceived blur of various non-Gaussian test edges. In experiment 1, test stimuli were mixtures of 2 Gaussian edges (e.g. 10 and 30 min of arc blur) at the same location, while in experiment 2, test stimuli were formed from a blurred edge sharpened to different extents by a compressive transformation. Predictions of the various models were tested against the blur-matching data, but only one model was strongly supported. This was the template model, in which the input signature is the 2nd derivative of the luminance profile, and the templates are applied to this signature at the zero-crossings. The templates are Gaussian derivative receptive fields that covary in width and length to form a self-similar set (i.e. same shape, different sizes). This naturally predicts that shorter edges should look sharper. As edge length gets shorter, responses of longer templates drop more than shorter ones, and so the response distribution shifts towards shorter (smaller) templates, signalling a sharper edge. The data confirmed this, including the scale-invariance implied by self-similarity, and a good fit was obtained from templates with a length-to-width ratio of about 1. The simultaneous analysis of edge blur and edge location may offer a new solution to the multiscale problem in edge detection.
Abstract:
The typical behavior of the relay-without-delay channel under low-density parity-check coding and its multiple-unit generalization, termed the relay array, is studied using methods of statistical mechanics. A demodulate-and-forward strategy is analytically solved using the replica symmetric ansatz, which is exact in the system studied at Nishimori's temperature. In particular, the typical level of improvement in communication performance by relaying messages is shown in the case of a small and a large number of relay units. © 2007 The American Physical Society.
Abstract:
Typical properties of sparse random matrices over finite (Galois) fields are studied, in the limit of large matrices, using techniques from the physics of disordered systems. For the case of a finite field GF(q) with prime order q, we present results for the average kernel dimension, average dimension of the eigenvector spaces and the distribution of the eigenvalues. The number of matrices for a given distribution of entries is also calculated for the general case. The significance of these results to error-correcting codes and random graphs is also discussed.
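For a single matrix, the kernel dimension over GF(q) with prime q can be computed directly by Gaussian elimination modulo q. The sketch below is a finite-size numerical check under that assumption, not the disordered-systems calculation the abstract refers to. The function names are hypothetical, and the modular inverse via `pow(a, -1, q)` requires Python 3.8 or later.

```python
def rank_mod_q(matrix, q):
    """Rank of an integer matrix over GF(q), q prime, by Gauss-Jordan elimination."""
    rows = [[v % q for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inverse = pow(rows[rank][col], -1, q)  # modular inverse exists because q is prime
        rows[rank] = [(v * inverse) % q for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % q
                           for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def kernel_dimension(matrix, q):
    """Dimension of the null space: number of columns minus the rank."""
    return len(matrix[0]) - rank_mod_q(matrix, q)
```

Averaging `kernel_dimension` over many sampled sparse matrices would give a numerical estimate to set against large-matrix analytical predictions.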
Abstract:
We apply well-known nonlinear diffraction theory governing the focusing of a powerful light beam of arbitrary shape in a medium with Kerr nonlinearity to the analysis of femtosecond (fs) laser processing of dielectrics in the sub-critical regime (input power less than the critical power of self-focusing). Simple analytical expressions are derived for the input beam power and spatial focusing parameter (numerical aperture) that are required for achieving an inscription threshold. Application of non-Gaussian laser beams for better controlled fs inscription at higher powers is also discussed. © 2007 Optical Society of America.
Abstract:
Computing circuits composed of noisy logical gates and their ability to represent arbitrary Boolean functions with a given level of error are investigated within a statistical mechanics setting. Existing bounds on their performance are straightforwardly retrieved, generalized, and identified as the corresponding typical-case phase transitions. Results on error rates, function depth, and sensitivity, and their dependence on the gate-type and noise model used are also obtained.
Abstract:
We consider a variation of the prototype combinatorial optimization problem known as graph colouring. Our optimization goal is to colour the vertices of a graph with a fixed number of colours, in a way to maximize the number of different colours present in the set of nearest neighbours of each given vertex. This problem, which we pictorially call palette-colouring, has been recently addressed as a basic example of a problem arising in the context of distributed data storage. Even though it has not been proved to be NP-complete, random search algorithms find the problem hard to solve. Heuristics based on a naive belief propagation algorithm are observed to work quite well in certain conditions. In this paper, we build upon the mentioned result, working out the correct belief propagation algorithm, which needs to take into account the many-body nature of the constraints present in this problem. This method improves the naive belief propagation approach at the cost of increased computational effort. We also investigate the emergence of a satisfiable-to-unsatisfiable 'phase transition' as a function of the vertex mean degree, for different ensembles of sparse random graphs in the large size ('thermodynamic') limit.
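The palette-colouring objective described in this abstract can be made concrete with a small sketch. It assumes the graph is given as a dictionary mapping each vertex to a list of its neighbours, and it counts colours among the neighbours only, following the wording of the abstract. The function names are hypothetical, and this is the objective alone, not the belief propagation algorithm.

```python
def colours_seen(adjacency, colouring, vertex):
    """Number of distinct colours among the nearest neighbours of a vertex."""
    return len({colouring[u] for u in adjacency[vertex]})


def palette_score(adjacency, colouring):
    """Objective to maximise: distinct neighbour colours summed over all vertices."""
    return sum(colours_seen(adjacency, colouring, v) for v in adjacency)


def unsatisfied_vertices(adjacency, colouring, num_colours):
    """Vertices whose neighbourhood is missing at least one of the colours."""
    return [v for v in adjacency
            if colours_seen(adjacency, colouring, v) < num_colours]


# A triangle coloured with three different colours.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
good = {0: "red", 1: "green", 2: "blue"}
bad = {0: "red", 1: "red", 2: "red"}
```

Each constraint involves a vertex's whole neighbourhood at once rather than a single pair of vertices, which is the many-body character the abstract attributes to this problem.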
Abstract:
The problem of learning by examples in ultrametric committee machines (UCMs) is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error in UCMs with L hidden layers and for a large enough number of units. In most of the regimes studied we find that the generalization error, as a function of the number of examples presented, develops a discontinuous drop at a critical value of the load parameter. We also find that when L>1 a number of teacher networks with the same number of hidden layers and different overlaps induce learning processes with the same critical points.