29 results for Geometric mixture
in Aston University Research Archive
Abstract:
Minimization of a sum-of-squares or cross-entropy error function leads to network outputs which approximate the conditional averages of the target data, conditioned on the input vector. For classification problems, with a suitably chosen target coding scheme, these averages represent the posterior probabilities of class membership, and so can be regarded as optimal. For problems involving the prediction of continuous variables, however, the conditional averages provide only a very limited description of the properties of the target variables. This is particularly true for problems in which the mapping to be learned is multi-valued, as often arises in the solution of inverse problems, since the average of several correct target values is not necessarily itself a correct value. In order to obtain a complete description of the data, for the purposes of predicting the outputs corresponding to new input vectors, we must model the conditional probability distribution of the target data, again conditioned on the input vector. In this paper we introduce a new class of network models obtained by combining a conventional neural network with a mixture density model. The complete system is called a Mixture Density Network, and can in principle represent arbitrary conditional probability distributions in the same way that a conventional neural network can represent arbitrary functions. We demonstrate the effectiveness of Mixture Density Networks using both a toy problem and a problem involving robot inverse kinematics.
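A minimal sketch of the core construction (illustrative only, not the paper's implementation): the network's raw outputs are partitioned into mixing coefficients, means, and variances of a Gaussian mixture, and training minimises the negative log-likelihood of the targets under that mixture.

```python
import numpy as np

def mdn_nll(z, t):
    """Negative log-likelihood of a scalar target t under the 1-D Gaussian
    mixture parameterised by the raw network outputs z, which hold the
    mixing logits, means and log-variances for K kernels (len(z) == 3*K)."""
    a, mu, s = np.split(z, 3)
    pi = np.exp(a - a.max())
    pi /= pi.sum()                        # softmax: positive, sums to one
    var = np.exp(s)                       # exp keeps variances positive
    phi = np.exp(-0.5 * (t - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return -np.log(pi @ phi + 1e-300)     # mixture density evaluated at t
```

Training then amounts to minimising this quantity summed over the data set, in place of a sum-of-squares error.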
Abstract:
Neural networks can be regarded as statistical models, and can be analysed in a Bayesian framework. Generalisation is measured by the performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; using information divergence guarantees invariance with respect to representation. The theory generalises the least mean squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1) the ideal optimal estimate is always given by the average over the posterior; (2) the optimal estimate within a computational model is given by the projection of the ideal estimate onto the model. This incidentally shows that some currently popular methods for dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the δ-dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces L_{1/δ} and L_{1/(1−δ)}. It therefore offers a conceptual simplification of information geometry. The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: the prior P(p), describing the environment of all the problems; the divergence D_δ, specifying the requirement of the task; and the model Q, specifying the available computing resources.
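In symbols (notation mine, condensing the abstract): writing P(p | D) for the posterior over candidate distributions p given data D, D(·‖·) for the divergence, and Q for the computational model,

```latex
% (1) The ideal optimal estimate is the posterior average.
\hat{p} = \int p \, P(p \mid D) \, \mathrm{d}p
% (2) The optimal estimate within the model Q is the divergence projection
%     of the ideal estimate onto Q.
\hat{q} = \operatorname*{arg\,min}_{q \in Q} D(\hat{p} \,\|\, q)
```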
Abstract:
Mixture Density Networks (MDNs) are a well-established method for modelling conditional probability densities, and are useful for complex multi-valued functions where regression methods (such as MLPs) fail. In this paper we extend earlier research on a regularisation method for a special case of MDNs to the general case, using evidence-based regularisation, and we show how the Hessian of the MDN error function can be evaluated using R-propagation. The method is tested on two data sets and compared with early stopping.
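R-propagation (Pearlmutter's R-operator) yields exact Hessian-vector products at roughly the cost of an extra backward pass. The quantity it computes can be illustrated with a finite-difference stand-in (a sketch only, not the paper's procedure):

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-6):
    """Approximate H @ v, where H is the Hessian of the error at weights w.
    grad_fn returns the gradient of the error at a given weight vector;
    R-propagation computes the same directional derivative exactly."""
    # Central difference of the gradient along direction v:
    #   H v  ~  (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
```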
Abstract:
This technical report contains all technical information and results from experiments in which Mixture Density Networks (MDNs) using an RBF network and fixed kernel means and variances were used to infer the wind direction from satellite data from the ERS-II weather satellite. The regularisation is based on the evidence framework, and three different approximations were used to estimate the regularisation parameter. The results were compared with those obtained by 'early stopping'.
Abstract:
Purpose: The aim of this study was to compare a developmental optical coherence tomography (OCT) based contact lens inspection instrument to a widely used geometric inspection instrument (Optimec JCF), to establish the capability of a market-focused OCT system. Methods: Measurements of 27 soft spherical contact lenses were made using the Optimec JCF and a new OCT-based instrument, the Optimec is830. Twelve of the lenses analysed were specially commissioned from a traditional hydrogel (Contamac GM Advance 49%) and 12 from a silicone hydrogel (Contamac Definitive 65), each set with a range of back optic zone radius (BOZR) and centre thickness (CT) values. Three commercial lenses were also measured: CooperVision MyDay (Stenfilcon A) in −10D, −3D and +6D powers. Two measurements of BOZR, CT and total diameter were made for each lens in temperature-controlled saline on both instruments. Results: The results showed that the is830 and JCF measurements were comparable, but that the is830 had a better repeatability coefficient for BOZR (0.065 mm compared to 0.151 mm) and CT (0.008 mm compared to 0.027 mm). Both instruments had similar results for total diameter (0.041 mm compared to 0.044 mm). Conclusions: The OCT-based instrument assessed in this study is able to match and improve on the JCF instrument for the measurement of total diameter, back optic zone radius and centre thickness for soft contact lenses in temperature-controlled saline.
Abstract:
Training Mixture Density Network (MDN) configurations within the NETLAB framework takes considerable time because of the cost of computing the error function and its gradient. By optimising the computation of these functions, so that gradient information is computed in parameter space, training time is decreased by at least a factor of sixty for the example given. Decreased training time widens the spectrum of problems to which MDNs can be practically applied, making the MDN framework an attractive method for the applied problem solver.
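The kind of optimisation described, stated schematically (a linear model for brevity; hypothetical code, not NETLAB's): the same gradient accumulated pattern-by-pattern in an interpreter loop versus computed as one batched matrix expression.

```python
import numpy as np

def grad_loop(X, T, W):
    """Gradient of the mean squared error, one pattern at a time."""
    g = np.zeros_like(W)
    for x, t in zip(X, T):                 # one interpreter pass per pattern
        g += np.outer(x, x @ W - t)
    return g / len(X)

def grad_batched(X, T, W):
    """Identical gradient as a single matrix expression over all patterns."""
    return X.T @ (X @ W - T) / len(X)      # runs in optimised BLAS kernels
```

Speed-ups of the order reported typically come from moving exactly this sort of per-pattern work into batched linear algebra.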
Abstract:
Mixture Density Networks are a principled method for modelling conditional probability density functions which are non-Gaussian. This is achieved by modelling the conditional distribution for each pattern with a Gaussian Mixture Model whose parameters are generated by a neural network. This thesis presents a novel method for introducing regularisation in this context for the special case where the means and variances of the spherical Gaussian kernels in the mixtures are fixed to predetermined values. Guidelines for how these parameters can be initialised are given, and it is shown how to apply the evidence framework to mixture density networks to achieve regularisation. This also provides an objective stopping criterion that can replace the 'early stopping' methods that have previously been used. If the neural network used is an RBF network with fixed centres, this opens up new opportunities for improved initialisation of the network weights, which are exploited to start training relatively close to the optimum. The new method is demonstrated on two data sets. The first is a simple synthetic data set, while the second is a real-life data set, namely satellite scatterometer data used to infer the wind speed and wind direction near the ocean surface. For both data sets the regularisation method performs well in comparison with earlier published results. Ideas on how the constraint on the kernels may be relaxed to allow fully adaptable kernels are presented.
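For context, the evidence-framework re-estimation (MacKay) that such an objective stopping criterion builds on, in its standard form: with weight-decay term α E_W and eigenvalues λ_i of the Hessian of the data error,

```latex
% gamma counts the well-determined parameters in the network.
\gamma = \sum_i \frac{\lambda_i}{\lambda_i + \alpha},
\qquad
\alpha^{\mathrm{new}} = \frac{\gamma}{2 E_W},
\qquad
E_W = \tfrac{1}{2} \sum_i w_i^2
```

Iterating this update gives a principled value of the regularisation parameter, removing the need to monitor a validation set for early stopping.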
Abstract:
We have proposed a novel robust inversion-based neurocontroller that searches for the optimal control law by sampling from the estimated Gaussian distribution of the inverse plant model. However, for problems involving the prediction of continuous variables, a Gaussian model approximation provides only a very limited description of the properties of the inverse model. This is usually the case for problems in which the mapping to be learned is multi-valued or involves hysteretic transfer characteristics, as often arises in the solution of inverse plant models. In order to obtain a complete description of the inverse model, more general multicomponent distributions must be modeled. In this paper we test whether our proposed sampling approach can be used with arbitrary conditional probability distributions, modeled here by a mixture density network. Importance sampling provides a structured and principled approach to constraining the complexity of the search space for the ideal control law. The effectiveness of importance sampling from an arbitrary conditional probability distribution is demonstrated using a simple single-input single-output static nonlinear system with hysteretic characteristics in the inverse plant model.
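A minimal sketch of the sampling step under an illustrative interface of my own (the paper's plant and controller are not reproduced): draw candidate controls from the MDN's conditional mixture, then weight them by a task cost.

```python
import numpy as np

def sample_controls(pi, mu, sigma, n, rng):
    """Draw n controls u ~ sum_k pi[k] * N(mu[k], sigma[k]^2)."""
    k = rng.choice(len(pi), size=n, p=pi)      # pick mixture components
    return rng.normal(mu[k], sigma[k])         # sample within each component

def importance_weighted_control(pi, mu, sigma, cost_fn, n=100, seed=0):
    """Importance-weighted estimate of a good control: low cost, high weight."""
    rng = np.random.default_rng(seed)
    u = sample_controls(pi, mu, sigma, n, rng)
    w = np.exp(-np.array([cost_fn(ui) for ui in u]))
    return np.dot(w, u) / w.sum()
```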
Abstract:
When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships at the aggregate data level, the methodology allows homogeneous groups of observations to be created that exhibit distinctive path model estimates. The approach thus provides differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application to a large data set in an example of the American customer satisfaction index (ACSI) substantiates the methodology's effectiveness in evaluating PLS path modeling results.
Abstract:
Traditional approaches to calculate total factor productivity change through Malmquist indexes rely on distance functions. In this paper we show that the use of distance functions as a means to calculate total factor productivity change may introduce some bias in the analysis, and therefore we propose a procedure that calculates total factor productivity change through observed values only. Our total factor productivity change is then decomposed into efficiency change, technological change, and a residual effect. This decomposition makes use of a non-oriented measure in order to avoid problems associated with the traditional use of radial oriented measures, especially when variable returns to scale technologies are to be compared.
Abstract:
Traditional approaches to calculating total factor productivity (TFP) change through Malmquist indexes rely on distance functions. In this paper we show that the use of distance functions as a means to calculate TFP change may introduce some bias into the analysis, and we therefore propose a procedure that calculates TFP change through observed values only. Our TFP change is then decomposed into efficiency change, technological change, and a residual effect. This decomposition makes use of a non-oriented measure in order to avoid problems associated with the traditional use of radial oriented measures, especially when variable returns to scale technologies are to be compared. The proposed approach is applied in this paper to a sample of Portuguese bank branches.
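For reference, the distance-function form of the Malmquist decomposition whose bias the authors set out to avoid (standard notation: D^t is the period-t distance function, (x^t, y^t) the observed input-output bundle):

```latex
M = \underbrace{\frac{D^{t+1}(x^{t+1},y^{t+1})}{D^{t}(x^{t},y^{t})}}_{\text{efficiency change}}
    \times
    \underbrace{\left[
      \frac{D^{t}(x^{t+1},y^{t+1})}{D^{t+1}(x^{t+1},y^{t+1})}\,
      \frac{D^{t}(x^{t},y^{t})}{D^{t+1}(x^{t},y^{t})}
    \right]^{1/2}}_{\text{technological change}}
```

The proposal in the paper replaces these estimated distance functions with observed values, and adds a residual effect to the two classical components.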
Abstract:
Observers perceive sinusoidal shading patterns as being due to sinusoidally corrugated surfaces, and perceive surface peaks to be offset from luminance maxima by between zero and 1/4 wavelength. This offset varies with grating orientation. Physically, the shading profile of a sinusoidal surface will be approximately sinusoidal, with the same spatial frequency as the surface, only when: (A) it is lit suitably obliquely by a point source, or (B) the light source is diffuse and hemispherical, so the 'dark is deep' rule applies. For A, surface peaks will be offset by 1/4 wavelength from the luminance maxima; for B, this offset will be zero. As the sum of two same-frequency sinusoids with different phases is a sinusoid of intermediate phase, our results suggest that observers assume a mixture of two light sources whose relative strength varies with grating orientation. The perceived surface offsets imply that gratings close to horizontal are taken to be lit by a point source, and those close to vertical by a diffuse source. [Supported by EPSRC grants to AJS and MAG.]
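The phase-mixing argument rests on the standard identity that a weighted sum of equal-frequency sinusoids is itself a sinusoid of intermediate phase:

```latex
a\sin(x) + b\sin(x+\phi) = r\sin(x+\psi),
\qquad
r = \sqrt{a^{2} + b^{2} + 2ab\cos\phi},
\qquad
\tan\psi = \frac{b\sin\phi}{a + b\cos\phi}
```

For a, b > 0 the resultant phase ψ lies between 0 and φ, so the relative strengths of the two assumed light sources fix the perceived offset.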
Abstract:
Most object-based approaches to Geographical Information Systems (GIS) have concentrated on the representation of geometric properties of objects in terms of fixed geometry. In our road traffic marking application domain we have a requirement to represent the static locations of the road markings but also to enforce the associated regulations, which are typically geometric in nature. For example, a give-way line of a pedestrian crossing in the UK must be within 1100-3000 mm of the edge of the crossing pattern. In previous studies of the application of spatial rules (often called 'business logic') in GIS, emphasis has been placed on the representation of topological constraints and data integrity checks. There is very little GIS literature that describes models for geometric rules, although there are some examples in the Computer Aided Design (CAD) literature. This paper introduces some of the ideas from so-called variational CAD models to the GIS application domain, and extends these using a Geography Markup Language (GML) based representation. In our application we have an additional requirement: the geometric rules are often changed and vary from country to country, so they should be represented in a flexible manner. In this paper we describe an elegant solution to the representation of geometric rules, such as requiring lines to be offset from other objects. The method uses a feature-property model embraced in GML 3.1 and extends the possible relationships in feature collections to permit the application of parameterized geometric constraints to sub-features. We show the parametric rule model we have developed and discuss the advantage of using simple parametric expressions in the rule base. We discuss the possibilities and limitations of our approach and relate our data model to GML 3.1. © 2006 Springer-Verlag Berlin Heidelberg.
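A toy sketch of the kind of parameterised rule described (names hypothetical; in the paper the rules are encoded in GML rather than application code):

```python
from dataclasses import dataclass

@dataclass
class OffsetRule:
    """A parameterised geometric constraint: one feature must lie within
    [min_mm, max_mm] of another, as with a give-way line and the edge of
    a pedestrian crossing in the UK."""
    min_mm: float
    max_mm: float

    def check(self, offset_mm: float) -> bool:
        return self.min_mm <= offset_mm <= self.max_mm

# The UK example from the abstract: the offset must be 1100-3000 mm.
give_way_rule = OffsetRule(min_mm=1100, max_mm=3000)
assert give_way_rule.check(1500)        # compliant marking
assert not give_way_rule.check(500)     # too close to the crossing pattern
```

Keeping the bounds as data rather than code mirrors the paper's requirement that rules vary by country and change over time.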