34 results for probabilistic refinement calculus
in Aston University Research Archive
Abstract:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
Abstract:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
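The abstract above does not reproduce the paper's equations. As a rough illustration only, the sketch below uses the commonly cited closed-form maximum-likelihood solution for probabilistic PCA rather than the iterative EM algorithm the abstract mentions: the noise variance is the mean of the discarded eigenvalues of the sample covariance, and the weight matrix is built from the leading eigenvectors scaled by the square root of (eigenvalue minus noise variance). The function name `ppca_ml` and its arguments are hypothetical, and the arbitrary rotation of the latent space is taken to be the identity.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML fit of a probabilistic PCA model (illustrative sketch).

    X is an (n, d) data matrix and q < d is the latent dimension.
    Returns the mean, the (d, q) weight matrix W and the noise variance.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mu = X.mean(axis=0)
    centred = X - mu
    S = centred.T @ centred / n              # ML (biased) sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()              # mean of discarded eigenvalues
    # Scale each retained eigenvector by sqrt(lambda_i - sigma2).
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale
    return mu, W, sigma2
```

Under this model the implied covariance is `W @ W.T + sigma2 * np.eye(d)`, which is what makes PCA usable as a density model and, in turn, as a component of a mixture.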
Abstract:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis (Bishop98a) in several directions: (1) We allow for non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. (2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. (3) Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.
Abstract:
This Letter addresses image segmentation via a generative model approach. A Bayesian network (BNT) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a Hidden Markov model (HMM), but with non-stationary transitive conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients for several textures from the Brodatz album [1]. The comparison is based on cross-validation and includes probabilistic model ensembles instead of single models. In addition, the robustness of the models to additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework [2]. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) segmentation of still images with varying illumination across the scene.
Abstract:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis (Bishop98a) in several directions: (1) We allow for non-linear projection manifolds. The basic building block is the Generative Topographic Mapping (GTM). (2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. (3) Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the ancestor visualization plots which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 18-dimensional data sets.
Abstract:
Hierarchical visualization systems are desirable because a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex high-dimensional data sets. We extend an existing locally linear hierarchical visualization system PhiVis [1] in several directions: (1) we allow for non-linear projection manifolds (the basic building block is the Generative Topographic Mapping, GTM), (2) we introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree, (3) we describe folding patterns of the low-dimensional projection manifold in the high-dimensional data space by computing and visualizing the manifold's local directional curvatures. Quantities such as magnification factors [3] and directional curvatures are helpful for understanding the layout of the nonlinear projection manifold in the data space and for further refinement of the hierarchical visualization plot. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. We demonstrate the principle of the visualization system on a complex 12-dimensional data set and mention possible applications in the pharmaceutical industry.
Abstract:
This thesis provides an interoperable language for quantifying uncertainty using probability theory. A general introduction to interoperability and uncertainty is given, with particular emphasis on the geospatial domain. Existing interoperable standards used within the geospatial sciences are reviewed, including Geography Markup Language (GML), Observations and Measurements (O&M) and the Web Processing Service (WPS) specifications. The importance of uncertainty in geospatial data is identified and probability theory is examined as a mechanism for quantifying these uncertainties. The Uncertainty Markup Language (UncertML) is presented as a solution to the lack of an interoperable standard for quantifying uncertainty. UncertML is capable of describing uncertainty using statistics, probability distributions or a series of realisations. The capabilities of UncertML are demonstrated through a series of XML examples. This thesis then provides a series of example use cases where UncertML is integrated with existing standards in a variety of applications. The Sensor Observation Service - a service for querying and retrieving sensor-observed data - is extended to provide a standardised method for quantifying the inherent uncertainties in sensor observations. The INTAMAP project demonstrates how UncertML can be used to aid uncertainty propagation using a WPS by allowing UncertML as input and output data. The flexibility of UncertML is demonstrated with an extension to the GML geometry schemas to allow positional uncertainty to be quantified. Further applications and developments of UncertML are discussed.
Abstract:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis (Bishop98a) in several directions: 1. We allow for non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. 2. We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. 3. Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.
Abstract:
The generation of very short range forecasts of precipitation in the 0-6 h time window is traditionally referred to as nowcasting. Most existing nowcasting systems essentially extrapolate radar observations in some manner; however, very few systems account for the uncertainties involved. Thus deterministic forecasts are produced, which are of limited use when decisions must be made, since they carry no measure of confidence or spread of the forecast. This paper develops a Bayesian state space modelling framework for quantitative precipitation nowcasting which is probabilistic from conception. The model treats the observations (radar) as noisy realisations of the underlying true precipitation process, recognising that this process can never be completely known, and thus must be represented probabilistically. In the model presented here the dynamics of the precipitation are dominated by advection, so this is a probabilistic extrapolation forecast. The model is designed in such a way as to minimise the computational burden, while maintaining a full, joint representation of the probability density function of the precipitation process. The update and evolution equations avoid the need to sample, so only one model needs to be run, as opposed to the more traditional ensemble route. It is shown that the model works well on both simulated and real data, but that further work is required before the model can be used operationally. © 2004 Elsevier B.V. All rights reserved.
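The abstract above does not state the form of its update and evolution equations. Purely to illustrate how a state space model can propagate a full probability density without sampling, the sketch below assumes a scalar linear-Gaussian model, in which case the evolution and update steps are the standard Kalman filter recursions. The function `kalman_step` and all of its parameters are hypothetical and are not taken from the paper.

```python
def kalman_step(mean, var, obs, obs_var, process_var, transition=1.0):
    """One evolve-then-update cycle of a scalar Kalman filter.

    Illustrative assumption: state x_t = transition * x_{t-1} + process noise,
    observation y_t = x_t + observation noise, all noise Gaussian.
    Returns the posterior mean and variance of the state.
    """
    # Evolution: push the Gaussian belief through the linear dynamics.
    pred_mean = transition * mean
    pred_var = transition ** 2 * var + process_var
    # Update: condition the predicted belief on the noisy observation.
    gain = pred_var / (pred_var + obs_var)
    post_mean = pred_mean + gain * (obs - pred_mean)
    post_var = (1.0 - gain) * pred_var
    return post_mean, post_var
```

Because the belief stays Gaussian, its mean and variance describe the whole density, so a single run replaces an ensemble of sampled trajectories in this simplified setting.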
Abstract:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT
Abstract:
This thesis introduces a flexible visual data exploration framework which combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain to help a user to explore and understand effectively large multi-dimensional datasets. The advantage of such a framework over other techniques currently available to the domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows them to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms to develop a guided mixture of local experts algorithm which provides robust prediction and a model to estimate feature saliency simultaneously with the training of a projection algorithm. Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process, which is suitable for an intuition and domain knowledge driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach which estimates feature saliency simultaneously with the training of a visualisation model.
Abstract:
The accuracy of altimetrically derived oceanographic and geophysical information is limited by the precision of the radial component of the satellite ephemeris. A non-dynamic technique is proposed as a method of reducing the global radial orbit error of altimetric satellites. This involves the recovery of each coefficient of an analytically derived radial error correction through a refinement of crossover difference residuals. The crossover data is supplemented by absolute height measurements to permit the retrieval of otherwise unobservable geographically correlated and linearly combined parameters. The feasibility of the radial reduction procedure is established upon application to the three day repeat orbit of SEASAT. The concept of arc aggregates is devised as a means of extending the method to incorporate longer durations, such as the 35 day repeat period of ERS-1. A continuous orbit is effectively created by including the radial misclosure between consecutive long arcs as an infallible observation. The arc aggregate procedure is validated using a combination of three successive SEASAT ephemerides. A complete simulation of the 501 revolution per 35 day repeat orbit of ERS-1 is derived and the recovery of the global radial orbit error over the full repeat period is successfully accomplished. The radial reduction is dependent upon the geographical locations of the supplementary direct height data. Investigations into the respective influences of various sites proposed for the tracking of ERS-1 by ground-based transponders are carried out. The potential effectiveness on the radial orbital accuracy of locating future tracking sites in regions of high latitudinal magnitude is demonstrated.
Abstract:
Geometric information relating to most engineering products is available in the form of orthographic drawings or 2D data files. For many recent computer based applications, such as Computer Integrated Manufacturing (CIM), these data are required in the form of a sophisticated model based on Constructive Solid Geometry (CSG) concepts. A recent novel technique in this area transfers 2D engineering drawings directly into a 3D solid model called `the first approximation'. In many cases, however, this does not represent the real object. In this thesis, a new method is proposed and developed to enhance this model. This method uses the notion of expanding an object in terms of other solid objects, which are either primitive or first approximation models. To achieve this goal, in addition to the prepared subroutine to calculate the first approximation model of input data, two other wireframe models are found for extraction of sub-objects. One is the wireframe representation on input, and the other is the wireframe of the first approximation model. A new fast method is developed for the latter special case wireframe, which is named the `first approximation wireframe model'. This method avoids the use of a solid modeller. Detailed descriptions of algorithms and implementation procedures are given. In these techniques utilisation of dashed line information is also considered in improving the model. Different practical examples are given to illustrate the functioning of the program. Finally, a recursive method is employed to automatically modify the output model towards the real object. Some suggestions for further work are made to increase the domain of objects covered, and provide a commercially usable package. It is concluded that the current method promises the production of accurate models for a large class of objects.
Abstract:
Measurements of the sea surface obtained by satellite borne radar altimetry are irregularly spaced and contaminated with various modelling and correction errors. The largest source of uncertainty for low Earth orbiting satellites such as ERS-1 and Geosat may be attributed to orbital modelling errors. The empirical correction of such errors is investigated by examination of single and dual satellite crossovers, with a view to identifying the extent of any signal aliasing: either by removal of long wavelength ocean signals or introduction of additional error signals. From these studies, it was concluded that sinusoidal approximation of the dominant one cycle per revolution orbit error over arc lengths of 11,500 km did not remove a significant mesoscale ocean signal. The use of TOPEX/Poseidon dual crossovers with ERS-1 was shown to substantially improve the radial accuracy of ERS-1, except for some absorption of small TOPEX/Poseidon errors. The extraction of marine geoid information is of great interest to the oceanographic community and was the subject of the second half of this thesis. Firstly through determination of regional mean sea surfaces using Geosat data, it was demonstrated that a dataset with 70cm orbit error contamination could produce a marine geoid map which compares to better than 12cm with an accurate regional high resolution gravimetric geoid. This study was then developed into Optimal Fourier Transform Interpolation, a technique capable of analysing complete altimeter datasets for the determination of consistent global high resolution geoid maps. This method exploits the regular nature of ascending and descending data subsets thus making possible the application of fast Fourier transform algorithms. Quantitative assessment of this method was limited by the lack of global ground truth gravity data, but qualitative results indicate good signal recovery from a single 35-day cycle.
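The abstract above refers to a sinusoidal approximation of the dominant one cycle per revolution orbit error. As a minimal sketch of what such an empirical correction can look like, the code below fits a bias plus a single sinusoid to a series of residuals by linear least squares. The function `fit_once_per_rev`, its arguments and the choice of a bias term are assumptions made for illustration; the thesis's actual parameterisation, units and data are not given in the abstract.

```python
import numpy as np

def fit_once_per_rev(t, residuals, period):
    """Least-squares fit of a + b*cos(w t) + c*sin(w t) with w = 2*pi/period.

    t and residuals are 1-D arrays of equal length; period is the assumed
    orbital period in the same units as t. Returns the coefficients
    (a, b, c) and the residuals left after removing the fitted signal.
    """
    t = np.asarray(t, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    omega = 2.0 * np.pi / period
    # Design matrix: constant, cosine and sine columns.
    A = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    coeffs, *_ = np.linalg.lstsq(A, residuals, rcond=None)
    return coeffs, residuals - A @ coeffs
```

Writing the sinusoid as separate cosine and sine terms keeps the problem linear in the unknowns, so amplitude and phase never have to be estimated by nonlinear optimisation.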