14 results for Sequential Gaussian simulation

in Aston University Research Archive


Relevance: 100.00%

Abstract:

Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential framework for inference in such projected processes is presented, where the observations are considered one at a time. We introduce a C++ library for carrying out such projected, sequential estimation which adds several novel features. In particular we have incorporated the ability to use a generic observation operator, or sensor model, to permit data fusion. We can also cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the variogram parameters is based on maximum likelihood estimation. We illustrate the projected sequential method in application to synthetic and real data sets. We discuss the software implementation and suggest possible future extensions.

Relevance: 100.00%

Abstract:

Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with dataset size, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time, which avoids the need for the high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced; it implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
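As a rough illustration of what "considering the observations one at a time" means, the sketch below conditions a Gaussian process on a fixed grid using one noisy observation per step, and includes a batch solution for comparison. It is only a minimal sketch under simplifying assumptions: Gaussian observation noise, an identity observation operator, and no reduced rank projection step. It is not the gptk API; the function names (`sq_exp_kernel`, `sequential_update`, `batch_posterior`) and the squared-exponential covariance are hypothetical choices made for this example.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def sequential_update(mean, cov, idx, y, noise_var):
    """Condition the Gaussian (mean, cov) on one noisy observation y
    of grid component idx. Only a scalar division is needed per step."""
    k = cov[:, idx]                      # covariance with the observed component
    s = cov[idx, idx] + noise_var        # predictive variance of the observation
    gain = k / s
    new_mean = mean + gain * (y - mean[idx])
    new_cov = cov - np.outer(gain, k)    # rank-one downdate of the covariance
    return new_mean, new_cov


def batch_posterior(mean, cov, obs_idx, ys, noise_var):
    """Reference solution: condition on all observations at once."""
    K_oo = cov[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(len(obs_idx))
    K_ao = cov[:, obs_idx]
    new_mean = mean + K_ao @ np.linalg.solve(K_oo, ys - mean[obs_idx])
    new_cov = cov - K_ao @ np.linalg.solve(K_oo, K_ao.T)
    return new_mean, new_cov
```

With Gaussian noise the sequential and batch posteriors agree up to rounding error, so the sequential form changes the cost profile rather than the answer. The projected methods described in the abstract additionally restrict the representation to a reduced rank basis, which this sketch does not attempt.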

Relevance: 30.00%

Abstract:

We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world datasets indicate the efficiency of the approach.

Relevance: 30.00%

Abstract:

We discuss the application of TAP mean field methods known from Statistical Mechanics of disordered systems to Bayesian classification with Gaussian processes. In contrast to previous applications, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given.

Relevance: 30.00%

Abstract:

We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may get state-of-the-art performance by using the leave-one-out estimator for model selection and (2) that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.

Relevance: 30.00%

Abstract:

In this chapter, we elaborate on the well-known relationship between Gaussian processes (GP) and Support Vector Machines (SVM). We then present approximate solutions for two computational problems arising in GP and SVM. The first one is the calculation of the posterior mean for GP classifiers using a 'naive' mean field approach. The second one is a leave-one-out estimator for the generalization error of SVM based on a linear response method. Simulation results on a benchmark dataset show similar performances for the GP mean field algorithm and the SVM algorithm. The approximate leave-one-out estimator is found to be in very good agreement with the exact leave-one-out error.

Relevance: 30.00%

Abstract:

We develop an approach for sparse representations of Gaussian Process (GP) models (which are Bayesian types of kernel machines) in order to overcome their limitations for large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the GP model. By using an appealing parametrisation and projection techniques that use the RKHS norm, recursions for the effective parameters and a sparse Gaussian approximation of the posterior process are obtained. This allows for the propagation of both predictions and Bayesian error measures. The significance and robustness of our approach are demonstrated on a variety of experiments.

Relevance: 30.00%

Abstract:

We develop an approach for sparse representations of Gaussian Process (GP) models (which are Bayesian types of kernel machines) in order to overcome their limitations for large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the GP model. By using an appealing parametrisation and projection techniques that use the RKHS norm, recursions for the effective parameters and a sparse Gaussian approximation of the posterior process are obtained. This allows for the propagation of both predictions and Bayesian error measures. The significance and robustness of our approach are demonstrated on a variety of experiments.

Relevance: 30.00%

Abstract:

A CSSL-type modular FORTRAN package, called ACES, has been developed to assist in the simulation of the dynamic behaviour of chemical plant. ACES can be harnessed, for instance, to simulate the transients in startups or after a throughput change. ACES has benefited from two existing simulators. The structure was adapted from ICL SLAM and most plant models originate in DYFLO. The latter employs sequential modularisation which is not always applicable to chemical engineering problems. A novel device of twice-round execution enables ACES to achieve general simultaneous modularisation. During the FIRST ROUND, STATE-VARIABLES are retrieved from the integrator and local calculations performed. During the SECOND ROUND, fresh derivatives are estimated and stored for simultaneous integration. ACES further includes a version of DIFSUB, a variable-step integrator capable of handling stiff differential systems. ACES is highly formalised. It does not use pseudo steady-state approximations and excludes inconsistent and arbitrary features of DYFLO. Built-in debug traps make ACES robust. ACES shows generality, flexibility, versatility and portability, and is very convenient to use. It undertakes substantial housekeeping behind the scenes and thus minimises the detailed involvement of the user. ACES provides a working set of defaults for simulation to proceed as far as possible. Built-in interfaces allow for reactions and user supplied algorithms to be incorporated. New plant models can be easily appended. Boundary-value problems and optimisation may be tackled using the RERUN feature. ACES is file oriented; a STATE can be saved in a readable form and reactivated later. Thus piecewise simulation is possible. ACES has been illustrated and verified to a large extent using some literature-based examples. Actual plant tests are desirable however to complete the verification of the library. Interaction and graphics are recommended for future work.

Relevance: 30.00%

Abstract:

Computer models, or simulators, are widely used in a range of scientific fields to aid understanding of the processes involved and make predictions. Such simulators are often computationally demanding and are thus not amenable to statistical analysis. Emulators provide a statistical approximation, or surrogate, for the simulators accounting for the additional approximation uncertainty. This thesis develops a novel sequential screening method to reduce the set of simulator variables considered during emulation. This screening method is shown to require fewer simulator evaluations than existing approaches. Utilising the lower dimensional active variable set simplifies subsequent emulation analysis. For random output, or stochastic, simulators the output dispersion, and thus variance, is typically a function of the inputs. This work extends the emulator framework to account for such heteroscedasticity by constructing two new heteroscedastic Gaussian process representations and proposes an experimental design technique to optimally learn the model parameters. The design criterion is an extension of Fisher information to heteroscedastic variance models. Replicated observations are efficiently handled in both the design and model inference stages. Through a series of simulation experiments on both synthetic and real world simulators, the emulators inferred on optimal designs with replicated observations are shown to outperform equivalent models inferred on space-filling replicate-free designs in terms of both model parameter uncertainty and predictive variance.

Relevance: 30.00%

Abstract:

Boyd's SBS model which includes distributed thermal acoustic noise (DTAN) has been enhanced to enable the Stokes-spontaneous density depletion noise (SSDDN) component of the transmitted optical field to be simulated, probably for the first time, as well as the full transmitted field. SSDDN would not be generated from previous SBS models in which a Stokes seed replaces DTAN. SSDDN becomes the dominant form of transmitted SBS noise as model fibre length (MFL) is increased but its optical power spectrum remains independent of MFL. Simulations of the full transmitted field and SSDDN for different MFLs allow prediction of the optical power spectrum, or system performance parameters which depend on this, for typical communication link lengths which are too long for direct simulation. The SBS model has also been innovatively improved by allowing the Brillouin Shift Frequency (BSF) to vary over the model fibre length, for the nonuniform fibre model (NFM) mode, or to remain constant, for the uniform fibre model (UFM) mode. The assumption of a Gaussian probability density function (pdf) for the BSF in the NFM has been confirmed by means of an analysis of reported Brillouin amplified power spectral measurements for the simple case of a nominally step-index single-mode pure silica core fibre. The BSF pdf could be modified to match the Brillouin gain spectra of other fibre types if required. For both models, simulated backscattered and output powers as functions of input power agree well with those from a reported experiment for fitting Brillouin gain coefficients close to theoretical. The NFM and UFM Brillouin gain spectra are then very similar from half to full maximum but diverge at lower values. Consequently, NFM and UFM transmitted SBS noise powers inferred for long MFLs differ by 1-2 dB over the input power range of 0.15 dBm. This difference could be significant for AM-VSB CATV links at some channel frequencies.
The modelled characteristic of Carrier-to-Noise Ratio (CNR) as a function of input power for a single intensity modulated subcarrier is in good agreement with the characteristic reported for an experiment when either the UFM or NFM is used. The difference between the two modelled characteristics would have been more noticeable for a higher fibre length or a lower subcarrier frequency.

Relevance: 30.00%

Abstract:

The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.

Relevance: 30.00%

Abstract:

SPOT simulation imagery was acquired for a test site in the Forest of Dean in Gloucestershire, U.K. These data were qualitatively and quantitatively evaluated for their potential application in forest resource mapping and management. A variety of techniques are described for enhancing the image with the aim of providing species level discrimination within the forest. Visual interpretation of the imagery was more successful than automated classification. The heterogeneity within the forest classes, and in particular between the forest and urban class, resulted in poor discrimination using traditional 'per-pixel' automated methods of classification. Different means of assessing classification accuracy are proposed. Two techniques for measuring textural variation were investigated in an attempt to improve classification accuracy. The first of these, a sequential segmentation method, was found to be beneficial. The second, a parallel segmentation method, resulted in little improvement though this may be related to a combination of resolution in size of the texture extraction area. The effect on classification accuracy of combining the SPOT simulation imagery with other data types is investigated. A grid cell encoding technique was selected as most appropriate for storing digitised topographic (elevation, slope) and ground truth data. Topographic data were shown to improve species-level classification, though with sixteen classes overall accuracies were consistently below 50%. Neither sub-division into age groups nor the incorporation of principal components and a band ratio significantly improved classification accuracy. It is concluded that SPOT imagery will not permit species level classification within forested areas as diverse as the Forest of Dean. The imagery will be most useful as part of a multi-stage sampling scheme. The use of texture analysis is highly recommended for extracting maximum information content from the data.
Incorporation of the imagery into a GIS will both aid discrimination and provide a useful management tool.

Relevance: 30.00%

Abstract:

Since wind has an intrinsically complex and stochastic nature, accurate wind power forecasts are necessary for the safety and economics of wind energy utilization. In this paper, we investigate a combination of numeric and probabilistic models: one-day-ahead wind power forecasts were made with Gaussian Processes (GPs) applied to the outputs of a Numerical Weather Prediction (NWP) model. First, the wind speed data from the NWP model were corrected by a GP. Then, as there is always a defined limit on the power generated by a wind turbine due to the turbine control strategy, a Censored GP was used to model the relationship between the corrected wind speed and power output. To validate the proposed approach, two real world datasets were used for model construction and testing. The simulation results were compared with the persistence method and Artificial Neural Networks (ANNs); the proposed model achieves about 11% improvement in forecasting accuracy (Mean Absolute Error) compared to the ANN model on one dataset, and nearly 5% improvement on another.
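For readers unfamiliar with the evaluation terms used in this abstract, the sketch below shows one common way to build a persistence baseline and to express a Mean Absolute Error improvement as a percentage. It is an illustrative assumption about how such a comparison is typically computed, not the authors' evaluation code; the function names and the toy numbers in the usage note are hypothetical.

```python
import numpy as np


def persistence_forecast(series, horizon):
    """Persistence baseline: the forecast for time t + horizon is the value at time t.
    Returns the forecasts aligned with series[horizon:]."""
    series = np.asarray(series, dtype=float)
    return series[:-horizon]


def mean_absolute_error(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def relative_improvement(mae_reference, mae_model):
    """Percentage reduction in MAE of a model relative to a reference model."""
    return 100.0 * (mae_reference - mae_model) / mae_reference
```

For a toy series `[1, 2, 4, 7]` with `horizon=1`, the persistence forecasts `[1, 2, 4]` are compared against the actual values `[2, 4, 7]`, giving an MAE of 2.0. A model with an MAE of 1.5 would then show a 25% improvement over that baseline.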