52 results for Approximate Bayesian computation, Posterior distribution, Quantile distribution, Response time data
Abstract:
Predicting species' potential and future distributions has become a relevant tool in biodiversity monitoring and conservation. In this data article we present the suitability map of a virtual species generated based on two bioclimatic variables, and a dataset containing more than 700,000 random observations across the extent of Europe. The dataset includes spatial attributes such as distance to roads, protected areas, country codes, and the habitat suitability of two spatially clustered species (a grassland and a forest species) and a widespread species.
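As a rough illustration of how such a suitability surface can be built, the sketch below combines Gaussian responses to two bioclimatic variables and draws presence/absence from the result. The variable names, response form and parameters are assumptions for illustration, not the article's actual specification.

    # Minimal sketch: habitat suitability of a virtual species from two
    # bioclimatic variables. All parameters are illustrative assumptions.
    import numpy as np

    def gaussian_response(x, opt, sd):
        """Suitability response of the species along one environmental gradient."""
        return np.exp(-0.5 * ((x - opt) / sd) ** 2)

    rng = np.random.default_rng(42)
    n = 700_000                                # number of random observations
    temperature = rng.uniform(-5, 25, n)       # e.g. annual mean temperature (C)
    precipitation = rng.uniform(200, 2000, n)  # e.g. annual precipitation (mm)

    # Combined suitability: product of the two single-variable responses.
    suitability = (gaussian_response(temperature, opt=12.0, sd=4.0)
                   * gaussian_response(precipitation, opt=900.0, sd=300.0))

    # Presence/absence drawn using the suitability as a probability.
    presence = rng.random(n) < suitability
    print(f"prevalence: {presence.mean():.3f}")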
Abstract:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled, and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that approximates the true variance well.
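A minimal sketch of the generative model described above, assuming RBF kernels and illustrative hyperparameters. The paper's contribution is inferring the noise process with MCMC; this sketch only samples the model forward.

    # One GP for the noise-free function, a second GP for the log noise
    # variance, so the noise level varies smoothly with the input.
    import numpy as np

    def rbf_kernel(x, lengthscale, variance):
        d = x[:, None] - x[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    jitter = 1e-8 * np.eye(len(x))

    # GP prior over the noise-free output f(x).
    K_f = rbf_kernel(x, lengthscale=1.0, variance=1.0)
    f = rng.multivariate_normal(np.zeros(len(x)), K_f + jitter)

    # Second GP over g(x) = log noise variance.
    K_g = rbf_kernel(x, lengthscale=3.0, variance=0.5)
    g = rng.multivariate_normal(np.zeros(len(x)), K_g + jitter)

    # Observations: heteroscedastic noise with variance exp(g(x)).
    y = f + rng.normal(0.0, np.sqrt(np.exp(g)))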
Abstract:
A novel approach, based on statistical mechanics, to analysing the typical performance of optimum code-division multiple-access (CDMA) multiuser detectors is reviewed. A `black-box' view of the basic CDMA channel is introduced, based on which the CDMA multiuser detection problem is regarded as a `learning-from-examples' problem for the `binary linear perceptron' of the neural network literature. Adopting a Bayesian framework, analysis of the performance of the optimum CDMA multiuser detectors is reduced to evaluating the average of the cumulant generating function of a relevant posterior distribution. The average cumulant generating function is evaluated, by formal analogy with a similar calculation in the spin glass theory of statistical mechanics, using the replica method developed there.
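For readers unfamiliar with the replica method, the average of the cumulant generating function is handled through the standard replica identity; the notation below (Z for the normalising constant of the posterior, K for the number of users) is an assumption chosen here for illustration:

    \mathbb{E}[\ln Z] \;=\; \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n],
    \qquad
    F \;=\; -\lim_{K \to \infty} \frac{1}{K}\, \mathbb{E}[\ln Z]

The moments E[Z^n] are first computed for integer n, where the average over the random spreading codes factorises, and the result is then analytically continued to n -> 0.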
Abstract:
Amongst all the objectives in the study of time series, uncovering the dynamic law of their generation is probably the most important. When the underlying dynamics are not available, time series modelling consists of developing a model which best explains a sequence of observations. In this thesis, we consider hidden space models for analysing and describing time series. We first provide an introduction to the principal concepts of hidden state models and draw an analogy between hidden Markov models and state space models. Central ideas such as hidden state inference and parameter estimation are reviewed in detail. A key part of multivariate time series analysis is identifying the delay between different variables. We present a novel approach for time delay estimation in a non-stationary environment. The technique makes use of hidden Markov models and we demonstrate its application to estimating a crucial parameter in the oil industry. We then focus on hybrid models that we call dynamical local models. These models combine and generalise hidden Markov models and state space models. Exact probabilistic inference in them is unfortunately computationally intractable, and we show how to make use of variational techniques to approximate the posterior distribution over the hidden state variables. Experimental simulations on synthetic and real-world data demonstrate the application of dynamical local models for segmenting a time series into regimes and providing predictive distributions.
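As an illustration of the hidden state inference reviewed in the thesis, the sketch below implements the forward algorithm for a discrete hidden Markov model; the toy transition and emission matrices are assumptions for illustration.

    # Forward algorithm: filtered state probabilities p(state_t | obs_1..t).
    import numpy as np

    def forward(obs, pi, A, B):
        alpha = pi * B[:, obs[0]]
        alpha /= alpha.sum()
        filtered = [alpha]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
            alpha /= alpha.sum()      # normalise to avoid underflow
            filtered.append(alpha)
        return np.array(filtered)

    pi = np.array([0.6, 0.4])         # initial state distribution
    A = np.array([[0.9, 0.1],         # state transition matrix
                  [0.2, 0.8]])
    B = np.array([[0.7, 0.3],         # emission probabilities
                  [0.1, 0.9]])
    print(forward([0, 0, 1, 1, 1], pi, A, B))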
Abstract:
The amplification of demand variation up a supply chain, widely termed ‘the Bullwhip Effect’, is disruptive, costly and something that supply chain management generally seeks to minimise. It was originally attributed to poor system design: deficiencies in policies and organisation structure, and delays in material and information flow, all lead to sub-optimal reorder point calculation. It has since been attributed to exogenous random factors such as uncertainties in demand, supply and distribution lead time, but these causes are not exclusive, as academic and operational studies have since shown that orders and/or inventories can exhibit significant variability even if customer demand and lead time are deterministic. This increase in the range of possible causes of dynamic behaviour indicates that our understanding of the phenomenon is far from complete. One possible, yet previously unexplored, factor that may influence dynamic behaviour in supply chains is the application and operation of supply chain performance measures. Organisations monitoring and responding to their adopted key performance metrics will make operational changes, and this action may influence the level of dynamics within the supply chain, possibly degrading the performance of the very system the measures were intended to assess. To explore this, a plausible abstraction of the operational responses to the Supply Chain Council’s SCOR® (Supply Chain Operations Reference) model was incorporated into a classic Beer Game distribution representation, using the dynamic discrete event simulation software Simul8. During the simulation the five SCOR Supply Chain Performance Attributes (Reliability, Responsiveness, Flexibility, Cost and Utilisation) were continuously monitored and compared to established targets. Operational adjustments to the reorder point, transportation modes and production capacity (where appropriate) were made for three independent supply chain roles, and the degree of dynamic behaviour in the supply chain was measured using the ratio of the standard deviation of upstream demand to the standard deviation of the downstream demand. Factors employed to build the detailed model include variable retail demand, order transmission, transportation delays, production delays, capacity constraints, demand multipliers and demand averaging periods. Five dimensions of supply chain performance were monitored independently in three autonomous supply chain roles and operational settings adjusted accordingly. The uniqueness of this research stems from the application of the five SCOR performance attributes with modelled operational responses in a dynamic discrete event simulation model. This project makes its primary contribution to knowledge by measuring the impact on supply chain dynamics of applying a representative performance measurement system.
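The dynamic behaviour measure described above reduces to a simple variance ratio; a minimal sketch follows, with synthetic series standing in for the Simul8 output.

    # Bullwhip measure: std dev of upstream orders / std dev of downstream demand.
    import numpy as np

    def bullwhip_ratio(upstream_orders, downstream_demand):
        return np.std(upstream_orders, ddof=1) / np.std(downstream_demand, ddof=1)

    rng = np.random.default_rng(1)
    demand = rng.normal(100, 5, 52)          # weekly retail demand (illustrative)
    orders = demand + rng.normal(0, 8, 52)   # amplified upstream orders (illustrative)
    print(f"bullwhip ratio: {bullwhip_ratio(orders, demand):.2f}")  # > 1 => amplification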
Abstract:
Exploratory analysis of data in all sciences seeks to find common patterns to gain insights into the structure and distribution of the data. Typically visualisation methods like principal components analysis are used, but these methods cannot easily deal with missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this technical report we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, which can incorporate far more structure than a two-dimensional principal components plot and, at the same time, deal with missing data. We show that the generative topographic mapping provides an effective method to explore the data while replacing missing values in a dataset, particularly where a large proportion of the data is missing.
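One way to see how such a probabilistic model copes with missing values: under a spherical Gaussian noise model the likelihood factorises over dimensions, so component responsibilities can be computed from the observed dimensions alone and missing entries imputed from the responsibility-weighted means. A minimal sketch under these assumptions (centres, noise precision and data are illustrative, not a full GTM):

    import numpy as np

    def responsibilities(x, centres, beta):
        """Posterior over mixture components for x, with NaNs marking missing values."""
        observed = ~np.isnan(x)
        sq_dist = ((centres[:, observed] - x[observed]) ** 2).sum(axis=1)
        log_r = -0.5 * beta * sq_dist
        r = np.exp(log_r - log_r.max())   # stabilise before normalising
        return r / r.sum()

    centres = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 2.0]])
    x = np.array([0.9, np.nan, 1.1])      # second dimension is missing
    r = responsibilities(x, centres, beta=10.0)
    imputed = r @ centres                 # impute via responsibility-weighted mean
    print(r, imputed)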
Abstract:
Visualising data for exploratory analysis is a big challenge in scientific and engineering domains where there is a need to gain insight into the structure and distribution of the data. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are used, but it is difficult to incorporate prior knowledge about the structure of the data into the analysis. In this technical report we discuss a complementary approach based on an extension of a well-known non-linear probabilistic model, the Generative Topographic Mapping. We show that by including prior information about the covariance structure in the model, we are able to improve both the data visualisation and the model fit.
Abstract:
Stochastic differential equations arise naturally in a range of contexts, from financial to environmental modelling. Current solution methods are limited in their representation of the posterior process in the presence of data. In this work, we present a novel Gaussian process approximation to the posterior measure over paths for a general class of stochastic differential equations in the presence of observations. The method is applied to two simple problems: the Ornstein-Uhlenbeck process, for which the exact solution is known and can be used for comparison, and the double-well system, for which standard approaches such as the ensemble Kalman smoother fail to provide a satisfactory result. Experiments show that our variational approximation is viable and the results are very promising, as the variational approximate solution outperforms standard Gaussian process regression for non-Gaussian Markov processes.
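A minimal sketch of the two benchmark systems, simulated forward with the Euler-Maruyama scheme; drift parameters and noise levels are assumptions for illustration, and the paper's variational posterior approximation is not reproduced here.

    import numpy as np

    def euler_maruyama(drift, x0, dt, n_steps, sigma, rng):
        """Simulate dx = drift(x) dt + sigma dW with the Euler-Maruyama scheme."""
        x = np.empty(n_steps + 1)
        x[0] = x0
        for i in range(n_steps):
            x[i + 1] = x[i] + drift(x[i]) * dt + sigma * np.sqrt(dt) * rng.normal()
        return x

    rng = np.random.default_rng(0)
    # Ornstein-Uhlenbeck: linear mean-reverting drift, exactly solvable.
    ou = euler_maruyama(lambda x: -2.0 * x, x0=1.0, dt=0.01, n_steps=1000,
                        sigma=0.5, rng=rng)
    # Double well: bistable drift -V'(x) for V(x) = (x^2 - 1)^2, which is
    # what defeats Gaussian smoothers such as the ensemble Kalman smoother.
    dw = euler_maruyama(lambda x: 4.0 * x * (1.0 - x ** 2), x0=-1.0, dt=0.01,
                        n_steps=1000, sigma=0.5, rng=rng)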
Abstract:
Different types of numerical data can be collected in a scientific investigation and the choice of statistical analysis will often depend on the distribution of the data. A basic distinction between variables is whether they are ‘parametric’ or ‘non-parametric’. When a variable is parametric, the data come from a symmetrically shaped distribution known as the ‘Gaussian’ or ‘normal distribution’ whereas non-parametric variables may have a distribution which deviates markedly in shape from normal. This article describes several aspects of the problem of non-normality including: (1) how to test for two common types of deviation from a normal distribution, viz., ‘skew’ and ‘kurtosis’, (2) how to fit the normal distribution to a sample of data, (3) the transformation of non-normally distributed data and scores, and (4) commonly used ‘non-parametric’ statistics which can be used in a variety of circumstances.
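As a brief illustration of point (1) and point (3), scipy.stats provides tests for skew and kurtosis as well as D'Agostino and Pearson's combined normality test; the sample below is synthetic.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.lognormal(mean=0.0, sigma=0.5, size=200)   # deliberately skewed

    print(stats.skewtest(sample))       # does the skew differ from normal?
    print(stats.kurtosistest(sample))   # does the kurtosis differ from normal?
    print(stats.normaltest(sample))     # combined test (D'Agostino & Pearson)

    # Point (3): a log transformation often restores approximate normality here.
    print(stats.normaltest(np.log(sample)))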
Abstract:
Local shell side coefficient measurements in the end compartments of a model shell and tube heat exchanger have been made using an electrochemical technique. Limited data are also reported for the second compartment. The end compartment average coefficients have been found to be smaller than reported data for a corresponding internal compartment. The second compartment data have been shown to lie between those for the end compartments and the reported internal compartment data. Experimental data are reported for two port types and two baffle orientations, with data for the case of an inlet compartment impingement baffle also being given. Port type is shown to have only a small effect on compartment coefficients, these being largely unaffected. Likewise, the outlet compartment average coefficients are slightly smaller than those for the inlet compartment, with the distribution of individual tube coefficients being similar. Baffle orientation has been shown to have no effect on average coefficients, but the distribution of the data is substantially affected. The use of an impingement baffle in the inlet compartment lessens the effect of baffle orientation on distribution. Recommendations are made for future work.
Abstract:
Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
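A minimal sketch of what a block-structured correlation matrix looks like: variables within a block are strongly correlated and the blocks are independent of one another. Block sizes and within-block correlations are assumptions for illustration.

    import numpy as np
    from scipy.linalg import block_diag

    def correlated_block(size, rho):
        """Correlation matrix with constant correlation rho within one block."""
        return (1.0 - rho) * np.eye(size) + rho * np.ones((size, size))

    # e.g. three groups of geochemical variables measuring related quantities
    C = block_diag(correlated_block(4, 0.8),
                   correlated_block(3, 0.6),
                   correlated_block(5, 0.9))

    rng = np.random.default_rng(0)
    data = rng.multivariate_normal(np.zeros(C.shape[0]), C, size=500)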
Abstract:
The thesis is concerned with relationships between profit, technology and environmental change. Existing work has concentrated on only a few questions, treated at either micro or macro levels of analysis. And there has been something of an impasse, since the neoclassical and neomarxist approaches are either in direct conflict (macro level), or hardly interact (micro level). The aim of the thesis was to bypass this impasse by starting to develop a meso level of analysis that focusses on issues largely ignored in the traditional approaches: questions about distribution. The first questions looked at were descriptive: what were the patterns of distribution over time of the variability in types and rates of environmental change, and in particular, was there any evidence of periodization? Two case studies were used to examine these issues. The first looked at environmental change in the iron and steel industry since 1700, and the second studied pollution in five industries in the basic processing sector. It was established that environmental change has been markedly periodized, with an apparently fairly regular `cycle length' of about fifty years. The second questions considered were explanatory: whether and how this periodization could be accounted for by reference to variations in aspects of profitability and technical change. In the iron and steel industry, it was found that diffusion rates and the rate and nature of innovation were periodized on the same pattern as environmental change. The same sort of variation was also present in the realm of profits, as evidenced by cyclical changes in output growth. Simple theoretical accounts could be given for all the empirically demonstrable links, and it was suggested that the most useful models at this meso level of analysis are provided by structural change models of economic development.
Abstract:
This study was concerned with the computer automation of land evaluation. This is a broad subject with many issues to be resolved, so the study concentrated on three key problems: knowledge-based programming; the integration of spatial information from remote sensing and other sources; and the inclusion of socio-economic information in the land evaluation analysis. Land evaluation and land use planning were considered in the context of overseas projects in the developing world. Knowledge-based systems were found to provide significant advantages over conventional programming techniques for some aspects of the land evaluation process. Declarative languages, in particular Prolog, were ideally suited to the integration of social information, which changes with every situation. Rule-based expert system shells were also found to be suitable for this role, including knowledge acquisition at the interview stage. All the expert system shells examined suffered from severe constraints on problem size, but new products now overcome this. Inductive expert system shells were useful as a guide to knowledge gaps and possible relationships, but the number of examples required was unrealistic for typical land use planning situations. The accuracy of classified satellite imagery was significantly enhanced by integrating spatial information on soil distribution for data from Thailand. Estimates of the rice-producing area were substantially improved (a 30% change in area) by the addition of soil information. Image processing work on Mozambique showed that satellite remote sensing was a useful tool for stratifying vegetation cover at provincial level to identify key development areas, but its full utility could not be realised on typical planning projects without treatment as part of a complete spatial information system.
Abstract:
In this work we experimentally investigate the response time of humidity sensors based on polymer optical fibre (POF) Bragg gratings. By etching with acetone we can control the diameter of POF based on poly(methyl methacrylate) in order to reduce the diffusion time of water into the polymer and hence speed up the relative wavelength change caused by humidity variations. A much improved response time of 11 minutes has been achieved using a POF FBG with a reduced diameter of 135 microns.
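The motivation for etching can be seen from a rough Fickian scaling argument: the characteristic time for water to diffuse into a cylindrical fibre grows with the square of its diameter, so reducing the diameter sharply reduces the response time. The prefactor below is an assumption for illustration only; the exact value depends on geometry and the diffusion model.

    \tau \;\sim\; \frac{d^2}{4D},
    \qquad
    \frac{\tau_2}{\tau_1} \;=\; \left(\frac{d_2}{d_1}\right)^2

Here D is the diffusion coefficient of water in the polymer and d the fibre diameter, so halving the diameter cuts the response time by roughly a factor of four.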
Abstract:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means of gaining insight into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structure in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which can incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialising the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and to fit complex non-linear structures like the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation of missing data. Additionally, an extensive benchmark study of the missing-data imputation capabilities of GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
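A minimal sketch of the initialisation idea: fit the mapping weights by least squares so that the latent grid, pushed through an RBF basis, lands on an arbitrary two-dimensional projection of the data (PCA here; an Isomap embedding could be substituted). Grid sizes and the basis are assumptions for illustration, not the thesis's exact construction.

    import numpy as np

    def rbf_basis(latent, centres, width):
        """RBF basis matrix Phi mapping latent points through the basis functions."""
        d2 = ((latent[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * width ** 2))

    # Regular latent grid and basis function centres.
    g = np.linspace(-1, 1, 10)
    latent = np.array([[a, b] for a in g for b in g])      # 100 latent points
    c = np.linspace(-1, 1, 4)
    centres = np.array([[a, b] for a in c for b in c])     # 16 RBF centres
    Phi = rbf_basis(latent, centres, width=0.3)

    # Target: place the mapped grid on a chosen 2-D projection of the data
    # (here the first two principal axes of a toy dataset).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    target = latent @ Vt[:2]                               # grid -> data space

    W, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    init_map = Phi @ W                                     # initial manifold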