289 resultados para Approximate Bayesian Computation
Resumo:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
Resumo:
A time series method for the determination of combustion chamber resonant frequencies is outlined. This technique employs the use of Markov-chain Monte Carlo (MCMC) to infer parameters in a chosen model of the data. The development of the model is included and the resonant frequency is characterised as a function of time. Potential applications for cycle-by-cycle analysis are discussed and the bulk temperature of the gas and the trapped mass in the combustion chamber are evaluated as a function of time from resonant frequency information.
Practical improvements to simultaneous computation of multi-view geometry and radial lens distortion
Resumo:
This paper discusses practical issues related to the use of the division model for lens distortion in multi-view geometry computation. A data normalisation strategy is presented, which has been absent from previous discussions on the topic. The convergence properties of the Rectangular Quadric Eigenvalue Problem solution for computing division model distortion are examined. It is shown that the existing method can require more than 1000 iterations when dealing with severe distortion. A method is presented for accelerating convergence to less than 10 iterations for any amount of distortion. The new method is shown to produce equivalent or better results than the existing method with up to two orders of magnitude reduction in iterations. Through detailed simulation it is found that the number of data points used to compute geometry and lens distortion has a strong influence on convergence speed and solution accuracy. It is recommended that more than the minimal number of data points be used when computing geometry using a robust estimator such as RANSAC. Adding two to four extra samples improves the convergence rate and accuracy sufficiently to compensate for the increased number of samples required by the RANSAC process.
Resumo:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the timevarying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed, acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-invariables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals. These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term by term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first-stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
Resumo:
An introduction to thinking about and understanding probability that highlights the main pits and trapfalls that befall logical reasoning
Resumo:
An introduction to elicitation of experts' probabilities, which illustrates common problems with reasoning and how to circumvent them during elicitation.
Resumo:
An introduction to design of eliciting knowledge from experts.
Resumo:
An introduction to eliciting a conditional probability table in a Bayesian Network model, highlighting three efficient methods for populating a CPT.
Resumo:
The availability of bridges is crucial to people’s daily life and national economy. Bridge health prediction plays an important role in bridge management because maintenance optimization is implemented based on prediction results of bridge deterioration. Conventional bridge deterioration models can be categorised into two groups, namely condition states models and structural reliability models. Optimal maintenance strategy should be carried out based on both condition states and structural reliability of a bridge. However, none of existing deterioration models considers both condition states and structural reliability. This study thus proposes a Dynamic Objective Oriented Bayesian Network (DOOBN) based method to overcome the limitations of the existing methods. This methodology has the ability to act upon as a flexible unifying tool, which can integrate a variety of approaches and information for better bridge deterioration prediction. Two demonstrative case studies are conducted to preliminarily justify the feasibility of the methodology
Resumo:
In this paper we present a sequential Monte Carlo algorithm for Bayesian sequential experimental design applied to generalised non-linear models for discrete data. The approach is computationally convenient in that the information of newly observed data can be incorporated through a simple re-weighting step. We also consider a flexible parametric model for the stimulus-response relationship together with a newly developed hybrid design utility that can produce more robust estimates of the target stimulus in the presence of substantial model and parameter uncertainty. The algorithm is applied to hypothetical clinical trial or bioassay scenarios. In the discussion, potential generalisations of the algorithm are suggested to possibly extend its applicability to a wide variety of scenarios
Resumo:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise by intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data are on symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS) and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real time neural activity in the brain.
Resumo:
We describe the population pharmacokinetics of an acepromazine (ACP) metabolite (2-(1-hydroxyethyl)promazine) (HEPS) in horses for the estimation of likely detection times in plasma and urine. Acepromazine (30 mg) was administered to 12 horses, and blood and urine samples were taken at frequent intervals for chemical analysis. A Bayesian hierarchical model was fitted to describe concentration-time data and cumulative urine amounts for HEPS. The metabolite HEPS was modelled separately from the parent ACP as the half-life of the parent was considerably less than that of the metabolite. The clearance ($Cl/F_{PM}$) and volume of distribution ($V/F_{PM}$), scaled by the fraction of parent converted to metabolite, were estimated as 769 L/h and 6874 L, respectively. For a typical horse in the study, after receiving 30 mg of ACP, the upper limit of the detection time was 35 hours in plasma and 100 hours in urine, assuming an arbitrary limit of detection of 1 $\mu$g/L, and a small ($\approx 0.01$) probability of detection. The model derived allowed the probability of detection to be estimated at the population level. This analysis was conducted on data collected from only 12 horses, but we assume that this is representative of the wider population.
Resumo:
Modern technology now has the ability to generate large datasets over space and time. Such data typically exhibit high autocorrelations over all dimensions. The field trial data motivating the methods of this paper were collected to examine the behaviour of traditional cropping and to determine a cropping system which could maximise water use for grain production while minimising leakage below the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18 columns, in the lattice framework of an agricultural field. Bayesian conditional autoregressive (CAR) models are used to account for local site correlations. Conditional autoregressive models have not been widely used in analyses of agricultural data. This paper serves to illustrate the usefulness of these models in this field, along with the ease of implementation in WinBUGS, a freely available software package. The innovation is the fitting of separate conditional autoregressive models for each depth layer, the ‘layered CAR model’, while simultaneously estimating depth profile functions for each site treatment. Modelling interest also lay in how best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and treated depth, the basis for the regression model, as measured with error, while fitting CAR neighbourhood models by depth layer. It is hierarchical, with separate onditional autoregressive spatial variance components at each depth, and the fixed terms which involve an errors-in-measurement model treat depth errors as interval-censored measurement error. The Bayesian framework permits transparent specification and easy comparison of the various complex models compared.
Resumo:
Discrete Markov random field models provide a natural framework for representing images or spatial datasets. They model the spatial association present while providing a convenient Markovian dependency structure and strong edge-preservation properties. However, parameter estimation for discrete Markov random field models is difficult due to the complex form of the associated normalizing constant for the likelihood function. For large lattices, the reduced dependence approximation to the normalizing constant is based on the concept of performing computationally efficient and feasible forward recursions on smaller sublattices which are then suitably combined to estimate the constant for the whole lattice. We present an efficient computational extension of the forward recursion approach for the autologistic model to lattices that have an irregularly shaped boundary and which may contain regions with no data; these lattices are typical in applications. Consequently, we also extend the reduced dependence approximation to these scenarios enabling us to implement a practical and efficient non-simulation based approach for spatial data analysis within the variational Bayesian framework. The methodology is illustrated through application to simulated data and example images. The supplemental materials include our C++ source code for computing the approximate normalizing constant and simulation studies.