965 resultados para Bayesian Analysis
Resumo:
Performing organization: Dept. of Statistics, University of Michigan.
Resumo:
The founding of new populations by small numbers of colonists has been considered a potentially important mechanism promoting evolutionary change in island populations. Colonizing species, such as members of the avian species complex Zosterops lateralis, have been used to support this idea. A large amount of background information on recent colonization history is available for one Zosterops subspecies, Z. lateralis lateralis, providing the opportunity to reconstruct the population dynamics of its colonization sequence. We used a Bayesian approach to combine historical and demographic information available on Z. l. lateralis with genotypic data from six microsatellite loci, and a rejection algorithm to make simultaneous inferences on the demographic parameters describing the recent colonization history of this subspecies in four southwest Pacific islands. Demographic models assuming mutation–drift equilibrium or a large number of founders were better supported than models assuming founder events for three of four recently colonized island populations. Posterior distributions of demographic parameters supported (i) a large stable effective population size of several thousands individuals with point estimates around 4000–5000; (ii) a founder event of very low intensity with a large effective number of founders around 150–200 individuals for each island in three of four islands, suggesting the colonization of those islands by one flock of large size or several flocks of average size; and (iii) a founder event of higher intensity on Norfolk Island with an effective number of founders around 20 individuals, suggesting colonization by a single flock of moderate size. Our inferences on demographic parameters, especially those on the number of founders, were relatively insensitive to the precise choice of prior distributions for microsatellite mutation processes and demographic parameters, suggesting that our analysis provides a robust description of the recent colonization history of the subspecies.
Resumo:
Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm. to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions on marker. demographic, and historical features. The method presented here is the only robust, model-based method available so far, which allows inferring complex population dynamics over a short time scale. It also provides the basis for investigating the interplay between population dynamics, drift, and selection in invasive species.
Resumo:
Many studies on birds focus on the collection of data through an experimental design, suitable for investigation in a classical analysis of variance (ANOVA) framework. Although many findings are confirmed by one or more experts, expert information is rarely used in conjunction with the survey data to enhance the explanatory and predictive power of the model. We explore this neglected aspect of ecological modelling through a study on Australian woodland birds, focusing on the potential impact of different intensities of commercial cattle grazing on bird density in woodland habitat. We examine a number of Bayesian hierarchical random effects models, which cater for overdispersion and a high frequency of zeros in the data using WinBUGS and explore the variation between and within different grazing regimes and species. The impact and value of expert information is investigated through the inclusion of priors that reflect the experience of 20 experts in the field of bird responses to disturbance. Results indicate that expert information moderates the survey data, especially in situations where there are little or no data. When experts agreed, credible intervals for predictions were tightened considerably. When experts failed to agree, results were similar to those evaluated in the absence of expert information. Overall, we found that without expert opinion our knowledge was quite weak. The fact that the survey data is quite consistent, in general, with expert opinion shows that we do know something about birds and grazing and we could learn a lot faster if we used this approach more in ecology, where data are scarce. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Resumo:
The judicial interest in ‘scientific’ evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example this paper examines the issues which complicate attempts to quantify results in work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that these markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts quantified and safe attributions are shown to be possible, but as the number of reference texts reduces the analysis shows how the conclusion which should be reached is that no attribution can be made. The testing process at no point results in instances of a misattribution.
Resumo:
In the present study, multilayer perceptron (MLP) neural networks were applied to help in the diagnosis of obstructive sleep apnoea syndrome (OSAS). Oxygen saturation (SaO2) recordings from nocturnal pulse oximetry were used for this purpose. We performed time and spectral analysis of these signals to extract 14 features related to OSAS. The performance of two different MLP classifiers was compared: maximum likelihood (ML) and Bayesian (BY) MLP networks. A total of 187 subjects suspected of suffering from OSAS took part in the study. Their SaO2 signals were divided into a training set with 74 recordings and a test set with 113 recordings. BY-MLP networks achieved the best performance on the test set with 85.58% accuracy (87.76% sensitivity and 82.39% specificity). These results were substantially better than those provided by ML-MLP networks, which were affected by overfitting and achieved an accuracy of 76.81% (86.42% sensitivity and 62.83% specificity). Our results suggest that the Bayesian framework is preferred to implement our MLP classifiers. The proposed BY-MLP networks could be used for early OSAS detection. They could contribute to overcome the difficulties of nocturnal polysomnography (PSG) and thus reduce the demand for these studies.
Resumo:
We are concerned with the problem of image segmentation in which each pixel is assigned to one of a predefined finite number of classes. In Bayesian image analysis, this requires fusing together local predictions for the class labels with a prior model of segmentations. Markov Random Fields (MRFs) have been used to incorporate some of this prior knowledge, but this not entirely satisfactory as inference in MRFs is NP-hard. The multiscale quadtree model of Bouman and Shapiro (1994) is an attractive alternative, as this is a tree-structured belief network in which inference can be carried out in linear time (Pearl 1988). It is an hierarchical model where the bottom-level nodes are pixels, and higher levels correspond to downsampled versions of the image. The conditional-probability tables (CPTs) in the belief network encode the knowledge of how the levels interact. In this paper we discuss two methods of learning the CPTs given training data, using (a) maximum likelihood and the EM algorithm and (b) emphconditional maximum likelihood (CML). Segmentations obtained using networks trained by CML show a statistically-significant improvement in performance on synthetic images. We also demonstrate the methods on a real-world outdoor-scene segmentation task.
Resumo:
This thesis is concerned with approximate inference in dynamical systems, from a variational Bayesian perspective. When modelling real world dynamical systems, stochastic differential equations appear as a natural choice, mainly because of their ability to model the noise of the system by adding a variant of some stochastic process to the deterministic dynamics. Hence, inference in such processes has drawn much attention. Here two new extended frameworks are derived and presented that are based on basis function expansions and local polynomial approximations of a recently proposed variational Bayesian algorithm. It is shown that the new extensions converge to the original variational algorithm and can be used for state estimation (smoothing). However, the main focus is on estimating the (hyper-) parameters of these systems (i.e. drift parameters and diffusion coefficients). The new methods are numerically validated on a range of different systems which vary in dimensionality and non-linearity. These are the Ornstein-Uhlenbeck process, for which the exact likelihood can be computed analytically, the univariate and highly non-linear, stochastic double well and the multivariate chaotic stochastic Lorenz '63 (3-dimensional model). The algorithms are also applied to the 40 dimensional stochastic Lorenz '96 system. In this investigation these new approaches are compared with a variety of other well known methods such as the ensemble Kalman filter / smoother, a hybrid Monte Carlo sampler, the dual unscented Kalman filter (for jointly estimating the systems states and model parameters) and full weak-constraint 4D-Var. Empirical analysis of their asymptotic behaviour as a function of observation density or length of time window increases is provided.
Resumo:
We consider the problem of assigning an input vector to one of m classes by predicting P(c|x) for c=1,...,m. For a two-class problem, the probability of class one given x is estimated by s(y(x)), where s(y)=1/(1+e-y). A Gaussian process prior is placed on y(x), and is combined with the training data to obtain predictions for new x points. We provide a Bayesian treatment, integrating over uncertainty in y and in the parameters that control the Gaussian process prior the necessary integration over y is carried out using Laplace's approximation. The method is generalized to multiclass problems (m>2) using the softmax function. We demonstrate the effectiveness of the method on a number of datasets.
Resumo:
This work introduces a new variational Bayes data assimilation method for the stochastic estimation of precipitation dynamics using radar observations for short term probabilistic forecasting (nowcasting). A previously developed spatial rainfall model based on the decomposition of the observed precipitation field using a basis function expansion captures the precipitation intensity from radar images as a set of ‘rain cells’. The prior distributions for the basis function parameters are carefully chosen to have a conjugate structure for the precipitation field model to allow a novel variational Bayes method to be applied to estimate the posterior distributions in closed form, based on solving an optimisation problem, in a spirit similar to 3D VAR analysis, but seeking approximations to the posterior distribution rather than simply the most probable state. A hierarchical Kalman filter is used to estimate the advection field based on the assimilated precipitation fields at two times. The model is applied to tracking precipitation dynamics in a realistic setting, using UK Met Office radar data from both a summer convective event and a winter frontal event. The performance of the model is assessed both traditionally and using probabilistic measures of fit based on ROC curves. The model is shown to provide very good assimilation characteristics, and promising forecast skill. Improvements to the forecasting scheme are discussed
Resumo:
The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.
Resumo:
The assessment of the reliability of systems which learn from data is a key issue to investigate thoroughly before the actual application of information processing techniques to real-world problems. Over the recent years Gaussian processes and Bayesian neural networks have come to the fore and in this thesis their generalisation capabilities are analysed from theoretical and empirical perspectives. Upper and lower bounds on the learning curve of Gaussian processes are investigated in order to estimate the amount of data required to guarantee a certain level of generalisation performance. In this thesis we analyse the effects on the bounds and the learning curve induced by the smoothness of stochastic processes described by four different covariance functions. We also explain the early, linearly-decreasing behaviour of the curves and we investigate the asymptotic behaviour of the upper bounds. The effect of the noise and the characteristic lengthscale of the stochastic process on the tightness of the bounds are also discussed. The analysis is supported by several numerical simulations. The generalisation error of a Gaussian process is affected by the dimension of the input vector and may be decreased by input-variable reduction techniques. In conventional approaches to Gaussian process regression, the positive definite matrix estimating the distance between input points is often taken diagonal. In this thesis we show that a general distance matrix is able to estimate the effective dimensionality of the regression problem as well as to discover the linear transformation from the manifest variables to the hidden-feature space, with a significant reduction of the input dimension. Numerical simulations confirm the significant superiority of the general distance matrix with respect to the diagonal one.In the thesis we also present an empirical investigation of the generalisation errors of neural networks trained by two Bayesian algorithms, the Markov Chain Monte Carlo method and the evidence framework; the neural networks have been trained on the task of labelling segmented outdoor images.
Resumo:
This thesis addresses data assimilation, which typically refers to the estimation of the state of a physical system given a model and observations, and its application to short-term precipitation forecasting. A general introduction to data assimilation is given, both from a deterministic and' stochastic point of view. Data assimilation algorithms are reviewed, in the static case (when no dynamics are involved), then in the dynamic case. A double experiment on two non-linear models, the Lorenz 63 and the Lorenz 96 models, is run and the comparative performance of the methods is discussed in terms of quality of the assimilation, robustness "in the non-linear regime and computational time. Following the general review and analysis, data assimilation is discussed in the particular context of very short-term rainfall forecasting (nowcasting) using radar images. An extended Bayesian precipitation nowcasting model is introduced. The model is stochastic in nature and relies on the spatial decomposition of the rainfall field into rain "cells". Radar observations are assimilated using a Variational Bayesian method in which the true posterior distribution of the parameters is approximated by a more tractable distribution. The motion of the cells is captured by a 20 Gaussian process. The model is tested on two precipitation events, the first dominated by convective showers, the second by precipitation fronts. Several deterministic and probabilistic validation methods are applied and the model is shown to retain reasonable prediction skill at up to 3 hours lead time. Extensions to the model are discussed.