901 resultados para Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration


Relevância:

50.00% 50.00%

Publicador:

Resumo:

Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short edge lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

In geographical epidemiology, maps of disease rates and disease risk provide a spatial perspective for researching disease etiology. For rare diseases or when the population base is small, the rate and risk estimates may be unstable. Empirical Bayesian (EB) methods have been used to spatially smooth the estimates by permitting an area estimate to "borrow strength" from its neighbors. Such EB methods include the use of a Gamma model, of a James-Stein estimator, and of a conditional autoregressive (CAR) process. A fully Bayesian analysis of the CAR process is proposed. One advantage of this fully Bayesian analysis is that it can be implemented simply by using repeated sampling from the posterior densities. Use of a Markov chain Monte Carlo technique such as Gibbs sampler was not necessary. Direct resampling from the posterior densities provides exact small sample inferences instead of the approximate asymptotic analyses of maximum likelihood methods (Clayton & Kaldor, 1987). Further, the proposed CAR model provides for covariates to be included in the model. A simulation demonstrates the effect of sample size on the fully Bayesian analysis of the CAR process. The methods are applied to lip cancer data from Scotland, and the results are compared. ^

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. ^ In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science- derived data analysis methods, predominantly machine learning approaches, were proposed and explored in this study. ^ The goal of this study was to develop a set of models using decision trees/ensemble and neural networks methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1 trichloroethane, methylene dichloride and, trichloroethylene were used. ^ When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine `expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for the independence from the expert judgment.^

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The objective of this thesis is the development of cooperative localization and tracking algorithms using nonparametric message passing techniques. In contrast to the most well-known techniques, the goal is to estimate the posterior probability density function (PDF) of the position of each sensor. This problem can be solved using Bayesian approach, but it is intractable in general case. Nevertheless, the particle-based approximation (via nonparametric representation), and an appropriate factorization of the joint PDFs (using message passing methods), make Bayesian approach acceptable for inference in sensor networks. The well-known method for this problem, nonparametric belief propagation (NBP), can lead to inaccurate beliefs and possible non-convergence in loopy networks. Therefore, we propose four novel algorithms which alleviate these problems: nonparametric generalized belief propagation (NGBP) based on junction tree (NGBP-JT), NGBP based on pseudo-junction tree (NGBP-PJT), NBP based on spanning trees (NBP-ST), and uniformly-reweighted NBP (URW-NBP). We also extend NBP for cooperative localization in mobile networks. In contrast to the previous methods, we use an optional smoothing, provide a novel communication protocol, and increase the efficiency of the sampling techniques. Moreover, we propose novel algorithms for distributed tracking, in which the goal is to track the passive object which cannot locate itself. In particular, we develop distributed particle filtering (DPF) based on three asynchronous belief consensus (BC) algorithms: standard belief consensus (SBC), broadcast gossip (BG), and belief propagation (BP). Finally, the last part of this thesis includes the experimental analysis of some of the proposed algorithms, in which we found that the results based on real measurements are very similar with the results based on theoretical models.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The authors are from UPM and are relatively grouped, and all have intervened in different academic or real cases on the subject, at different times as being of different age. With precedent from E. Torroja and A. Páez in Madrid Spain Safety Probabilistic models for concrete about 1957, now in ICOSSAR conferences, author J.M. Antón involved since autumn 1967 for euro-steel construction in CECM produced a math model for independent load superposition reductions, and using it a load coefficient pattern for codes in Rome Feb. 1969, practically adopted for European constructions, giving in JCSS Lisbon Feb. 1974 suggestion of union for concrete-steel-al.. That model uses model for loads like Gumbel type I, for 50 years for one type of load, reduced to 1 year to be added to other independent loads, the sum set in Gumbel theories to 50 years return period, there are parallel models. A complete reliability system was produced, including non linear effects as from buckling, phenomena considered somehow in actual Construction Eurocodes produced from Model Codes. The system was considered by author in CEB in presence of Hydraulic effects from rivers, floods, sea, in reference with actual practice. When redacting a Road Drainage Norm in MOPU Spain an optimization model was realized by authors giving a way to determine the figure of Return Period, 10 to 50 years, for the cases of hydraulic flows to be considered in road drainage. Satisfactory examples were a stream in SE of Spain with Gumbel Type I model and a paper of Ven Te Chow with Mississippi in Keokuk using Gumbel type II, and the model can be modernized with more varied extreme laws. In fact in the MOPU drainage norm the redacting commission acted also as expert to set a table of return periods for elements of road drainage, in fact as a multi-criteria complex decision system. These precedent ideas were used e.g. in wide Codes, indicated in symposia or meetings, but not published in journals in English, and a condensate of contributions of authors is presented. The authors are somehow involved in optimization for hydraulic and agro planning, and give modest hints of intended applications in presence of agro and environment planning as a selection of the criteria and utility functions involved in bayesian, multi-criteria or mixed decision systems. Modest consideration is made of changing in climate, and on the production and commercial systems, and on others as social and financial.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Motivation: An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations due to the minimal assumptions made or with more specific assumptions are computationally intensive. Results: By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that models adequately the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Objective: It is usual that data collected from routine clinical care is sparse and unable to support the more complex pharmacokinetic (PK) models that may have been reported in previous rich data studies. Informative priors may be a pre-requisite for model development. The aim of this study was to estimate the population PK parameters of sirolimus using a fully Bayesian approach with informative priors. Methods: Informative priors including prior mean and precision of the prior mean were elicited from previous published studies using a meta-analytic technique. Precision of between-subject variability was determined by simulations from a Wishart distribution using MATLAB (version 6.5). Concentration-time data of sirolimus retrospectively collected from kidney transplant patients were analysed using WinBUGS (version 1.3). The candidate models were either one- or two-compartment with first order absorption and first order elimination. Model discrimination was based on computation of the posterior odds supporting the model. Results: A total of 315 concentration-time points were obtained from 25 patients. Most data were clustered at trough concentrations with range of 1.6 to 77 hours post-dose. Using informative priors, either a one- or two-compartment model could be used to describe the data. When a one-compartment model was applied, information was gained from the data for the value of apparent clearance (CL/F = 18.5 L/h), and apparent volume of distribution (V/F = 1406 L) but no information was gained about the absorption rate constant (ka). When a two-compartment model was fitted to the data, the data were informative about CL/F, apparent inter-compartmental clearance, and apparent volume of distribution of the peripheral compartment (13.2 L/h, 20.8 L/h, and 579 L, respectively). The posterior distribution of the volume distribution of central compartment and ka were the same as priors. The posterior odds for the two-compartment model was 8.1, indicating the data supported the two-compartment model. Conclusion: The use of informative priors supported the choice of a more complex and informative model that would otherwise have not been supported by the sparse data.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Mixture Density Networks (MDNs) are a well-established method for modelling the conditional probability density which is useful for complex multi-valued functions where regression methods (such as MLPs) fail. In this paper we extend earlier research of a regularisation method for a special case of MDNs to the general case using evidence based regularisation and we show how the Hessian of the MDN error function can be evaluated using R-propagation. The method is tested on two data sets and compared with early stopping.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem. A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters. We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer. We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution. We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes. This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets. We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of the error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimate of error bars is of practical importance especially when safety and reliability is an issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix which involves the second derivatives of the error function with respect to the weights is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This thesis addresses data assimilation, which typically refers to the estimation of the state of a physical system given a model and observations, and its application to short-term precipitation forecasting. A general introduction to data assimilation is given, both from a deterministic and' stochastic point of view. Data assimilation algorithms are reviewed, in the static case (when no dynamics are involved), then in the dynamic case. A double experiment on two non-linear models, the Lorenz 63 and the Lorenz 96 models, is run and the comparative performance of the methods is discussed in terms of quality of the assimilation, robustness "in the non-linear regime and computational time. Following the general review and analysis, data assimilation is discussed in the particular context of very short-term rainfall forecasting (nowcasting) using radar images. An extended Bayesian precipitation nowcasting model is introduced. The model is stochastic in nature and relies on the spatial decomposition of the rainfall field into rain "cells". Radar observations are assimilated using a Variational Bayesian method in which the true posterior distribution of the parameters is approximated by a more tractable distribution. The motion of the cells is captured by a 20 Gaussian process. The model is tested on two precipitation events, the first dominated by convective showers, the second by precipitation fronts. Several deterministic and probabilistic validation methods are applied and the model is shown to retain reasonable prediction skill at up to 3 hours lead time. Extensions to the model are discussed.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

As more of the economy moves from traditional manufacturing to the service sector, the nature of work is becoming less tangible and thus, the representation of human behaviour in models is becoming more important. Representing human behaviour and decision making in models is challenging, both in terms of capturing the essence of the processes, and also the way that those behaviours and decisions are or can be represented in the models themselves. In order to advance understanding in this area, a useful first step is to evaluate and start to classify the various types of behaviour and decision making that are required to be modelled. This talk will attempt to set out and provide an initial classification of the different types of behaviour and decision making that a modeller might want to represent in a model. Then, it will be useful to start to assess the main methods of simulation in terms of their capability in representing these various aspects. The three main simulation methods, System Dynamics, Agent Based Modelling and Discrete Event Simulation all achieve this to varying degrees. There is some evidence that all three methods can, within limits, represent the key aspects of the system being modelled. The three simulation approaches are then assessed for their suitability in modelling these various aspects. Illustration of behavioural modelling will be provided from cases in supply chain management, evacuation modelling and rail disruption.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This paper presents the main achievements of the author’s PhD dissertation. The work is dedicated to mathematical and semi-empirical approaches applied to the case of Bulgarian wildland fires. After the introductory explanations, short information from every chapter is extracted to cover the main parts of the obtained results. The methods used are described in brief and main outcomes are listed. ACM Computing Classification System (1998): D.1.3, D.2.0, K.5.1.