609 resultados para bayesian methods


Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Identifying crash “hotspots”, “blackspots”, “sites with promise”, or “high risk” locations is standard practice in departments of transportation throughout the US. The literature is replete with the development and discussion of statistical methods for hotspot identification (HSID). Theoretical derivations and empirical studies have been used to weigh the benefits of various HSID methods; however, a small number of studies have used controlled experiments to systematically assess various methods. Using experimentally derived simulated data—which are argued to be superior to empirical data, three hot spot identification methods observed in practice are evaluated: simple ranking, confidence interval, and Empirical Bayes. Using simulated data, sites with promise are known a priori, in contrast to empirical data where high risk sites are not known for certain. To conduct the evaluation, properties of observed crash data are used to generate simulated crash frequency distributions at hypothetical sites. A variety of factors is manipulated to simulate a host of ‘real world’ conditions. Various levels of confidence are explored, and false positives (identifying a safe site as high risk) and false negatives (identifying a high risk site as safe) are compared across methods. Finally, the effects of crash history duration in the three HSID approaches are assessed. The results illustrate that the Empirical Bayes technique significantly outperforms ranking and confidence interval techniques (with certain caveats). As found by others, false positives and negatives are inversely related. Three years of crash history appears, in general, to provide an appropriate crash history duration.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statisticians along with other scientists have made significant computational advances that enable the estimation of formerly complex statistical models. The Bayesian inference framework combined with Markov chain Monte Carlo estimation methods such as the Gibbs sampler enable the estimation of discrete choice models such as the multinomial logit (MNL) model. MNL models are frequently applied in transportation research to model choice outcomes such as mode, destination, or route choices or to model categorical outcomes such as crash outcomes. Recent developments allow for the modification of the potentially limiting assumptions of MNL such as the independence from irrelevant alternatives (IIA) property. However, relatively little transportation-related research has focused on Bayesian MNL models, the tractability of which is of great value to researchers and practitioners alike. This paper addresses MNL model specification issues in the Bayesian framework, such as the value of including prior information on parameters, allowing for nonlinear covariate effects, and extensions to random parameter models, so changing the usual limiting IIA assumption. This paper also provides an example that demonstrates, using route-choice data, the considerable potential of the Bayesian MNL approach with many transportation applications. This paper then concludes with a discussion of the pros and cons of this Bayesian approach and identifies when its application is worthwhile

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We estimate the parameters of a stochastic process model for a macroparasite population within a host using approximate Bayesian computation (ABC). The immunity of the host is an unobserved model variable and only mature macroparasites at sacrifice of the host are counted. With very limited data, process rates are inferred reasonably precisely. Modeling involves a three variable Markov process for which the observed data likelihood is computationally intractable. ABC methods are particularly useful when the likelihood is analytically or computationally intractable. The ABC algorithm we present is based on sequential Monte Carlo, is adaptive in nature, and overcomes some drawbacks of previous approaches to ABC. The algorithm is validated on a test example involving simulated data from an autologistic model before being used to infer parameters of the Markov process model for experimental data. The fitted model explains the observed extra-binomial variation in terms of a zero-one immunity variable, which has a short-lived presence in the host.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference. ABC methods are useful for posterior inference in the presence of an intractable likelihood function. In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning. This methodological development was motivated by an application involving data on macroparasite population evolution modelled by a trivariate stochastic process for which there is no tractable likelihood function. The auxiliary model here is based on a beta–binomial distribution. The main objective of the analysis is to determine which parameters of the stochastic model are estimable from the observed data on mature parasite worms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Plant biosecurity requires statistical tools to interpret field surveillance data in order to manage pest incursions that threaten crop production and trade. Ultimately, management decisions need to be based on the probability that an area is infested or free of a pest. Current informal approaches to delimiting pest extent rely upon expert ecological interpretation of presence / absence data over space and time. Hierarchical Bayesian models provide a cohesive statistical framework that can formally integrate the available information on both pest ecology and data. The overarching method involves constructing an observation model for the surveillance data, conditional on the hidden extent of the pest and uncertain detection sensitivity. The extent of the pest is then modelled as a dynamic invasion process that includes uncertainty in ecological parameters. Modelling approaches to assimilate this information are explored through case studies on spiralling whitefly, Aleurodicus dispersus and red banded mango caterpillar, Deanolis sublimbalis. Markov chain Monte Carlo simulation is used to estimate the probable extent of pests, given the observation and process model conditioned by surveillance data. Statistical methods, based on time-to-event models, are developed to apply hierarchical Bayesian models to early detection programs and to demonstrate area freedom from pests. The value of early detection surveillance programs is demonstrated through an application to interpret surveillance data for exotic plant pests with uncertain spread rates. The model suggests that typical early detection programs provide a moderate reduction in the probability of an area being infested but a dramatic reduction in the expected area of incursions at a given time. Estimates of spiralling whitefly extent are examined at local, district and state-wide scales. The local model estimates the rate of natural spread and the influence of host architecture, host suitability and inspector efficiency. These parameter estimates can support the development of robust surveillance programs. Hierarchical Bayesian models for the human-mediated spread of spiralling whitefly are developed for the colonisation of discrete cells connected by a modified gravity model. By estimating dispersal parameters, the model can be used to predict the extent of the pest over time. An extended model predicts the climate restricted distribution of the pest in Queensland. These novel human-mediated movement models are well suited to demonstrating area freedom at coarse spatio-temporal scales. At finer scales, and in the presence of ecological complexity, exploratory models are developed to investigate the capacity for surveillance information to estimate the extent of red banded mango caterpillar. It is apparent that excessive uncertainty about observation and ecological parameters can impose limits on inference at the scales required for effective management of response programs. The thesis contributes novel statistical approaches to estimating the extent of pests and develops applications to assist decision-making across a range of plant biosecurity surveillance activities. Hierarchical Bayesian modelling is demonstrated as both a useful analytical tool for estimating pest extent and a natural investigative paradigm for developing and focussing biosecurity programs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Different from conventional methods for structural reliability evaluation, such as, first/second-order reliability methods (FORM/SORM) or Monte Carlo simulation based on corresponding limit state functions, a novel approach based on dynamic objective oriented Bayesian network (DOOBN) for prediction of structural reliability of a steel bridge element has been proposed in this paper. The DOOBN approach can effectively model the deterioration processes of a steel bridge element and predict their structural reliability over time. This approach is also able to achieve Bayesian updating with observed information from measurements, monitoring and visual inspection. Moreover, the computational capacity embedded in the approach can be used to facilitate integrated management and maintenance optimization in a bridge system. A steel bridge girder is used to validate the proposed approach. The predicted results are compared with those evaluated by FORM method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article explores the use of probabilistic classification, namely finite mixture modelling, for identification of complex disease phenotypes, given cross-sectional data. In particular, if focuses on posterior probabilities of subgroup membership, a standard output of finite mixture modelling, and how the quantification of uncertainty in these probabilities can lead to more detailed analyses. Using a Bayesian approach, we describe two practical uses of this uncertainty: (i) as a means of describing a person’s membership to a single or multiple latent subgroups and (ii) as a means of describing identified subgroups by patient-centred covariates not included in model estimation. These proposed uses are demonstrated on a case study in Parkinson’s disease (PD), where latent subgroups are identified using multiple symptoms from the Unified Parkinson’s Disease Rating Scale (UPDRS).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider the problem of how to efficiently and safely design dose finding studies. Both current and novel utility functions are explored using Bayesian adaptive design methodology for the estimation of a maximum tolerated dose (MTD). In particular, we explore widely adopted approaches such as the continual reassessment method and minimizing the variance of the estimate of an MTD. New utility functions are constructed in the Bayesian framework and are evaluated against current approaches. To reduce computing time, importance sampling is implemented to re-weight posterior samples thus avoiding the need to draw samples using Markov chain Monte Carlo techniques. Further, as such studies are generally first-in-man, the safety of patients is paramount. We therefore explore methods for the incorporation of safety considerations into utility functions to ensure that only safe and well-predicted doses are administered. The amalgamation of Bayesian methodology, adaptive design and compound utility functions is termed adaptive Bayesian compound design (ABCD). The performance of this amalgamation of methodology is investigated via the simulation of dose finding studies. The paper concludes with a discussion of results and extensions that could be included into our approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We review the literature on the combined effect of asbestos exposure and smoking on lung cancer, and explore a Bayesian approach to assess evidence of interaction. Previous approaches have focussed on separate tests for an additive or multiplicative relation. We extend these approaches by exploring the strength of evidence for either relation using approaches which allow the data to choose between both models. We then compare the different approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paves the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we firstly investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus is turned to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extended the model for detecting the interaction effects for population based case-control studies. In this part of study, we also develop a machine learning algorithm to cope with the large scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.