868 resultados para bayesian hierarchical models
Resumo:
Motorcycles are overrepresented in road traffic crashes and particularly vulnerable at signalized intersections. The objective of this study is to identify causal factors affecting the motorcycle crashes at both four-legged and T signalized intersections. Treating the data in time-series cross-section panels, this study explores different Hierarchical Poisson models and found that the model allowing autoregressive lag 1 dependent specification in the error term is the most suitable. Results show that the number of lanes at the four-legged signalized intersections significantly increases motorcycle crashes largely because of the higher exposure resulting from higher motorcycle accumulation at the stop line. Furthermore, the presence of a wide median and an uncontrolled left-turn lane at major roadways of four-legged intersections exacerbate this potential hazard. For T signalized intersections, the presence of exclusive right-turn lane at both major and minor roadways and an uncontrolled left-turn lane at major roadways of T intersections increases motorcycle crashes. Motorcycle crashes increase on high-speed roadways because they are more vulnerable and less likely to react in time during conflicts. The presence of red light cameras reduces motorcycle crashes significantly for both four-legged and T intersections. With the red-light camera, motorcycles are less exposed to conflicts because it is observed that they are more disciplined in queuing at the stop line and less likely to jump start at the start of green.
Resumo:
Quality oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers’ expectations by supplying reliable, good quality products and services is the key factor for an organization and even government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy and now are being applied widely to improve the quality of products and services in industrial and business sectors. Recently SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the health sector characteristics and needs. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents some opportunities for transferring knowledge and future development in each sectors. Meanwhile considering capabilities of Bayesian approach particularly Bayesian hierarchical models and computational techniques in which all uncertainty are expressed as a structure of probability, facilitates decision making and cost-effectiveness analyses. Therefore, this research investigates the use of quality improvement cycle in a health vii setting using clinical data from a hospital. The need of clinical data for monitoring purposes is investigated in two aspects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine economical sample size for statistical analyses within the quality improvement cycle. Following ensuring clinical data quality, some characteristics of control charts in the health context including the necessity of monitoring attribute data and correlated quality characteristics are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiogram and various risk-adjusted control charts are constructed and investigated in monitoring binary outcomes of clinical interventions as well as postintervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework in estimation of change point following control chart’s signal. This estimate aims to facilitate root causes efforts in quality improvement cycle since it cuts the search for the potential causes of detected changes to a tighter time-frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters since probability distribution based results are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios including step change, linear trend and multiple change in a Poisson process are developed and investigated. The benefits of change point investigation is revisited and promoted in monitoring hospital outcomes where the developed Bayesian estimator reports the true time of the shifts, compared to priori known causes, detected by control charts in monitoring rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators are then followed in a healthcare surveillances for processes in which pre-intervention characteristics of patients are viii affecting the outcomes. In this setting, at first, the Bayesian estimator is extended to capture the patient mix, covariates, through risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention which is being monitored by riskadjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix and the survival function is constructed using survival prediction model. The simulation study undertaken in each research component and obtained results highly recommend the developed Bayesian estimators as a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances as well as industrial and business contexts. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The empirical results and simulations indicate that the Bayesian estimators are a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The advantages of the Bayesian approach seen in general context of quality control may also be extended in the industrial and business domains where quality monitoring was initially developed.
Resumo:
Spatial data are now prevalent in a wide range of fields including environmental and health science. This has led to the development of a range of approaches for analysing patterns in these data. In this paper, we compare several Bayesian hierarchical models for analysing point-based data based on the discretization of the study region, resulting in grid-based spatial data. The approaches considered include two parametric models and a semiparametric model. We highlight the methodology and computation for each approach. Two simulation studies are undertaken to compare the performance of these models for various structures of simulated point-based data which resemble environmental data. A case study of a real dataset is also conducted to demonstrate a practical application of the modelling approaches. Goodness-of-fit statistics are computed to compare estimates of the intensity functions. The deviance information criterion is also considered as an alternative model evaluation criterion. The results suggest that the adaptive Gaussian Markov random field model performs well for highly sparse point-based data where there are large variations or clustering across the space; whereas the discretized log Gaussian Cox process produces good fit in dense and clustered point-based data. One should generally consider the nature and structure of the point-based data in order to choose the appropriate method in modelling a discretized spatial point-based data.
Resumo:
description and analysis of geographically indexed health data with respect to demographic, environmental, behavioural, socioeconomic, genetic, and infectious risk factors (Elliott andWartenberg 2004). Disease maps can be useful for estimating relative risk; ecological analyses, incorporating area and/or individual-level covariates; or cluster analyses (Lawson 2009). As aggregated data are often more readily available, one common method of mapping disease is to aggregate the counts of disease at some geographical areal level, and present them as choropleth maps (Devesa et al. 1999; Population Health Division 2006). Therefore, this chapter will focus exclusively on methods appropriate for areal data...
Resumo:
Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an easy interpretation. In this paper we introduce a new BHM formulation, which we call "reduced BHM", aimed at analyzing clustered data sets in the presence of a large number of random effects that are not of primary scientific interest. At the first stage of the reduced BHM, we calculate the integrated likelihood of the parameter of interest (e.g. excess number of deaths attributed to simultaneous exposure to high levels of many pollutants). At the second stage, we specify a flexible random-effect distribution directly on the parameter of interest. The reduced BHM overcomes many of the challenges in the specification and implementation of full BHM in the context of a large number of nuisance parameters. In simulation studies we show that the reduced BHM performs comparably to the full BHM in many scenarios, and even performs better in some cases. Methods are applied to estimate location-specific and overall relative risks of cardiovascular hospital admissions associated with simultaneous exposure to elevated levels of particulate matter and ozone in 51 US counties during the period 1999-2005.
Resumo:
Traditional crash prediction models, such as generalized linear regression models, are incapable of taking into account the multilevel data structure, which extensively exists in crash data. Disregarding the possible within-group correlations can lead to the production of models giving unreliable and biased estimates of unknowns. This study innovatively proposes a -level hierarchy, viz. (Geographic region level – Traffic site level – Traffic crash level – Driver-vehicle unit level – Vehicle-occupant level) Time level, to establish a general form of multilevel data structure in traffic safety analysis. To properly model the potential cross-group heterogeneity due to the multilevel data structure, a framework of Bayesian hierarchical models that explicitly specify multilevel structure and correctly yield parameter estimates is introduced and recommended. The proposed method is illustrated in an individual-severity analysis of intersection crashes using the Singapore crash records. This study proved the importance of accounting for the within-group correlations and demonstrated the flexibilities and effectiveness of the Bayesian hierarchical method in modeling multilevel structure of traffic crash data.
Resumo:
This study proposes a full Bayes (FB) hierarchical modeling approach in traffic crash hotspot identification. The FB approach is able to account for all uncertainties associated with crash risk and various risk factors by estimating a posterior distribution of the site safety on which various ranking criteria could be based. Moreover, by use of hierarchical model specification, FB approach is able to flexibly take into account various heterogeneities of crash occurrence due to spatiotemporal effects on traffic safety. Using Singapore intersection crash data(1997-2006), an empirical evaluate was conducted to compare the proposed FB approach to the state-of-the-art approaches. Results show that the Bayesian hierarchical models with accommodation for site specific effect and serial correlation have better goodness-of-fit than non hierarchical models. Furthermore, all model-based approaches perform significantly better in safety ranking than the naive approach using raw crash count. The FB hierarchical models were found to significantly outperform the standard EB approach in correctly identifying hotspots.
Resumo:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Resumo:
In this paper, we develop Bayesian hierarchical distributed lag models for estimating associations between daily variations in summer ozone levels and daily variations in cardiovascular and respiratory (CVDRESP) mortality counts for 19 U.S. large cities included in the National Morbidity Mortality Air Pollution Study (NMMAPS) for the period 1987 - 1994. At the first stage, we define a semi-parametric distributed lag Poisson regression model to estimate city-specific relative rates of CVDRESP associated with short-term exposure to summer ozone. At the second stage, we specify a class of distributions for the true city-specific relative rates to estimate an overall effect by taking into account the variability within and across cities. We perform the calculations with respect to several random effects distributions (normal, t-student, and mixture of normal), thus relaxing the common assumption of a two-stage normal-normal hierarchical model. We assess the sensitivity of the results to: 1) lag structure for ozone exposure; 2) degree of adjustment for long-term trends; 3) inclusion of other pollutants in the model;4) heat waves; 5) random effects distributions; and 6) prior hyperparameters. On average across cities, we found that a 10ppb increase in summer ozone level for every day in the previous week is associated with 1.25 percent increase in CVDRESP mortality (95% posterior regions: 0.47, 2.03). The relative rate estimates are also positive and statistically significant at lags 0, 1, and 2. We found that associations between summer ozone and CVDRESP mortality are sensitive to the confounding adjustment for PM_10, but are robust to: 1) the adjustment for long-term trends, other gaseous pollutants (NO_2, SO_2, and CO); 2) the distributional assumptions at the second stage of the hierarchical model; and 3) the prior distributions on all unknown parameters. Bayesian hierarchical distributed lag models and their application to the NMMAPS data allow us estimation of an acute health effect associated with exposure to ambient air pollution in the last few days on average across several locations. The application of these methods and the systematic assessment of the sensitivity of findings to model assumptions provide important epidemiological evidence for future air quality regulations.
Resumo:
Most statistical analysis, theory and practice, is concerned with static models; models with a proposed set of parameters whose values are fixed across observational units. Static models implicitly assume that the quantified relationships remain the same across the design space of the data. While this is reasonable under many circumstances this can be a dangerous assumption when dealing with sequentially ordered data. The mere passage of time always brings fresh considerations and the interrelationships among parameters, or subsets of parameters, may need to be continually revised. ^ When data are gathered sequentially dynamic interim monitoring may be useful as new subject-specific parameters are introduced with each new observational unit. Sequential imputation via dynamic hierarchical models is an efficient strategy for handling missing data and analyzing longitudinal studies. Dynamic conditional independence models offers a flexible framework that exploits the Bayesian updating scheme for capturing the evolution of both the population and individual effects over time. While static models often describe aggregate information well they often do not reflect conflicts in the information at the individual level. Dynamic models prove advantageous over static models in capturing both individual and aggregate trends. Computations for such models can be carried out via the Gibbs sampler. An application using a small sample repeated measures normally distributed growth curve data is presented. ^
Resumo:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype, distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition we also test a number of different versions of the SABRE method, compare conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior; the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residue and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method; block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, leading it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experiment investigation.
Resumo:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
Resumo:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the timevarying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed, acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-invariables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals. These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term by term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first-stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.