204 resultados para Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration
Resumo:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Resumo:
In this thesis, the issue of incorporating uncertainty for environmental modelling informed by imagery is explored by considering uncertainty in deterministic modelling, measurement uncertainty and uncertainty in image composition. Incorporating uncertainty in deterministic modelling is extended for use with imagery using the Bayesian melding approach. In the application presented, slope steepness is shown to be the main contributor to total uncertainty in the Revised Universal Soil Loss Equation. A spatial sampling procedure is also proposed to assist in implementing Bayesian melding given the increased data size with models informed by imagery. Measurement error models are another approach to incorporating uncertainty when data is informed by imagery. These models for measurement uncertainty, considered in a Bayesian conditional independence framework, are applied to ecological data generated from imagery. The models are shown to be appropriate and useful in certain situations. Measurement uncertainty is also considered in the context of change detection when two images are not co-registered. An approach for detecting change in two successive images is proposed that is not affected by registration. The procedure uses the Kolmogorov-Smirnov test on homogeneous segments of an image to detect change, with the homogeneous segments determined using a Bayesian mixture model of pixel values. Using the mixture model to segment an image also allows for uncertainty in the composition of an image. This thesis concludes by comparing several different Bayesian image segmentation approaches that allow for uncertainty regarding the allocation of pixels to different ground components. Each segmentation approach is applied to a data set of chlorophyll values and shown to have different benefits and drawbacks depending on the aims of the analysis.
Resumo:
Longitudinal data, where data are repeatedly observed or measured on a temporal basis of time or age provides the foundation of the analysis of processes which evolve over time, and these can be referred to as growth or trajectory models. One of the traditional ways of looking at growth models is to employ either linear or polynomial functional forms to model trajectory shape, and account for variation around an overall mean trend with the inclusion of random eects or individual variation on the functional shape parameters. The identification of distinct subgroups or sub-classes (latent classes) within these trajectory models which are not based on some pre-existing individual classification provides an important methodology with substantive implications. The identification of subgroups or classes has a wide application in the medical arena where responder/non-responder identification based on distinctly diering trajectories delivers further information for clinical processes. This thesis develops Bayesian statistical models and techniques for the identification of subgroups in the analysis of longitudinal data where the number of time intervals is limited. These models are then applied to a single case study which investigates the neuropsychological cognition for early stage breast cancer patients undergoing adjuvant chemotherapy treatment from the Cognition in Breast Cancer Study undertaken by the Wesley Research Institute of Brisbane, Queensland. Alternative formulations to the linear or polynomial approach are taken which use piecewise linear models with a single turning point, change-point or knot at a known time point and latent basis models for the non-linear trajectories found for the verbal memory domain of cognitive function before and after chemotherapy treatment. Hierarchical Bayesian random eects models are used as a starting point for the latent class modelling process and are extended with the incorporation of covariates in the trajectory profiles and as predictors of class membership. The Bayesian latent basis models enable the degree of recovery post-chemotherapy to be estimated for short and long-term followup occasions, and the distinct class trajectories assist in the identification of breast cancer patients who maybe at risk of long-term verbal memory impairment.
Resumo:
This paper describes the formalization and application of a methodology to evaluate the safety benefit of countermeasures in the face of uncertainty. To illustrate the methodology, 18 countermeasures for improving safety of at grade railroad crossings (AGRXs) in the Republic of Korea are considered. Akin to “stated preference” methods in travel survey research, the methodology applies random selection and laws of large numbers to derive accident modification factor (AMF) densities from expert opinions. In a full Bayesian analysis framework, the collective opinions in the form of AMF densities (data likelihood) are combined with prior knowledge (AMF density priors) for the 18 countermeasures to obtain ‘best’ estimates of AMFs (AMF posterior credible intervals). The countermeasures are then compared and recommended based on the largest safety returns with minimum risk (uncertainty). To the author's knowledge the complete methodology is new and has not previously been applied or reported in the literature. The results demonstrate that the methodology is able to discern anticipated safety benefit differences across candidate countermeasures. For the 18 at grade railroad crossings considered in this analysis, it was found that the top three performing countermeasures for reducing crashes are in-vehicle warning systems, obstacle detection systems, and constant warning time systems.
Resumo:
We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference. ABC methods are useful for posterior inference in the presence of an intractable likelihood function. In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning. This methodological development was motivated by an application involving data on macroparasite population evolution modelled by a trivariate stochastic process for which there is no tractable likelihood function. The auxiliary model here is based on a beta–binomial distribution. The main objective of the analysis is to determine which parameters of the stochastic model are estimable from the observed data on mature parasite worms.
Resumo:
Sequence data often have competing signals that are detected by network programs or Lento plots. Such data can be formed by generating sequences on more than one tree, and combining the results, a mixture model. We report that with such mixture models, the estimates of edge (branch) lengths from maximum likelihood (ML) methods that assume a single tree are biased. Based on the observed number of competing signals in real data, such a bias of ML is expected to occur frequently. Because network methods can recover competing signals more accurately, there is a need for ML methods allowing a network. A fundamental problem is that mixture models can have more parameters than can be recovered from the data, so that some mixtures are not, in principle, identifiable. We recommend that network programs be incorporated into best practice analysis, along with ML and Bayesian trees.
Resumo:
Motorcycles are overrepresented in road traffic crashes and particularly vulnerable at signalized intersections. The objective of this study is to identify causal factors affecting the motorcycle crashes at both four-legged and T signalized intersections. Treating the data in time-series cross-section panels, this study explores different Hierarchical Poisson models and found that the model allowing autoregressive lag 1 dependent specification in the error term is the most suitable. Results show that the number of lanes at the four-legged signalized intersections significantly increases motorcycle crashes largely because of the higher exposure resulting from higher motorcycle accumulation at the stop line. Furthermore, the presence of a wide median and an uncontrolled left-turn lane at major roadways of four-legged intersections exacerbate this potential hazard. For T signalized intersections, the presence of exclusive right-turn lane at both major and minor roadways and an uncontrolled left-turn lane at major roadways of T intersections increases motorcycle crashes. Motorcycle crashes increase on high-speed roadways because they are more vulnerable and less likely to react in time during conflicts. The presence of red light cameras reduces motorcycle crashes significantly for both four-legged and T intersections. With the red-light camera, motorcycles are less exposed to conflicts because it is observed that they are more disciplined in queuing at the stop line and less likely to jump start at the start of green.
Resumo:
This paper presents a novel framework for the modelling of passenger facilitation in a complex environment. The research is motivated by the challenges in the airport complex system, where there are multiple stakeholders, differing operational objectives and complex interactions and interdependencies between different parts of the airport system. Traditional methods for airport terminal modelling do not explicitly address the need for understanding causal relationships in a dynamic environment. Additionally, existing Bayesian Network (BN) models, which provide a means for capturing causal relationships, only present a static snapshot of a system. A method to integrate a BN complex systems model with stochastic queuing theory is developed based on the properties of the Poisson and Exponential distributions. The resultant Hybrid Queue-based Bayesian Network (HQBN) framework enables the simulation of arbitrary factors, their relationships, and their effects on passenger flow and vice versa. A case study implementation of the framework is demonstrated on the inbound passenger facilitation process at Brisbane International Airport. The predicted outputs of the model, in terms of cumulative passenger flow at intermediary and end points in the inbound process, are found to have an $R^2$ goodness of fit of 0.9994 and 0.9982 respectively over a 10 hour test period. The utility of the framework is demonstrated on a number of usage scenarios including real time monitoring and `what-if' analysis. This framework provides the ability to analyse and simulate a dynamic complex system, and can be applied to other socio-technical systems such as hospitals.
Resumo:
This paper presents a novel framework for the modelling of passenger facilitation in a complex environment. The research is motivated by the challenges in the airport complex system, where there are multiple stakeholders, differing operational objectives and complex interactions and interdependencies between different parts of the airport system. Traditional methods for airport terminal modelling do not explicitly address the need for understanding causal relationships in a dynamic environment. Additionally, existing Bayesian Network (BN) models, which provide a means for capturing causal relationships, only present a static snapshot of a system. A method to integrate a BN complex systems model with stochastic queuing theory is developed based on the properties of the Poisson and exponential distributions. The resultant Hybrid Queue-based Bayesian Network (HQBN) framework enables the simulation of arbitrary factors, their relationships, and their effects on passenger flow and vice versa. A case study implementation of the framework is demonstrated on the inbound passenger facilitation process at Brisbane International Airport. The predicted outputs of the model, in terms of cumulative passenger flow at intermediary and end points in the inbound process, are found to have an R2 goodness of fit of 0.9994 and 0.9982 respectively over a 10 h test period. The utility of the framework is demonstrated on a number of usage scenarios including causal analysis and ‘what-if’ analysis. This framework provides the ability to analyse and simulate a dynamic complex system, and can be applied to other socio-technical systems such as hospitals.
Resumo:
Provides an accessible foundation to Bayesian analysis using real world models This book aims to present an introduction to Bayesian modelling and computation, by considering real case studies drawn from diverse fields spanning ecology, health, genetics and finance. Each chapter comprises a description of the problem, the corresponding model, the computational method, results and inferences as well as the issues that arise in the implementation of these approaches. Case Studies in Bayesian Statistical Modelling and Analysis: •Illustrates how to do Bayesian analysis in a clear and concise manner using real-world problems. •Each chapter focuses on a real-world problem and describes the way in which the problem may be analysed using Bayesian methods. •Features approaches that can be used in a wide area of application, such as, health, the environment, genetics, information science, medicine, biology, industry and remote sensing. Case Studies in Bayesian Statistical Modelling and Analysis is aimed at statisticians, researchers and practitioners who have some expertise in statistical modelling and analysis, and some understanding of the basics of Bayesian statistics, but little experience in its application. Graduate students of statistics and biostatistics will also find this book beneficial.
Resumo:
A flexible and simple Bayesian decision-theoretic design for dose-finding trials is proposed in this paper. In order to reduce the computational burden, we adopt a working model with conjugate priors, which is flexible to fit all monotonic dose-toxicity curves and produces analytic posterior distributions. We also discuss how to use a proper utility function to reflect the interest of the trial. Patients are allocated based on not only the utility function but also the chosen dose selection rule. The most popular dose selection rule is the one-step-look-ahead (OSLA), which selects the best-so-far dose. A more complicated rule, such as the two-step-look-ahead, is theoretically more efficient than the OSLA only when the required distributional assumptions are met, which is, however, often not the case in practice. We carried out extensive simulation studies to evaluate these two dose selection rules and found that OSLA was often more efficient than two-step-look-ahead under the proposed Bayesian structure. Moreover, our simulation results show that the proposed Bayesian method's performance is superior to several popular Bayesian methods and that the negative impact of prior misspecification can be managed in the design stage.
Resumo:
- Objective To evaluate dietary intake impact outcomes up to 3.5 years after the NOURISH early feeding intervention (concealed allocation, assessor masked RCT). - Methods 698 first-time mothers with healthy term infants were allocated to receive anticipatory guidance on protective feeding practices or usual care. Outcomes were assessed at 2, 3.7 and 5 years (3.5 years post-intervention). Dietary intake was assessed by 24-hour recall and Child Dietary Questionnaire. Mothers completed a food preference questionnaire and Children’s Eating Behaviour Questionnaire. Linear mixed models assessed group, time and time x group effects. - Results There were no group or time x group effects for fruit, vegetables, discretionary food and non-milk sweetened beverages intake. Intervention children showed a higher preference for fruits (74.6% vs 69.0% liked, P<.001), higher Child Dietary Questionnaire score for fruit and vegetables (15.3 vs 14.5, target>18, P=0.03), lower food responsiveness (2.3 vs 2.4, of maximum 5, P=.04) and higher satiety responsiveness (3.1 vs 3.0, of maximum 5, P=.04). - Conclusions Compared to usual care, an early feeding intervention providing anticipatory guidance regarding positive feeding practices led to small improvements in child dietary score, food preferences and eating behaviours up to 5 years of age, but not in dietary intake measured by 24-hour recall.
Resumo:
Abstract Background Understanding spatio-temporal variation in malaria incidence provides a basis for effective disease control planning and monitoring. Methods Monthly surveillance data between 1991 and 2006 for Plasmodium vivax and Plasmodium falciparum malaria across 128 counties were assembled for Yunnan, a province of China with one of the highest burdens of malaria. County-level Bayesian Poisson regression models of incidence were constructed, with effects for rainfall, maximum temperature and temporal trend. The model also allowed for spatial variation in county-level incidence and temporal trend, and dependence between incidence in June–September and the preceding January–February. Results Models revealed strong associations between malaria incidence and both rainfall and maximum temperature. There was a significant association between incidence in June–September and the preceding January–February. Raw standardised morbidity ratios showed a high incidence in some counties bordering Myanmar, Laos and Vietnam, and counties in the Red River valley. Clusters of counties in south-western and northern Yunnan were identified that had high incidence not explained by climate. The overall trend in incidence decreased, but there was significant variation between counties. Conclusion Dependence between incidence in summer and the preceding January–February suggests a role of intrinsic host-pathogen dynamics. Incidence during the summer peak might be predictable based on incidence in January–February, facilitating malaria control planning, scaled months in advance to the magnitude of the summer malaria burden. Heterogeneities in county-level temporal trends suggest that reductions in the burden of malaria have been unevenly distributed throughout the province.
Resumo:
Background The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
Resumo:
This chapter will address psychodynamic, cognitive-behavioural, and developmental models in supervision by initially considering the historical underpinnings of each and then examining in turn some of the key processes that are evident in the supervisory relationships. Case studies are included where appropriate to highlight the application of theory to practice and several processes are fully elaborated over all models to enable a contemporary view of style and substance in the supervision context.