959 resultados para Bayesian
Resumo:
Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.
Resumo:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Resumo:
Ecological problems are typically multi faceted and need to be addressed from a scientific and a management perspective. There is a wealth of modelling and simulation software available, each designed to address a particular aspect of the issue of concern. Choosing the appropriate tool, making sense of the disparate outputs, and taking decisions when little or no empirical data is available, are everyday challenges facing the ecologist and environmental manager. Bayesian Networks provide a statistical modelling framework that enables analysis and integration of information in its own right as well as integration of a variety of models addressing different aspects of a common overall problem. There has been increased interest in the use of BNs to model environmental systems and issues of concern. However, the development of more sophisticated BNs, utilising dynamic and object oriented (OO) features, is still at the frontier of ecological research. Such features are particularly appealing in an ecological context, since the underlying facts are often spatial and temporal in nature. This thesis focuses on an integrated BN approach which facilitates OO modelling. Our research devises a new heuristic method, the Iterative Bayesian Network Development Cycle (IBNDC), for the development of BN models within a multi-field and multi-expert context. Expert elicitation is a popular method used to quantify BNs when data is sparse, but expert knowledge is abundant. The resulting BNs need to be substantiated and validated taking this uncertainty into account. Our research demonstrates the application of the IBNDC approach to support these aspects of BN modelling. The complex nature of environmental issues makes them ideal case studies for the proposed integrated approach to modelling. Moreover, they lend themselves to a series of integrated sub-networks describing different scientific components, combining scientific and management perspectives, or pooling similar contributions developed in different locations by different research groups. In southern Africa the two largest free-ranging cheetah (Acinonyx jubatus) populations are in Namibia and Botswana, where the majority of cheetahs are located outside protected areas. Consequently, cheetah conservation in these two countries is focussed primarily on the free-ranging populations as well as the mitigation of conflict between humans and cheetahs. In contrast, in neighbouring South Africa, the majority of cheetahs are found in fenced reserves. Nonetheless, conflict between humans and cheetahs remains an issue here. Conservation effort in South Africa is also focussed on managing the geographically isolated cheetah populations as one large meta-population. Relocation is one option among a suite of tools used to resolve human-cheetah conflict in southern Africa. Successfully relocating captured problem cheetahs, and maintaining a viable free-ranging cheetah population, are two environmental issues in cheetah conservation forming the first case study in this thesis. The second case study involves the initiation of blooms of Lyngbya majuscula, a blue-green algae, in Deception Bay, Australia. L. majuscula is a toxic algal bloom which has severe health, ecological and economic impacts on the community located in the vicinity of this algal bloom. Deception Bay is an important tourist destination with its proximity to Brisbane, Australia’s third largest city. Lyngbya is one of several algae considered to be a Harmful Algal Bloom (HAB). This group of algae includes other widespread blooms such as red tides. The occurrence of Lyngbya blooms is not a local phenomenon, but blooms of this toxic weed occur in coastal waters worldwide. With the increase in frequency and extent of these HAB blooms, it is important to gain a better understanding of the underlying factors contributing to the initiation and sustenance of these blooms. This knowledge will contribute to better management practices and the identification of those management actions which could prevent or diminish the severity of these blooms.
Resumo:
This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components for which their individual properties may be easily described but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to after a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on the goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavytailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. Statistical literature shows how statistical efficiency is often the only criteria for an efficient algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the combination process of these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular the importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated and we illustrates that the optimal covariance matrix for the importance function can be estimated in a special case.
Resumo:
Longitudinal data, where data are repeatedly observed or measured on a temporal basis of time or age provides the foundation of the analysis of processes which evolve over time, and these can be referred to as growth or trajectory models. One of the traditional ways of looking at growth models is to employ either linear or polynomial functional forms to model trajectory shape, and account for variation around an overall mean trend with the inclusion of random eects or individual variation on the functional shape parameters. The identification of distinct subgroups or sub-classes (latent classes) within these trajectory models which are not based on some pre-existing individual classification provides an important methodology with substantive implications. The identification of subgroups or classes has a wide application in the medical arena where responder/non-responder identification based on distinctly diering trajectories delivers further information for clinical processes. This thesis develops Bayesian statistical models and techniques for the identification of subgroups in the analysis of longitudinal data where the number of time intervals is limited. These models are then applied to a single case study which investigates the neuropsychological cognition for early stage breast cancer patients undergoing adjuvant chemotherapy treatment from the Cognition in Breast Cancer Study undertaken by the Wesley Research Institute of Brisbane, Queensland. Alternative formulations to the linear or polynomial approach are taken which use piecewise linear models with a single turning point, change-point or knot at a known time point and latent basis models for the non-linear trajectories found for the verbal memory domain of cognitive function before and after chemotherapy treatment. Hierarchical Bayesian random eects models are used as a starting point for the latent class modelling process and are extended with the incorporation of covariates in the trajectory profiles and as predictors of class membership. The Bayesian latent basis models enable the degree of recovery post-chemotherapy to be estimated for short and long-term followup occasions, and the distinct class trajectories assist in the identification of breast cancer patients who maybe at risk of long-term verbal memory impairment.
Resumo:
Regional safety program managers face a daunting challenge in the attempt to reduce deaths, injuries, and economic losses that result from motor vehicle crashes. This difficult mission is complicated by the combination of a large perceived need, small budget, and uncertainty about how effective each proposed countermeasure would be if implemented. A manager can turn to the research record for insight, but the measured effect of a single countermeasure often varies widely from study to study and across jurisdictions. The challenge of converting widespread and conflicting research results into a regionally meaningful conclusion can be addressed by incorporating "subjective" information into a Bayesian analysis framework. Engineering evaluations of crashes provide the subjective input on countermeasure effectiveness in the proposed Bayesian analysis framework. Empirical Bayes approaches are widely used in before-and-after studies and "hot-spot" identification; however, in these cases, the prior information was typically obtained from the data (empirically), not subjective sources. The power and advantages of Bayesian methods for assessing countermeasure effectiveness are presented. Also, an engineering evaluation approach developed at the Georgia Institute of Technology is described. Results are presented from an experiment conducted to assess the repeatability and objectivity of subjective engineering evaluations. In particular, the focus is on the importance, methodology, and feasibility of the subjective engineering evaluation for assessing countermeasures.