980 resultados para Bayesian probability
Resumo:
Expert elicitation is the process of retrieving and quantifying expert knowledge in a particular domain. Such information is of particular value when the empirical data is expensive, limited, or unreliable. This paper describes a new software tool, called Elicitator, which assists in quantifying expert knowledge in a form suitable for use as a prior model in Bayesian regression. Potential environmental domains for applying this elicitation tool include habitat modeling, assessing detectability or eradication, ecological condition assessments, risk analysis, and quantifying inputs to complex models of ecological processes. The tool has been developed to be user-friendly, extensible, and facilitate consistent and repeatable elicitation of expert knowledge across these various domains. We demonstrate its application to elicitation for logistic regression in a geographically based ecological context. The underlying statistical methodology is also novel, utilizing an indirect elicitation approach to target expert knowledge on a case-by-case basis. For several elicitation sites (or cases), experts are asked simply to quantify their estimated ecological response (e.g. probability of presence), and its range of plausible values, after inspecting (habitat) covariates via GIS.
Resumo:
Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.
Resumo:
Suicide has drawn much attention from both the scientific community and the public. Examining the impact of socio-environmental factors on suicide is essential in developing suicide prevention strategies and interventions, because it will provide health authorities with important information for their decision-making. However, previous studies did not examine the impact of socio-environmental factors on suicide using a spatial analysis approach. The purpose of this study was to identify the patterns of suicide and to examine how socio-environmental factors impact on suicide over time and space at the Local Governmental Area (LGA) level in Queensland. The suicide data between 1999 and 2003 were collected from the Australian Bureau of Statistics (ABS). Socio-environmental variables at the LGA level included climate (rainfall, maximum and minimum temperature), Socioeconomic Indexes for Areas (SEIFA) and demographic variables (proportion of Indigenous population, unemployment rate, proportion of population with low income and low education level). Climate data were obtained from Australian Bureau of Meteorology. SEIFA and demographic variables were acquired from ABS. A series of statistical and geographical information system (GIS) approaches were applied in the analysis. This study included two stages. The first stage used average annual data to view the spatial pattern of suicide and to examine the association between socio-environmental factors and suicide over space. The second stage examined the spatiotemporal pattern of suicide and assessed the socio-environmental determinants of suicide, using more detailed seasonal data. In this research, 2,445 suicide cases were included, with 1,957 males (80.0%) and 488 females (20.0%). In the first stage, we examined the spatial pattern and the determinants of suicide using 5-year aggregated data. Spearman correlations were used to assess associations between variables. Then a Poisson regression model was applied in the multivariable analysis, as the occurrence of suicide is a small probability event and this model fitted the data quite well. Suicide mortality varied across LGAs and was associated with a range of socio-environmental factors. The multivariable analysis showed that maximum temperature was significantly and positively associated with male suicide (relative risk [RR] = 1.03, 95% CI: 1.00 to 1.07). Higher proportion of Indigenous population was accompanied with more suicide in male population (male: RR = 1.02, 95% CI: 1.01 to 1.03). There was a positive association between unemployment rate and suicide in both genders (male: RR = 1.04, 95% CI: 1.02 to 1.06; female: RR = 1.07, 95% CI: 1.00 to 1.16). No significant association was observed for rainfall, minimum temperature, SEIFA, proportion of population with low individual income and low educational attainment. In the second stage of this study, we undertook a preliminary spatiotemporal analysis of suicide using seasonal data. Firstly, we assessed the interrelations between variables. Secondly, a generalised estimating equations (GEE) model was used to examine the socio-environmental impact on suicide over time and space, as this model is well suited to analyze repeated longitudinal data (e.g., seasonal suicide mortality in a certain LGA) and it fitted the data better than other models (e.g., Poisson model). The suicide pattern varied with season and LGA. The north of Queensland had the highest suicide mortality rate in all the seasons, while there was no suicide case occurred in the southwest. Northwest had consistently higher suicide mortality in spring, autumn and winter. In other areas, suicide mortality varied between seasons. This analysis showed that maximum temperature was positively associated with suicide among male population (RR = 1.24, 95% CI: 1.04 to 1.47) and total population (RR = 1.15, 95% CI: 1.00 to 1.32). Higher proportion of Indigenous population was accompanied with more suicide among total population (RR = 1.16, 95% CI: 1.13 to 1.19) and by gender (male: RR = 1.07, 95% CI: 1.01 to 1.13; female: RR = 1.23, 95% CI: 1.03 to 1.48). Unemployment rate was positively associated with total (RR = 1.40, 95% CI: 1.24 to 1.59) and female (RR=1.09, 95% CI: 1.01 to 1.18) suicide. There was also a positive association between proportion of population with low individual income and suicide in total (RR = 1.28, 95% CI: 1.10 to 1.48) and male (RR = 1.45, 95% CI: 1.23 to 1.72) population. Rainfall was only positively associated with suicide in total population (RR = 1.11, 95% CI: 1.04 to 1.19). There was no significant association for rainfall, minimum temperature, SEIFA, proportion of population with low educational attainment. The second stage is the extension of the first stage. Different spatial scales of dataset were used between the two stages (i.e., mean yearly data in the first stage, and seasonal data in the second stage), but the results are generally consistent with each other. Compared with other studies, this research explored the variety of the impact of a wide range of socio-environmental factors on suicide in different geographical units. Maximum temperature, proportion of Indigenous population, unemployment rate and proportion of population with low individual income were among the major determinants of suicide in Queensland. However, the influence from other factors (e.g. socio-culture background, alcohol and drug use) influencing suicide cannot be ignored. An in-depth understanding of these factors is vital in planning and implementing suicide prevention strategies. Five recommendations for future research are derived from this study: (1) It is vital to acquire detailed personal information on each suicide case and relevant information among the population in assessing the key socio-environmental determinants of suicide; (2) Bayesian model could be applied to compare mortality rates and their socio-environmental determinants across LGAs in future research; (3) In the LGAs with warm weather, high proportion of Indigenous population and/or unemployment rate, concerted efforts need to be made to control and prevent suicide and other mental health problems; (4) The current surveillance, forecasting and early warning system needs to be strengthened, to trace the climate and socioeconomic change over time and space and its impact on population health; (5) It is necessary to evaluate and improve the facilities of mental health care, psychological consultation, suicide prevention and control programs; especially in the areas with low socio-economic status, high unemployment rate, extreme weather events and natural disasters.
Resumo:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Resumo:
Ecological problems are typically multi faceted and need to be addressed from a scientific and a management perspective. There is a wealth of modelling and simulation software available, each designed to address a particular aspect of the issue of concern. Choosing the appropriate tool, making sense of the disparate outputs, and taking decisions when little or no empirical data is available, are everyday challenges facing the ecologist and environmental manager. Bayesian Networks provide a statistical modelling framework that enables analysis and integration of information in its own right as well as integration of a variety of models addressing different aspects of a common overall problem. There has been increased interest in the use of BNs to model environmental systems and issues of concern. However, the development of more sophisticated BNs, utilising dynamic and object oriented (OO) features, is still at the frontier of ecological research. Such features are particularly appealing in an ecological context, since the underlying facts are often spatial and temporal in nature. This thesis focuses on an integrated BN approach which facilitates OO modelling. Our research devises a new heuristic method, the Iterative Bayesian Network Development Cycle (IBNDC), for the development of BN models within a multi-field and multi-expert context. Expert elicitation is a popular method used to quantify BNs when data is sparse, but expert knowledge is abundant. The resulting BNs need to be substantiated and validated taking this uncertainty into account. Our research demonstrates the application of the IBNDC approach to support these aspects of BN modelling. The complex nature of environmental issues makes them ideal case studies for the proposed integrated approach to modelling. Moreover, they lend themselves to a series of integrated sub-networks describing different scientific components, combining scientific and management perspectives, or pooling similar contributions developed in different locations by different research groups. In southern Africa the two largest free-ranging cheetah (Acinonyx jubatus) populations are in Namibia and Botswana, where the majority of cheetahs are located outside protected areas. Consequently, cheetah conservation in these two countries is focussed primarily on the free-ranging populations as well as the mitigation of conflict between humans and cheetahs. In contrast, in neighbouring South Africa, the majority of cheetahs are found in fenced reserves. Nonetheless, conflict between humans and cheetahs remains an issue here. Conservation effort in South Africa is also focussed on managing the geographically isolated cheetah populations as one large meta-population. Relocation is one option among a suite of tools used to resolve human-cheetah conflict in southern Africa. Successfully relocating captured problem cheetahs, and maintaining a viable free-ranging cheetah population, are two environmental issues in cheetah conservation forming the first case study in this thesis. The second case study involves the initiation of blooms of Lyngbya majuscula, a blue-green algae, in Deception Bay, Australia. L. majuscula is a toxic algal bloom which has severe health, ecological and economic impacts on the community located in the vicinity of this algal bloom. Deception Bay is an important tourist destination with its proximity to Brisbane, Australia’s third largest city. Lyngbya is one of several algae considered to be a Harmful Algal Bloom (HAB). This group of algae includes other widespread blooms such as red tides. The occurrence of Lyngbya blooms is not a local phenomenon, but blooms of this toxic weed occur in coastal waters worldwide. With the increase in frequency and extent of these HAB blooms, it is important to gain a better understanding of the underlying factors contributing to the initiation and sustenance of these blooms. This knowledge will contribute to better management practices and the identification of those management actions which could prevent or diminish the severity of these blooms.
Resumo:
This paper illustrates the prediction of opponent behaviour in a competitive, highly dynamic, multi-agent and partially observable environment, namely RoboCup small size league robot soccer. The performance is illustrated in the context of the highly successful robot soccer team, the RoboRoos. The project is broken into three tasks; classification of behaviours, modelling and prediction of behaviours and integration of the predictions into the existing planning system. A probabilistic approach is taken to dealing with the uncertainty in the observations and with representing the uncertainty in the prediction of the behaviours. Results are shown for a classification system using a Naïve Bayesian Network that determines the opponent’s current behaviour. These results are compared to an expert designed fuzzy behaviour classification system. The paper illustrates how the modelling system will use the information from behaviour classification to produce probability distributions that model the manner with which the opponents perform their behaviours. These probability distributions are show to match well with the existing multi-agent planning system (MAPS) that forms the core of the RoboRoos system.
Resumo:
This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components for which their individual properties may be easily described but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to after a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on the goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavytailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. Statistical literature shows how statistical efficiency is often the only criteria for an efficient algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the combination process of these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular the importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated and we illustrates that the optimal covariance matrix for the importance function can be estimated in a special case.
Resumo:
Road agencies require comprehensive, relevan and quality data describing their road assets to support their investment decisions. An investment decision support system for raod maintenance and rehabilitation mainly comprise three important supporting elements namely: road asset data, decision support tools and criteria for decision-making. Probability-based methods have played a crucial role in helping decision makers understand the relationship among road related data, asset performance and uncertainties in estimating budgets/costs for road management investment. This paper presents applications of the probability-bsed method for road asset management.