873 resultados para DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitive Analysis, Markov Chain Monte Carlo
Resumo:
The purpose of this paper is to develop a Bayesian analysis for the right-censored survival data when immune or cured individuals may be present in the population from which the data is taken. In our approach the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Also, some discussions on the model selection and an illustration with a real data set are considered.
Resumo:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP) as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within the Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in opposition to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed equally as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that using the hierarchical approach allows us to implement MCMC algorithms more easily, it facilitates diagnosis of the convergence and also it can be very useful to fit more complex skew IRT models.
Resumo:
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.
Resumo:
The aim of the thesi is to formulate a suitable Item Response Theory (IRT) based model to measure HRQoL (as latent variable) using a mixed responses questionnaire and relaxing the hypothesis of normal distributed latent variable. The new model is a combination of two models already presented in literature, that is, a latent trait model for mixed responses and an IRT model for Skew Normal latent variable. It is developed in a Bayesian framework, a Markov chain Monte Carlo procedure is used to generate samples of the posterior distribution of the parameters of interest. The proposed model is test on a questionnaire composed by 5 discrete items and one continuous to measure HRQoL in children, the EQ-5D-Y questionnaire. A large sample of children collected in the schools was used. In comparison with a model for only discrete responses and a model for mixed responses and normal latent variable, the new model has better performances, in term of deviance information criterion (DIC), chain convergences times and precision of the estimates.
Resumo:
Breast cancer is the most common non-skin cancer and the second leading cause of cancer-related death in women in the United States. Studies on ipsilateral breast tumor relapse (IBTR) status and disease-specific survival will help guide clinic treatment and predict patient prognosis.^ After breast conservation therapy, patients with breast cancer may experience breast tumor relapse. This relapse is classified into two distinct types: true local recurrence (TR) and new ipsilateral primary tumor (NP). However, the methods used to classify the relapse types are imperfect and are prone to misclassification. In addition, some observed survival data (e.g., time to relapse and time from relapse to death)are strongly correlated with relapse types. The first part of this dissertation presents a Bayesian approach to (1) modeling the potentially misclassified relapse status and the correlated survival information, (2) estimating the sensitivity and specificity of the diagnostic methods, and (3) quantify the covariate effects on event probabilities. A shared frailty was used to account for the within-subject correlation between survival times. The inference was conducted using a Bayesian framework via Markov Chain Monte Carlo simulation implemented in softwareWinBUGS. Simulation was used to validate the Bayesian method and assess its frequentist properties. The new model has two important innovations: (1) it utilizes the additional survival times correlated with the relapse status to improve the parameter estimation, and (2) it provides tools to address the correlation between the two diagnostic methods conditional to the true relapse types.^ Prediction of patients at highest risk for IBTR after local excision of ductal carcinoma in situ (DCIS) remains a clinical concern. The goals of the second part of this dissertation were to evaluate a published nomogram from Memorial Sloan-Kettering Cancer Center, to determine the risk of IBTR in patients with DCIS treated with local excision, and to determine whether there is a subset of patients at low risk of IBTR. Patients who had undergone local excision from 1990 through 2007 at MD Anderson Cancer Center with a final diagnosis of DCIS (n=794) were included in this part. Clinicopathologic factors and the performance of the Memorial Sloan-Kettering Cancer Center nomogram for prediction of IBTR were assessed for 734 patients with complete data. Nomogram for prediction of 5- and 10-year IBTR probabilities were found to demonstrate imperfect calibration and discrimination, with an area under the receiver operating characteristic curve of .63 and a concordance index of .63. In conclusion, predictive models for IBTR in DCIS patients treated with local excision are imperfect. Our current ability to accurately predict recurrence based on clinical parameters is limited.^ The American Joint Committee on Cancer (AJCC) staging of breast cancer is widely used to determine prognosis, yet survival within each AJCC stage shows wide variation and remains unpredictable. For the third part of this dissertation, biologic markers were hypothesized to be responsible for some of this variation, and the addition of biologic markers to current AJCC staging were examined for possibly provide improved prognostication. The initial cohort included patients treated with surgery as first intervention at MDACC from 1997 to 2006. Cox proportional hazards models were used to create prognostic scoring systems. AJCC pathologic staging parameters and biologic tumor markers were investigated to devise the scoring systems. Surveillance Epidemiology and End Results (SEER) data was used as the external cohort to validate the scoring systems. Binary indicators for pathologic stage (PS), estrogen receptor status (E), and tumor grade (G) were summed to create PS+EG scoring systems devised to predict 5-year patient outcomes. These scoring systems facilitated separation of the study population into more refined subgroups than the current AJCC staging system. The ability of the PS+EG score to stratify outcomes was confirmed in both internal and external validation cohorts. The current study proposes and validates a new staging system by incorporating tumor grade and ER status into current AJCC staging. We recommend that biologic markers be incorporating into revised versions of the AJCC staging system for patients receiving surgery as the first intervention.^ Chapter 1 focuses on developing a Bayesian method to solve misclassified relapse status and application to breast cancer data. Chapter 2 focuses on evaluation of a breast cancer nomogram for predicting risk of IBTR in patients with DCIS after local excision gives the statement of the problem in the clinical research. Chapter 3 focuses on validation of a novel staging system for disease-specific survival in patients with breast cancer treated with surgery as the first intervention. ^
Resumo:
Plant biosecurity requires statistical tools to interpret field surveillance data in order to manage pest incursions that threaten crop production and trade. Ultimately, management decisions need to be based on the probability that an area is infested or free of a pest. Current informal approaches to delimiting pest extent rely upon expert ecological interpretation of presence / absence data over space and time. Hierarchical Bayesian models provide a cohesive statistical framework that can formally integrate the available information on both pest ecology and data. The overarching method involves constructing an observation model for the surveillance data, conditional on the hidden extent of the pest and uncertain detection sensitivity. The extent of the pest is then modelled as a dynamic invasion process that includes uncertainty in ecological parameters. Modelling approaches to assimilate this information are explored through case studies on spiralling whitefly, Aleurodicus dispersus and red banded mango caterpillar, Deanolis sublimbalis. Markov chain Monte Carlo simulation is used to estimate the probable extent of pests, given the observation and process model conditioned by surveillance data. Statistical methods, based on time-to-event models, are developed to apply hierarchical Bayesian models to early detection programs and to demonstrate area freedom from pests. The value of early detection surveillance programs is demonstrated through an application to interpret surveillance data for exotic plant pests with uncertain spread rates. The model suggests that typical early detection programs provide a moderate reduction in the probability of an area being infested but a dramatic reduction in the expected area of incursions at a given time. Estimates of spiralling whitefly extent are examined at local, district and state-wide scales. The local model estimates the rate of natural spread and the influence of host architecture, host suitability and inspector efficiency. These parameter estimates can support the development of robust surveillance programs. Hierarchical Bayesian models for the human-mediated spread of spiralling whitefly are developed for the colonisation of discrete cells connected by a modified gravity model. By estimating dispersal parameters, the model can be used to predict the extent of the pest over time. An extended model predicts the climate restricted distribution of the pest in Queensland. These novel human-mediated movement models are well suited to demonstrating area freedom at coarse spatio-temporal scales. At finer scales, and in the presence of ecological complexity, exploratory models are developed to investigate the capacity for surveillance information to estimate the extent of red banded mango caterpillar. It is apparent that excessive uncertainty about observation and ecological parameters can impose limits on inference at the scales required for effective management of response programs. The thesis contributes novel statistical approaches to estimating the extent of pests and develops applications to assist decision-making across a range of plant biosecurity surveillance activities. Hierarchical Bayesian modelling is demonstrated as both a useful analytical tool for estimating pest extent and a natural investigative paradigm for developing and focussing biosecurity programs.
Resumo:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise by intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data are on symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS) and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real time neural activity in the brain.
Resumo:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerge as having highest probability given the data, as indicated by the goodness of fit measure; the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
Resumo:
Introduced in this paper is a Bayesian model for isolating the resonant frequency from combustion chamber resonance. The model shown in this paper focused on characterising the initial rise in the resonant frequency to investigate the rise of in-cylinder bulk temperature associated with combustion. By resolving the model parameters, it is possible to determine: the start of pre-mixed combustion, the start of diffusion combustion, the initial resonant frequency, the resonant frequency as a function of crank angle, the in-cylinder bulk temperature as a function of crank angle and the trapped mass as a function of crank angle. The Bayesian method allows for individual cycles to be examined without cycle-averaging|allowing inter-cycle variability studies. Results are shown for a turbo-charged, common-rail compression ignition engine run at 2000 rpm and full load.
Resumo:
The inverse temperature hyperparameter of the hidden Potts model governs the strength of spatial cohesion and therefore has a substantial influence over the resulting model fit. The difficulty arises from the dependence of an intractable normalising constant on the value of the inverse temperature, thus there is no closed form solution for sampling from the distribution directly. We review three computational approaches for addressing this issue, namely pseudolikelihood, path sampling, and the approximate exchange algorithm. We compare the accuracy and scalability of these methods using a simulation study.
Resumo:
In this thesis the use of the Bayesian approach to statistical inference in fisheries stock assessment is studied. The work was conducted in collaboration of the Finnish Game and Fisheries Research Institute by using the problem of monitoring and prediction of the juvenile salmon population in the River Tornionjoki as an example application. The River Tornionjoki is the largest salmon river flowing into the Baltic Sea. This thesis tackles the issues of model formulation and model checking as well as computational problems related to Bayesian modelling in the context of fisheries stock assessment. Each article of the thesis provides a novel method either for extracting information from data obtained via a particular type of sampling system or for integrating the information about the fish stock from multiple sources in terms of a population dynamics model. Mark-recapture and removal sampling schemes and a random catch sampling method are covered for the estimation of the population size. In addition, a method for estimating the stock composition of a salmon catch based on DNA samples is also presented. For most of the articles, Markov chain Monte Carlo (MCMC) simulation has been used as a tool to approximate the posterior distribution. Problems arising from the sampling method are also briefly discussed and potential solutions for these problems are proposed. Special emphasis in the discussion is given to the philosophical foundation of the Bayesian approach in the context of fisheries stock assessment. It is argued that the role of subjective prior knowledge needed in practically all parts of a Bayesian model should be recognized and consequently fully utilised in the process of model formulation.
Resumo:
Quantifying distributional behavior of extreme events is crucial in hydrologic designs. Intensity Duration Frequency (IDF) relationships are used extensively in engineering especially in urban hydrology, to obtain return level of extreme rainfall event for a specified return period and duration. Major sources of uncertainty in the IDF relationships are due to insufficient quantity and quality of data leading to parameter uncertainty due to the distribution fitted to the data and uncertainty as a result of using multiple GCMs. It is important to study these uncertainties and propagate them to future for accurate assessment of return levels for future. The objective of this study is to quantify the uncertainties arising from parameters of the distribution fitted to data and the multiple GCM models using Bayesian approach. Posterior distribution of parameters is obtained from Bayes rule and the parameters are transformed to obtain return levels for a specified return period. Markov Chain Monte Carlo (MCMC) method using Metropolis Hastings algorithm is used to obtain the posterior distribution of parameters. Twenty six CMIP5 GCMs along with four RCP scenarios are considered for studying the effects of climate change and to obtain projected IDF relationships for the case study of Bangalore city in India. GCM uncertainty due to the use of multiple GCMs is treated using Reliability Ensemble Averaging (REA) technique along with the parameter uncertainty. Scale invariance theory is employed for obtaining short duration return levels from daily data. It is observed that the uncertainty in short duration rainfall return levels is high when compared to the longer durations. Further it is observed that parameter uncertainty is large compared to the model uncertainty. (C) 2015 Elsevier Ltd. All rights reserved.
Resumo:
This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from theposterior distribution over the optimal value function. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.
Resumo:
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.
Resumo:
Vibration and acoustic analysis at higher frequencies faces two challenges: computing the response without using an excessive number of degrees of freedom, and quantifying its uncertainty due to small spatial variations in geometry, material properties and boundary conditions. Efficient models make use of the observation that when the response of a decoupled vibro-acoustic subsystem is sufficiently sensitive to uncertainty in such spatial variations, the local statistics of its natural frequencies and mode shapes saturate to universal probability distributions. This holds irrespective of the causes that underly these spatial variations and thus leads to a nonparametric description of uncertainty. This work deals with the identification of uncertain parameters in such models by using experimental data. One of the difficulties is that both experimental errors and modeling errors, due to the nonparametric uncertainty that is inherent to the model type, are present. This is tackled by employing a Bayesian inference strategy. The prior probability distribution of the uncertain parameters is constructed using the maximum entropy principle. The likelihood function that is subsequently computed takes the experimental information, the experimental errors and the modeling errors into account. The posterior probability distribution, which is computed with the Markov Chain Monte Carlo method, provides a full uncertainty quantification of the identified parameters, and indicates how well their uncertainty is reduced, with respect to the prior information, by the experimental data. © 2013 Taylor & Francis Group, London.