897 results for DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitive Analysis, Markov Chain Monte Carlo
Abstract:
Breast cancer is the most common non-skin cancer and the second leading cause of cancer-related death in women in the United States. Studies on ipsilateral breast tumor relapse (IBTR) status and disease-specific survival will help guide clinical treatment and predict patient prognosis. After breast conservation therapy, patients with breast cancer may experience breast tumor relapse. This relapse is classified into two distinct types: true local recurrence (TR) and new ipsilateral primary tumor (NP). However, the methods used to classify the relapse types are imperfect and prone to misclassification. In addition, some observed survival data (e.g., time to relapse and time from relapse to death) are strongly correlated with relapse types. The first part of this dissertation presents a Bayesian approach to (1) modeling the potentially misclassified relapse status and the correlated survival information, (2) estimating the sensitivity and specificity of the diagnostic methods, and (3) quantifying the covariate effects on event probabilities. A shared frailty was used to account for the within-subject correlation between survival times. Inference was conducted in a Bayesian framework via Markov Chain Monte Carlo simulation implemented in WinBUGS. Simulation was used to validate the Bayesian method and assess its frequentist properties. The new model has two important innovations: (1) it uses the additional survival times correlated with the relapse status to improve parameter estimation, and (2) it provides tools to address the correlation between the two diagnostic methods conditional on the true relapse types. Prediction of patients at highest risk for IBTR after local excision of ductal carcinoma in situ (DCIS) remains a clinical concern.
The goals of the second part of this dissertation were to evaluate a published nomogram from Memorial Sloan-Kettering Cancer Center, to determine the risk of IBTR in patients with DCIS treated with local excision, and to determine whether there is a subset of patients at low risk of IBTR. Patients who had undergone local excision from 1990 through 2007 at MD Anderson Cancer Center with a final diagnosis of DCIS (n=794) were included in this part. Clinicopathologic factors and the performance of the Memorial Sloan-Kettering Cancer Center nomogram for prediction of IBTR were assessed for 734 patients with complete data. The nomogram's predictions of 5- and 10-year IBTR probabilities showed imperfect calibration and discrimination, with an area under the receiver operating characteristic curve of 0.63 and a concordance index of 0.63. In conclusion, predictive models for IBTR in DCIS patients treated with local excision are imperfect, and our current ability to accurately predict recurrence from clinical parameters is limited. The American Joint Committee on Cancer (AJCC) staging of breast cancer is widely used to determine prognosis, yet survival within each AJCC stage shows wide variation and remains unpredictable. For the third part of this dissertation, biologic markers were hypothesized to be responsible for some of this variation, and the addition of biologic markers to current AJCC staging was examined to determine whether it could improve prognostication. The initial cohort included patients treated with surgery as the first intervention at MD Anderson Cancer Center (MDACC) from 1997 to 2006. Cox proportional hazards models were used to create prognostic scoring systems. AJCC pathologic staging parameters and biologic tumor markers were investigated to devise the scoring systems. Surveillance, Epidemiology, and End Results (SEER) data were used as the external cohort to validate the scoring systems.
Binary indicators for pathologic stage (PS), estrogen receptor status (E), and tumor grade (G) were summed to create PS+EG scoring systems devised to predict 5-year patient outcomes. These scoring systems separated the study population into more refined subgroups than the current AJCC staging system. The ability of the PS+EG score to stratify outcomes was confirmed in both internal and external validation cohorts. The current study proposes and validates a new staging system that incorporates tumor grade and ER status into current AJCC staging. We recommend that biologic markers be incorporated into revised versions of the AJCC staging system for patients receiving surgery as the first intervention. Chapter 1 focuses on developing a Bayesian method to handle misclassified relapse status and on its application to breast cancer data. Chapter 2 focuses on the evaluation of a breast cancer nomogram for predicting the risk of IBTR in patients with DCIS after local excision and gives the statement of the problem in clinical research. Chapter 3 focuses on the validation of a novel staging system for disease-specific survival in patients with breast cancer treated with surgery as the first intervention.
Abstract:
This paper contributes to the on-going empirical debate regarding the role of the RBC model and in particular of technology shocks in explaining aggregate fluctuations. To this end we estimate the model’s posterior density using Markov-Chain Monte-Carlo (MCMC) methods. Within this framework we extend Ireland’s (2001, 2004) hybrid estimation approach to allow for a vector autoregressive moving average (VARMA) process to describe the movements and co-movements of the model’s errors not explained by the basic RBC model. The results of marginal likelihood ratio tests reveal that the more general model of the errors significantly improves the model’s fit relative to the VAR and AR alternatives. Moreover, despite setting the RBC model a more difficult task under the VARMA specification, our analysis, based on forecast error and spectral decompositions, suggests that the RBC model is still capable of explaining a significant fraction of the observed variation in macroeconomic aggregates in the post-war U.S. economy.
Abstract:
This paper develops methods for Stochastic Search Variable Selection (currently popular with regression and Vector Autoregressive models) for Vector Error Correction models where there are many possible restrictions on the cointegration space. We show how this allows the researcher to begin with a single unrestricted model and either do model selection or model averaging in an automatic and computationally efficient manner. We apply our methods to a large UK macroeconomic model.
Abstract:
Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation. A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data is augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.
Abstract:
This thesis focuses on statistical analysis methods and proposes the use of Bayesian inference to extract the information contained in experimental data by estimating the parameters of an Ebola model. The model is a system of differential equations describing the dynamics of Ebola. Two data sets (onset and death data) were used together to estimate the parameters, which was not done in previous work (Chowell, 2004). To make use of both data sets, a new version of the model was built. The model parameters were estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. The parameter estimates were used to assess how well the model fits the data and how informative the estimates are about possible relationships between variables. The solution showed that the Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference cannot be performed analytically here, the Markov chain Monte Carlo approach was used to generate samples from the posterior distribution over the parameters. The samples were used to check the accuracy of the model and other characteristics of the target posteriors.
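The posterior sampling step described in this abstract can be illustrated with a random-walk Metropolis sampler. This is a minimal sketch and not the thesis's implementation: the Poisson toy data, the flat prior, and the names `metropolis` and `log_post` are assumptions, chosen so that the exact posterior mean (33/8 = 4.125) is known and the sampler can be checked against it.

```python
import math
import random

def metropolis(log_post, start, step, n_samples, rng):
    """Random-walk Metropolis: returns a list of draws targeting exp(log_post)."""
    current, current_lp = start, log_post(start)
    chain = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

# Hypothetical toy data: Poisson counts with an unknown rate and a flat prior on rate > 0.
counts = [3, 5, 4, 6, 2, 5, 4, 3]

def log_post(rate):
    if rate <= 0:
        return -math.inf  # outside the prior support, always rejected
    return sum(counts) * math.log(rate) - len(counts) * rate

rng = random.Random(1)
chain = metropolis(log_post, start=1.0, step=1.0, n_samples=20000, rng=rng)
kept = chain[2000:]  # discard burn-in draws
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

For a differential-equation model such as the one in the abstract, `log_post` would instead solve the system for each proposed parameter vector and compare the solution with the onset and death data; that part is not sketched here.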
Abstract:
Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of “likelihood-free” methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before providing a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
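The core idea of replacing likelihood evaluation with simulation can be sketched with plain rejection ABC. This is the simplest member of the family, not the MCMC or SMC variants or the evidence estimators proposed in the paper. The exponential toy model, the uniform prior, the tolerance, and all function names below are assumptions made for illustration.

```python
import random
import statistics

def abc_rejection(observed, prior_sample, simulate, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                    # draw a parameter from the prior
        fake = simulate(theta, len(observed), rng)   # simulate data instead of evaluating the likelihood
        if abs(summary(fake) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
# Hypothetical "observed" data: 200 exponential draws with true rate 2.0.
observed = [rng.expovariate(2.0) for _ in range(200)]

posterior = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(0.1, 10.0),
    simulate=lambda rate, n, r: [r.expovariate(rate) for _ in range(n)],
    summary=statistics.fmean,   # the sample mean is sufficient for the exponential rate
    tolerance=0.02,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), round(statistics.fmean(posterior), 2))
```

The accepted draws approximate the posterior of the rate; shrinking `tolerance` tightens the approximation at the cost of fewer acceptances, which is the inefficiency that MCMC and SMC versions aim to reduce.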
Abstract:
In this paper we propose a hybrid hazard regression model with threshold stress which includes the proportional hazards and the accelerated failure time models as particular cases. To express the behavior of lifetimes, the generalized-gamma distribution is assumed and an inverse power law model with a threshold stress is considered. For parameter estimation we develop a sampling-based posterior inference procedure based on Markov Chain Monte Carlo techniques. We assume proper but vague priors for the parameters of interest. A simulation study investigates the frequentist properties of the proposed estimators obtained under the assumption of vague priors. Further, some discussion of model selection criteria is given. The methodology is illustrated on simulated and real lifetime data sets.
Abstract:
A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not include marker effects when estimating breeding values, in a setting where only a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete herd data set comprised 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with the MTDFREML software to estimate fixed and random effects solutions using the complete data. The estimated additive effects were taken as the reference breeding values for those animals. The observed individual phenotype for each trait was adjusted for the fixed and random effects solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals of this herd, only 3160 were genotyped for 106 SNP markers. Three models were compared in terms of changes in animal ranking, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with the TM software, was used to analyze the data for model comparison. Two different priors were adopted for marker effects in models 2 and 3: a uniform distribution (U) and a normal distribution (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating greater similarity between the animal rankings from these models and the ranking based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC.
The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for marker effects in models 2 and 3 could be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This could be attributed to the variation captured by marker and polygenic effects fitted together and to the normal prior assumed for marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Euastacus crayfish are endemic to freshwater ecosystems of the eastern coast of Australia. While recent evolutionary studies have focused on a few of these species, here we provide a comprehensive phylogenetic estimate of relationships among the species within the genus. We sequenced three mitochondrial gene regions (COI, 16S, and 12S) and one nuclear region (28S) from 40 species of the genus Euastacus, as well as one undescribed species. Using these data, we estimated the phylogenetic relationships within the genus using maximum-likelihood, parsimony, and Bayesian Markov Chain Monte Carlo analyses. Using Bayes factors to test different model hypotheses, we found that the best phylogeny supports monophyletic groupings of all but two recognized species and suggests a widespread ancestor that diverged by vicariance. We also show that Euastacus and Astacopsis are most likely monophyletic sister genera. We use the resulting phylogeny as a framework to test biogeographic hypotheses relating to the diversification of the genus. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.
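To make the idea of Bayesian segmentation by GC content concrete, the sketch below computes the exact posterior over the location of a single change-point. It is a deliberately reduced illustration, not the paper's algorithm, which estimates an unknown number of change-points. The uniform Beta(1, 1) priors on each segment's GC fraction, the uniform prior on the change-point position, and the simulated sequence are all assumptions.

```python
import math
import random

def log_marginal(gc, at):
    """Log marginal likelihood of a segment: Bernoulli counts integrated over a uniform prior."""
    return math.lgamma(gc + 1) + math.lgamma(at + 1) - math.lgamma(gc + at + 2)

def changepoint_posterior(seq):
    """Posterior over one change-point k, where seq[:k] and seq[k:] have different GC fractions."""
    is_gc = [1 if base in "GC" else 0 for base in seq]
    n, total = len(is_gc), sum(is_gc)
    log_post, left = [], 0
    for k in range(1, n):
        left += is_gc[k - 1]          # GC count in seq[:k]
        right = total - left          # GC count in seq[k:]
        log_post.append(log_marginal(left, k - left) + log_marginal(right, (n - k) - right))
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # subtract the peak to avoid underflow
    norm = sum(weights)
    return [w / norm for w in weights]  # entry i is P(change-point at k = i + 1)

rng = random.Random(7)

def random_dna(n, gc_fraction):
    return "".join(rng.choice("GC") if rng.random() < gc_fraction else rng.choice("AT")
                   for _ in range(n))

# Simulated sequence: 300 AT-rich bases followed by 300 GC-rich bases.
seq = random_dna(300, 0.3) + random_dna(300, 0.7)
post = changepoint_posterior(seq)
best_k = max(range(len(post)), key=post.__getitem__) + 1
print(best_k)
```

Comparing the summed change-point evidence with `log_marginal` applied to the whole sequence would give a simple test of the no-change (uniform) hypothesis that the abstract mentions.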
Abstract:
The ERS-1 satellite carries a scatterometer which measures the amount of radiation scattered back toward the satellite by the ocean's surface. These measurements can be used to infer wind vectors. The implementation of a neural-network-based forward model which maps wind vectors to radar backscatter is addressed. Input noise cannot be neglected. To account for this noise, a Bayesian framework is adopted. However, Markov Chain Monte Carlo sampling is too computationally expensive. Instead, gradient information is used with a non-linear optimisation algorithm to find the maximum a posteriori probability values of the unknown variables. The resulting models are shown to compare well with the current operational model when visualised in the target space.
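The alternative to sampling described here, using gradients to find a maximum a posteriori (MAP) estimate, can be sketched on a toy problem. The example is an assumption-laden stand-in: it uses a one-parameter linear model with output noise only and plain gradient ascent, whereas the paper uses a neural network forward model, accounts for input noise, and relies on a non-linear optimiser. The data values and the name `map_slope` are hypothetical.

```python
def map_slope(xs, ys, noise_var, prior_var, lr=0.01, n_steps=2000):
    """Gradient ascent on the log posterior of w in y = w * x + noise, with prior w ~ N(0, prior_var)."""
    w = 0.0
    for _ in range(n_steps):
        # Gradient of the Gaussian log likelihood plus gradient of the Gaussian log prior.
        grad = sum(x * (y - w * x) for x, y in zip(xs, ys)) / noise_var - w / prior_var
        w += lr * grad
    return w

# Hypothetical data scattered around the line y = 2x.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]

w_map = map_slope(xs, ys, noise_var=0.25, prior_var=1.0)
print(round(w_map, 4))
```

Because this toy posterior is Gaussian, the result can be checked against the closed-form MAP value; for a neural network no such closed form exists, which is why an iterative optimiser is needed.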
Abstract:
Large-conductance Ca²⁺-activated K⁺ channels (BK) play a fundamental role in modulating membrane potential in many cell types. The gating of BK channels and its modulation by Ca²⁺ and voltage have been the subject of intensive research over almost three decades, yielding several of the most complicated kinetic mechanisms ever proposed. These mechanisms are characterized by a large number of open and closed states arranged, respectively, in two planes, named tiers. Transitions between states in the same plane are cooperative and modulated by Ca²⁺. Transitions across planes are highly concerted and voltage-dependent. Here we reexamine the validity of the two-tiered hypothesis by restricting attention to the modulation by Ca²⁺. Large single-channel data sets at five Ca²⁺ concentrations were simultaneously analyzed from a Bayesian perspective by using hidden Markov models and Markov-chain Monte Carlo stochastic integration techniques. Our results support a dramatic reduction in model complexity, favoring a simple mechanism derived from the Monod-Wyman-Changeux allosteric model for homotetramers, able to explain the Ca²⁺ modulation of the gating process. This model differs from the standard Monod-Wyman-Changeux scheme in that it distinguishes whether two Ca²⁺ ions are bound to adjacent or diagonal subunits of the tetramer.
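A hidden Markov model analysis of single-channel records rests on evaluating the likelihood of an observed current trace given a gating scheme. The sketch below shows the scaled forward algorithm for a discrete HMM. It is only an illustration under assumed values: a two-state closed/open scheme with binary (low/high current) observations, far simpler than the multi-state mechanisms compared in the paper, and every probability below is hypothetical.

```python
import math

def forward_loglik(obs, init, trans, emit):
    """Scaled forward algorithm: returns log P(obs) for a discrete hidden Markov model."""
    n = len(init)
    loglik = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [init[s] * emit[s][symbol] for s in range(n)]
        else:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                     for s in range(n)]
        scale = sum(alpha)                  # P(obs[t] | obs[:t])
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # renormalise to avoid underflow on long records
    return loglik

# Hypothetical two-state scheme: state 0 = closed, state 1 = open.
init = [0.5, 0.5]
trans = [[0.9, 0.1],    # closed -> closed, closed -> open
         [0.2, 0.8]]    # open -> closed,  open -> open
emit = [[0.95, 0.05],   # closed: mostly low current (symbol 0)
        [0.10, 0.90]]   # open: mostly high current (symbol 1)

trace = [0, 0, 1, 1, 1, 0, 1, 1]
print(round(forward_loglik(trace, init, trans, emit), 4))
```

In a Bayesian treatment, a likelihood of this kind would be combined with priors on the transition rates and explored by MCMC, and competing gating schemes would be compared through their posterior support.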
Abstract:
Hepatitis B is a worldwide health problem affecting about 2 billion people, and more than 350 million are chronic carriers of the virus. Nine HBV genotypes (A to I) have been described. The geographical distribution of HBV genotypes is not completely understood due to the limited number of samples from some parts of the world. One such example is Colombia, for which few studies have described the HBV genotypes. In this study, we characterized HBV genotypes in 143 HBsAg-positive volunteer blood donors from Colombia. A fragment of 1306 bp partially comprising the HBsAg and DNA polymerase coding regions (S/POL) was amplified and sequenced. Bayesian phylogenetic analyses were conducted using the Markov Chain Monte Carlo (MCMC) approach to obtain the maximum clade credibility (MCC) tree using BEAST v.1.5.3. Of all samples, 68 were positive and 52 were successfully sequenced. Genotype F was the most prevalent in this population (77%), comprising subgenotypes F3 (75%) and F1b (2%). Genotype G (7.7%) and subgenotype A2 (15.3%) were also found. Genotype G sequence analysis suggests distinct introductions of this genotype into the country. Furthermore, we estimated the time of the most recent common ancestor (TMRCA) for each HBV/F subgenotype and also for Colombian F3 sequences using two different datasets: (i) 77 sequences comprising 1306 bp of the S/POL region and (ii) 283 sequences comprising 681 bp of the S/POL region. We also used two other previously estimated evolutionary rates: (i) 2.60 × 10⁻⁴ s/s/y and (ii) 1.5 × 10⁻⁵ s/s/y. Here we report the HBV genotypes circulating in Colombia and estimate the TMRCA for the four different subgenotypes of genotype F. (C) 2010 Elsevier B.V. All rights reserved.