11 resultados para Markov chains hidden Markov models Viterbi algorithm Forward-Backward algorithm maximum likelihood

em Collection Of Biostatistics Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity have been associated with diseases processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize a HMM framework for inference in high throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to calculate posterior distributions and point estimates of any functions of parameters. It is this convenience that allows us to provide the diagnostic methods that we introduce. As a motivating example we present an analysis focusing on the association between depression and socioeconomic status, using data from the Epidemiologic Catchment Area study. We consider a latent class regression analysis investigating the association between depression and socioeconomic status measures, where the latent variable depression is regressed on education and income indicators, in addition to age, gender, and marital status variables. While the fitted latent class regression model yields interesting results, the model parameters are found to be invalid due to the violation of model assumptions. The violation of these assumptions is clearly identified by the presented diagnostic plots. These methods can be applied to standard latent class and latent class regression models, and the general principle can be extended to evaluate model assumptions in other types of models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this paper, we develop multilevel latent class model, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy comparable precision as the ML estimates. We apply our methods to the analysis of comorbid symptoms in the Obsessive Compulsive Disorder study. Our models' random effects structure has more straightforward interpretation than those of competing methods, thus should usefully augment tools available for latent class analysis of multilevel data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part aggress with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time (AFT) model for current status and interval censored data. The estimator is constructed by inverting a Wald type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo (MCMC) based resampling method is proposed to simultaneously obtain the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored data sets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite sample performance of the new estimators.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazard model with unspecified baseline hazard, and their joint distribution is obtained by transforming survival outcomes to normal random variables, whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of the class of semiparametric normal transformation models is that it provides a rich class of spatial survival models where regression coefficients have population average interpretation and the spatial dependence of survival times is conveniently modeled using the transformed variables by flexible normal random fields. We study the relationship of the spatial correlation structure of the transformed normal variables and the dependence measures of the original survival times. Direct nonparametric maximum likelihood estimation in such models is practically prohibited due to the high dimensional intractable integration of the likelihood function and the infinite dimensional nuisance baseline hazard parameter. We hence develop a class of spatial semiparametric estimating equations, which conveniently estimate the population-level regression coefficients and the dependence parameters simultaneously. We study the asymptotic properties of the proposed estimators, and show that they are consistent and asymptotically normal. The proposed method is illustrated with an analysis of data from the East Boston Ashma Study and its performance is evaluated using simulations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses estimation of the tumor incidence rate, the death rate given tumor is present and the death rate given tumor is absent using a discrete multistage model. The model was originally proposed by Dewanji and Kalbfleisch (1986) and the maximum likelihood estimate of the tumor incidence rate was obtained using EM algorithm. In this paper, we use a reparametrization to simplify the estimation procedure. The resulting estimates are not always the same as the maximum likelihood estimates but are asymptotically equivalent. In addition, an explicit expression for asymptotic variance and bias of the proposed estimators is also derived. These results can be used to compare efficiency of different sacrifice schemes in carcinogenicity experiments.