7 resultados para Hidden Markov random fields
em Collection Of Biostatistics Research Archive
Resumo:
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.
Resumo:
Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
Resumo:
Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity have been associated with diseases processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize a HMM framework for inference in high throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
Resumo:
DNA sequence copy number has been shown to be associated with cancer development and progression. Array-based Comparative Genomic Hybridization (aCGH) is a recent development that seeks to identify the copy number ratio at large numbers of markers across the genome. Due to experimental and biological variations across chromosomes and across hybridizations, current methods are limited to analyses of single chromosomes. We propose a more powerful approach that borrows strength across chromosomes and across hybridizations. We assume a Gaussian mixture model, with a hidden Markov dependence structure, and with random effects to allow for intertumoral variation, as well as intratumoral clonal variation. For ease of computation, we base estimation on a pseudolikelihood function. The method produces quantitative assessments of the likelihood of genetic alterations at each clone, along with a graphical display for simple visual interpretation. We assess the characteristics of the method through simulation studies and through analysis of a brain tumor aCGH data set. We show that the pseudolikelihood approach is superior to existing methods both in detecting small regions of copy number alteration and in accurately classifying regions of change when intratumoral clonal variation is present.
Resumo:
There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazard model with unspecified baseline hazard, and their joint distribution is obtained by transforming survival outcomes to normal random variables, whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of the class of semiparametric normal transformation models is that it provides a rich class of spatial survival models where regression coefficients have population average interpretation and the spatial dependence of survival times is conveniently modeled using the transformed variables by flexible normal random fields. We study the relationship of the spatial correlation structure of the transformed normal variables and the dependence measures of the original survival times. Direct nonparametric maximum likelihood estimation in such models is practically prohibited due to the high dimensional intractable integration of the likelihood function and the infinite dimensional nuisance baseline hazard parameter. We hence develop a class of spatial semiparametric estimating equations, which conveniently estimate the population-level regression coefficients and the dependence parameters simultaneously. We study the asymptotic properties of the proposed estimators, and show that they are consistent and asymptotically normal. The proposed method is illustrated with an analysis of data from the East Boston Ashma Study and its performance is evaluated using simulations.
Resumo:
Generalized linear mixed models (GLMM) are generalized linear models with normally distributed random effects in the linear predictor. Penalized quasi-likelihood (PQL), an approximate method of inference in GLMMs, involves repeated fitting of linear mixed models with “working” dependent variables and iterative weights that depend on parameter estimates from the previous cycle of iteration. The generality of PQL, and its implementation in commercially available software, has encouraged the application of GLMMs in many scientific fields. Caution is needed, however, since PQL may sometimes yield badly biased estimates of variance components, especially with binary outcomes. Recent developments in numerical integration, including adaptive Gaussian quadrature, higher order Laplace expansions, stochastic integration and Markov chain Monte Carlo (MCMC) algorithms, provide attractive alternatives to PQL for approximate likelihood inference in GLMMs. Analyses of some well known datasets, and simulations based on these analyses, suggest that PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations. Adaptive Gaussian quadrature is a viable alternative for nested designs where the numerical integration is limited to a small number of dimensions. Higher order Laplace approximations hold the promise of accurate inference more generally. MCMC is likely the method of choice for the most complex problems that involve high dimensional integrals.