6 resultados para hidden semi-Markov model
em Collection Of Biostatistics Research Archive
Resumo:
In Malani and Neilsen (1992) we have proposed alternative estimates of survival function (for time to disease) using a simple marker that describes time to some intermediate stage in a disease process. In this paper we derive the asymptotic variance of one such proposed estimator using two different methods and compare terms of order 1/n when there is no censoring. In the absence of censoring the asymptotic variance obtained using the Greenwood type approach converges to exact variance up to terms involving 1/n. But the asymptotic variance obtained using the theory of the counting process and results from Voelkel and Crowley (1984) on semi-Markov processes has a different term of order 1/n. It is not clear to us at this point why the variance formulae using the latter approach give different results.
Resumo:
Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity have been associated with diseases processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize a HMM framework for inference in high throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
Resumo:
Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
Resumo:
In this paper, we focus on the model for two types of tumors. Tumor development can be described by four types of death rates and four tumor transition rates. We present a general semi-parametric model to estimate the tumor transition rates based on data from survival/sacrifice experiments. In the model, we make a proportional assumption of tumor transition rates on a common parametric function but no assumption of the death rates from any states. We derived the likelihood function of the data observed in such an experiment, and an EM algorithm that simplified estimating procedures. This article extends work on semi-parametric models for one type of tumor (see Portier and Dinse and Dinse) to two types of tumors.
Resumo:
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.