876 resultados para Markov Model Estimation


Relevância:

30.00% 30.00%

Publicador:

Resumo:

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In biostatistical applications interest often focuses on the estimation of the distribution of a time-until-event variable T. If one observes whether or not T exceeds an observed monitoring time at a random number of monitoring times, then the data structure is called interval censored data. We extend this data structure by allowing the presence of a possibly time-dependent covariate process that is observed until end of follow up. If one only assumes that the censoring mechanism satisfies coarsening at random, then, by the curve of dimensionality, typically no regular estimators will exist. To fight the curse of dimensionality we follow the approach of Robins and Rotnitzky (1992) by modeling parameters of the censoring mechanism. We model the right-censoring mechanism by modeling the hazard of the follow up time, conditional on T and the covariate process. For the monitoring mechanism we avoid modeling the joint distribution of the monitoring times by only modeling a univariate hazard of the pooled monitoring times, conditional on the follow up time, T, and the covariates process, which can be estimated by treating the pooled sample of monitoring times as i.i.d. In particular, it is assumed that the monitoring times and the right-censoring times only depend on T through the observed covariate process. We introduce inverse probability of censoring weighted (IPCW) estimator of the distribution of T and of smooth functionals thereof which are guaranteed to be consistent and asymptotically normal if we have available correctly specified semiparametric models for the two hazards of the censoring process. Furthermore, given such correctly specified models for these hazards of the censoring process, we propose a one-step estimator which will improve on the IPCW estimator if we correctly specify a lower-dimensional working model for the conditional distribution of T, given the covariate process, that remains consistent and asymptotically normal if this latter working model is misspecified. It is shown that the one-step estimator is efficient if each subject is at most monitored once and the working model contains the truth. In general, it is shown that the one-step estimator optimally uses the surrogate information if the working model contains the truth. It is not optimal in using the interval information provided by the current status indicators at the monitoring times, but simulations in Peterson, van der Laan (1997) show that the efficiency loss is small.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In biostatistical applications, interest often focuses on the estimation of the distribution of time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed monitoring time, then the data is described by the well known singly-censored current status model, also known as interval censored data, case I. We extend this current status model by allowing the presence of a time-dependent process, which is partly observed and allowing C to depend on T through the observed part of this time-dependent process. Because of the high dimension of the covariate process, no globally efficient estimators exist with a good practical performance at moderate sample sizes. We follow the approach of Robins and Rotnitzky (1992) by modeling the censoring variable, given the time-variable and the covariate-process, i.e., the missingness process, under the restriction that it satisfied coarsening at random. We propose a generalization of the simple current status estimator of the distribution of T and of smooth functionals of the distribution of T, which is based on an estimate of the missingness. In this estimator the covariates enter only through the estimate of the missingness process. Due to the coarsening at random assumption, the estimator has the interesting property that if we estimate the missingness process more nonparametrically, then we improve its efficiency. We show that by local estimation of an optimal model or optimal function of the covariates for the missingness process, the generalized current status estimator for smooth functionals become locally efficient; meaning it is efficient if the right model or covariate is consistently estimated and it is consistent and asymptotically normal in general. Estimation of the optimal model requires estimation of the conditional distribution of T, given the covariates. Any (prior) knowledge of this conditional distribution can be used at this stage without any risk of losing root-n consistency. We also propose locally efficient one step estimators. Finally, we show some simulation results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In many applications the observed data can be viewed as a censored high dimensional full data random variable X. By the curve of dimensionality it is typically not possible to construct estimators that are asymptotically efficient at every probability distribution in a semiparametric censored data model of such a high dimensional censored data structure. We provide a general method for construction of one-step estimators that are efficient at a chosen submodel of the full-data model, are still well behaved off this submodel and can be chosen to always improve on a given initial estimator. These one-step estimators rely on good estimators of the censoring mechanism and thus will require a parametric or semiparametric model for the censoring mechanism. We present a general theorem that provides a template for proving the desired asymptotic results. We illustrate the general one-step estimation methods by constructing locally efficient one-step estimators of marginal distributions and regression parameters with right-censored data, current status data and bivariate right-censored data, in all models allowing the presence of time-dependent covariates. The conditions of the asymptotics theorem are rigorously verified in one of the examples and the key condition of the general theorem is verified for all examples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose methods for smooth hazard estimation of a time variable where that variable is interval censored. These methods allow one to model the transformed hazard in terms of either smooth (smoothing splines) or linear functions of time and other relevant time varying predictor variables. We illustrate the use of this method on a dataset of hemophiliacs where the outcome, time to seroconversion for HIV, is interval censored and left-truncated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses estimation of the tumor incidence rate, the death rate given tumor is present and the death rate given tumor is absent using a discrete multistage model. The model was originally proposed by Dewanji and Kalbfleisch (1986) and the maximum likelihood estimate of the tumor incidence rate was obtained using EM algorithm. In this paper, we use a reparametrization to simplify the estimation procedure. The resulting estimates are not always the same as the maximum likelihood estimates but are asymptotically equivalent. In addition, an explicit expression for asymptotic variance and bias of the proposed estimators is also derived. These results can be used to compare efficiency of different sacrifice schemes in carcinogenicity experiments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, researchers in the health and social sciences have become increasingly interested in mediation analysis. Specifically, upon establishing a non-null total effect of an exposure, investigators routinely wish to make inferences about the direct (indirect) pathway of the effect of the exposure not through (through) a mediator variable that occurs subsequently to the exposure and prior to the outcome. Natural direct and indirect effects are of particular interest as they generally combine to produce the total effect of the exposure and therefore provide insight on the mechanism by which it operates to produce the outcome. A semiparametric theory has recently been proposed to make inferences about marginal mean natural direct and indirect effects in observational studies (Tchetgen Tchetgen and Shpitser, 2011), which delivers multiply robust locally efficient estimators of the marginal direct and indirect effects, and thus generalizes previous results for total effects to the mediation setting. In this paper we extend the new theory to handle a setting in which a parametric model for the natural direct (indirect) effect within levels of pre-exposure variables is specified and the model for the observed data likelihood is otherwise unrestricted. We show that estimation is generally not feasible in this model because of the curse of dimensionality associated with the required estimation of auxiliary conditional densities or expectations, given high-dimensional covariates. We thus consider multiply robust estimation and propose a more general model which assumes a subset but not all of several working models holds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time (AFT) model for current status and interval censored data. The estimator is constructed by inverting a Wald type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo (MCMC) based resampling method is proposed to simultaneously obtain the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored data sets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite sample performance of the new estimators.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property which allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.

Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

DNA sequence copy number has been shown to be associated with cancer development and progression. Array-based Comparative Genomic Hybridization (aCGH) is a recent development that seeks to identify the copy number ratio at large numbers of markers across the genome. Due to experimental and biological variations across chromosomes and across hybridizations, current methods are limited to analyses of single chromosomes. We propose a more powerful approach that borrows strength across chromosomes and across hybridizations. We assume a Gaussian mixture model, with a hidden Markov dependence structure, and with random effects to allow for intertumoral variation, as well as intratumoral clonal variation. For ease of computation, we base estimation on a pseudolikelihood function. The method produces quantitative assessments of the likelihood of genetic alterations at each clone, along with a graphical display for simple visual interpretation. We assess the characteristics of the method through simulation studies and through analysis of a brain tumor aCGH data set. We show that the pseudolikelihood approach is superior to existing methods both in detecting small regions of copy number alteration and in accurately classifying regions of change when intratumoral clonal variation is present.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises. Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family. However, we further demonstrate that adequately capitalizing on the benefits of a well fitting fully-specified likelihood in the terms of gene ranking is difficult.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Numerous time series studies have provided strong evidence of an association between increased levels of ambient air pollution and increased levels of hospital admissions, typically at 0, 1, or 2 days after an air pollution episode. An important research aim is to extend existing statistical models so that a more detailed understanding of the time course of hospitalization after exposure to air pollution can be obtained. Information about this time course, combined with prior knowledge about biological mechanisms, could provide the basis for hypotheses concerning the mechanism by which air pollution causes disease. Previous studies have identified two important methodological questions: (1) How can we estimate the shape of the distributed lag between increased air pollution exposure and increased mortality or morbidity? and (2) How should we estimate the cumulative population health risk from short-term exposure to air pollution? Distributed lag models are appropriate tools for estimating air pollution health effects that may be spread over several days. However, estimation for distributed lag models in air pollution and health applications is hampered by the substantial noise in the data and the inherently weak signal that is the target of investigation. We introduce an hierarchical Bayesian distributed lag model that incorporates prior information about the time course of pollution effects and combines information across multiple locations. The model has a connection to penalized spline smoothing using a special type of penalty matrix. We apply the model to estimating the distributed lag between exposure to particulate matter air pollution and hospitalization for cardiovascular and respiratory disease using data from a large United States air pollution and hospitalization database of Medicare enrollees in 94 counties covering the years 1999-2002.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Functional neuroimaging techniques enable investigations into the neural basis of human cognition, emotions, and behaviors. In practice, applications of functional magnetic resonance imaging (fMRI) have provided novel insights into the neuropathophysiology of major psychiatric,neurological, and substance abuse disorders, as well as into the neural responses to their treatments. Modern activation studies often compare localized task-induced changes in brain activity between experimental groups. One may also extend voxel-level analyses by simultaneously considering the ensemble of voxels constituting an anatomically defined region of interest (ROI) or by considering means or quantiles of the ROI. In this work we present a Bayesian extension of voxel-level analyses that offers several notable benefits. First, it combines whole-brain voxel-by-voxel modeling and ROI analyses within a unified framework. Secondly, an unstructured variance/covariance for regional mean parameters allows for the study of inter-regional functional connectivity, provided enough subjects are available to allow for accurate estimation. Finally, an exchangeable correlation structure within regions allows for the consideration of intra-regional functional connectivity. We perform estimation for our model using Markov Chain Monte Carlo (MCMC) techniques implemented via Gibbs sampling which, despite the high throughput nature of the data, can be executed quickly (less than 30 minutes). We apply our Bayesian hierarchical model to two novel fMRI data sets: one considering inhibitory control in cocaine-dependent men and the second considering verbal memory in subjects at high risk for Alzheimer’s disease. The unifying hierarchical model presented in this manuscript is shown to enhance the interpretation content of these data sets.