827 resultados para Causal inference
Resumo:
"Most quantitative empirical analyses are motivated by the desire to estimate the causal effect of an independent variable on a dependent variable. Although the randomized experiment is the most powerful design for this task, in most social science research done outside of psychology, experimental designs are infeasible. (Winship & Morgan, 1999, p. 659)." This quote from earlier work by Winship and Morgan, which was instrumental in setting the groundwork for their book, captures the essence of our review of Morgan and Winship's book: It is about causality in nonexperimental settings.
Resumo:
Resumen tomado de la publicaci??n
Resumo:
In this paper, we consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals are drawn from a population. On these individuals, information is obtained on treatment, outcome, and a few low-dimensional confounders. These individuals are then stratified according to these factors. In the second phase, a random sub-sample of individuals are drawn from each stratum, with known, stratum-specific selection probabilities. On these individuals, a rich set of confounding factors are collected. In this setting, we introduce four estimators: (1) simple inverse weighted, (2) locally efficient, (3) doubly robust and (4)enriched inverse weighted. We evaluate the finite-sample performance of these estimators in a simulation study. We also use our methodology to estimate the causal effect of trauma care on in-hospital mortality using data from the National Study of Cost and Outcomes of Trauma.
Resumo:
Causal inference with a continuous treatment is a relatively under-explored problem. In this dissertation, we adopt the potential outcomes framework. Potential outcomes are responses that would be seen for a unit under all possible treatments. In an observational study where the treatment is continuous, the potential outcomes are an uncountably infinite set indexed by treatment dose. We parameterize this unobservable set as a linear combination of a finite number of basis functions whose coefficients vary across units. This leads to new techniques for estimating the population average dose-response function (ADRF). Some techniques require a model for the treatment assignment given covariates, some require a model for predicting the potential outcomes from covariates, and some require both. We develop these techniques using a framework of estimating functions, compare them to existing methods for continuous treatments, and simulate their performance in a population where the ADRF is linear and the models for the treatment and/or outcomes may be misspecified. We also extend the comparisons to a data set of lottery winners in Massachusetts. Next, we describe the methods and functions in the R package causaldrf using data from the National Medical Expenditure Survey (NMES) and Infant Health and Development Program (IHDP) as examples. Additionally, we analyze the National Growth and Health Study (NGHS) data set and deal with the issue of missing data. Lastly, we discuss future research goals and possible extensions.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
Resumo:
© Institute of Mathematical Statistics, 2014.Motivated by recent findings in the field of consumer science, this paper evaluates the causal effect of debit cards on household consumption using population-based data from the Italy Survey on Household Income and Wealth (SHIW). Within the Rubin Causal Model, we focus on the estimand of population average treatment effect for the treated (PATT). We consider three existing estimators, based on regression, mixed matching and regression, propensity score weighting, and propose a new doubly-robust estimator. Semiparametric specification based on power series for the potential outcomes and the propensity score is adopted. Cross-validation is used to select the order of the power series. We conduct a simulation study to compare the performance of the estimators. The key assumptions, overlap and unconfoundedness, are systematically assessed and validated in the application. Our empirical results suggest statistically significant positive effects of debit cards on the monthly household spending in Italy.
Resumo:
We study semiparametric two-step estimators which have the same structure as parametric doubly robust estimators in their second step. The key difference is that we do not impose any parametric restriction on the nuisance functions that are estimated in a first stage, but retain a fully nonparametric model instead. We call these estimators semiparametric doubly robust estimators (SDREs), and show that they possess superior theoretical and practical properties compared to generic semiparametric two-step estimators. In particular, our estimators have substantially smaller first-order bias, allow for a wider range of nonparametric first-stage estimates, rate-optimal choices of smoothing parameters and data-driven estimates thereof, and their stochastic behavior can be well-approximated by classical first-order asymptotics. SDREs exist for a wide range of parameters of interest, particularly in semiparametric missing data and causal inference models. We illustrate our method with a simulation exercise.
Resumo:
The affected sib/relative pair (ASP/ARP) design is often used with covariates to find genes that can cause a disease in pathways other than through those covariates. However, such "covariates" can themselves have genetic determinants, and the validity of existing methods has so far only been argued under implicit assumptions. We propose an explicit causal formulation of the problem using potential outcomes and principal stratification. The general role of this formulation is to identify and separate the meaning of the different assumptions that can provide valid causal inference in linkage analysis. This separation helps to (a) develop better methods under explicit assumptions, and (b) show the different ways in which these assumptions can fail, which is necessary for developing further specific designs to test these assumptions and confirm or improve the inference. Using this formulation in the specific problem above, we show that, when the "covariate" (e.g., addiction to smoking) also has genetic determinants, then existing methods, including those previously thought as valid, can declare linkage between the disease and marker loci even when no such linkage exists. We also introduce design strategies to address the problem.
Resumo:
OBJECTIVE: To demonstrate the application of causal inference methods to observational data in the obstetrics and gynecology field, particularly causal modeling and semi-parametric estimation. BACKGROUND: Human immunodeficiency virus (HIV)-positive women are at increased risk for cervical cancer and its treatable precursors. Determining whether potential risk factors such as hormonal contraception are true causes is critical for informing public health strategies as longevity increases among HIV-positive women in developing countries. METHODS: We developed a causal model of the factors related to combined oral contraceptive (COC) use and cervical intraepithelial neoplasia 2 or greater (CIN2+) and modified the model to fit the observed data, drawn from women in a cervical cancer screening program at HIV clinics in Kenya. Assumptions required for substantiation of a causal relationship were assessed. We estimated the population-level association using semi-parametric methods: g-computation, inverse probability of treatment weighting, and targeted maximum likelihood estimation. RESULTS: We identified 2 plausible causal paths from COC use to CIN2+: via HPV infection and via increased disease progression. Study data enabled estimation of the latter only with strong assumptions of no unmeasured confounding. Of 2,519 women under 50 screened per protocol, 219 (8.7%) were diagnosed with CIN2+. Marginal modeling suggested a 2.9% (95% confidence interval 0.1%, 6.9%) increase in prevalence of CIN2+ if all women under 50 were exposed to COC; the significance of this association was sensitive to method of estimation and exposure misclassification. CONCLUSION: Use of causal modeling enabled clear representation of the causal relationship of interest and the assumptions required to estimate that relationship from the observed data. Semi-parametric estimation methods provided flexibility and reduced reliance on correct model form. Although selected results suggest an increased prevalence of CIN2+ associated with COC, evidence is insufficient to conclude causality. Priority areas for future studies to better satisfy causal criteria are identified.