972 resultados para Selection Bias
Resumo:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called. 632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
Resumo:
This paper analyses intergenerational earnings mobility in Spain correcting for different selection biases. We address the co-residence selection problem by combining information from two samples and using the two-sample two-stage least square estimator. We find a small decrease in elasticity when we move to younger cohorts. Furthermore, we find a higher correlation in the case of daughters than in the case of sons; however, when we consider the employment selection in the case of daughters, by adopting a Heckman-type correction method, the diference between sons and daughters disappears. By decomposing the sources of earnings elasticity across generations, we find that the correlation between child's and father's occupation is the most important component. Finally, quantile regressions estimates show that the influence of the father's earnings is greater when we move to the lower tail of the offspring's earnings distribution, especially in the case of daughters' earnings.
Selection bias and unobservable heterogeneity applied at the wage equation of European married women
Resumo:
This paper utilizes a panel data sample selection model to correct the selection in the analysis of longitudinal labor market data for married women in European countries. We estimate the female wage equation in a framework of unbalanced panel data models with sample selection. The wage equations of females have several potential sources of.
Resumo:
PURPOSE OF REVIEW: Adherence to preventive measures and prescribed medications is the cornerstone of the successful management of hypertension. The role of adherence is particularly important when treatments are not providing the expected clinical results, for example, in patients with resistant hypertension. The goal of this article is to review the recent observations regarding drug adherence in resistant hypertension. RECENT FINDINGS: Today, the role of drug adherence as a potential cause of resistant hypertension is largely underestimated. Most studies suggest that a low adherence to the prescribed medications can affect up to 50% of patients with resistant hypertension.A good adherence to therapy is generally associated with an improved prognosis. Nonetheless, adherence should probably not be a target for treatment per se because data on adherence should always be interpreted in the view of clinical results. In our opinion, the availability of reliable data on drug adherence would be a major help for physicians to manage patients apparently resistant to therapy. SUMMARY: The actual development of new drugs for hypertension is slow. Thus, focusing on drug adherence to the drugs available is an important way to improve blood pressure control in the population. More emphasis should be put on measuring drug adherence in patients with resistant hypertension to avoid costly investigations and treatments.
Resumo:
We consider methods for estimating causal effects of treatment in the situation where the individuals in the treatment and the control group are self selected, i.e., the selection mechanism is not randomized. In this case, simple comparison of treated and control outcomes will not generally yield valid estimates of casual effects. The propensity score method is frequently used for the evaluation of treatment effect. However, this method is based onsome strong assumptions, which are not directly testable. In this paper, we present an alternative modeling approachto draw causal inference by using share random-effect model and the computational algorithm to draw likelihood based inference with such a model. With small numerical studies and a real data analysis, we show that our approach gives not only more efficient estimates but it is also less sensitive to model misspecifications, which we consider, than the existing methods.
Resumo:
This thesis presents a creative and practical approach to dealing with the problem of selection bias. Selection bias may be the most important vexing problem in program evaluation or in any line of research that attempts to assert causality. Some of the greatest minds in economics and statistics have scrutinized the problem of selection bias, with the resulting approaches – Rubin’s Potential Outcome Approach(Rosenbaum and Rubin,1983; Rubin, 1991,2001,2004) or Heckman’s Selection model (Heckman, 1979) – being widely accepted and used as the best fixes. These solutions to the bias that arises in particular from self selection are imperfect, and many researchers, when feasible, reserve their strongest causal inference for data from experimental rather than observational studies. The innovative aspect of this thesis is to propose a data transformation that allows measuring and testing in an automatic and multivariate way the presence of selection bias. The approach involves the construction of a multi-dimensional conditional space of the X matrix in which the bias associated with the treatment assignment has been eliminated. Specifically, we propose the use of a partial dependence analysis of the X-space as a tool for investigating the dependence relationship between a set of observable pre-treatment categorical covariates X and a treatment indicator variable T, in order to obtain a measure of bias according to their dependence structure. The measure of selection bias is then expressed in terms of inertia due to the dependence between X and T that has been eliminated. Given the measure of selection bias, we propose a multivariate test of imbalance in order to check if the detected bias is significant, by using the asymptotical distribution of inertia due to T (Estadella et al. 2005) , and by preserving the multivariate nature of data. Further, we propose the use of a clustering procedure as a tool to find groups of comparable units on which estimate local causal effects, and the use of the multivariate test of imbalance as a stopping rule in choosing the best cluster solution set. The method is non parametric, it does not call for modeling the data, based on some underlying theory or assumption about the selection process, but instead it calls for using the existing variability within the data and letting the data to speak. The idea of proposing this multivariate approach to measure selection bias and test balance comes from the consideration that in applied research all aspects of multivariate balance, not represented in the univariate variable- by-variable summaries, are ignored. The first part contains an introduction to evaluation methods as part of public and private decision process and a review of the literature of evaluation methods. The attention is focused on Rubin Potential Outcome Approach, matching methods, and briefly on Heckman’s Selection Model. The second part focuses on some resulting limitations of conventional methods, with particular attention to the problem of how testing in the correct way balancing. The third part contains the original contribution proposed , a simulation study that allows to check the performance of the method for a given dependence setting and an application to a real data set. Finally, we discuss, conclude and explain our future perspectives.
Resumo:
Whether the use of mobile phones is a risk factor for brain tumors in adolescents is currently being studied. Case--control studies investigating this possible relationship are prone to recall error and selection bias. We assessed the potential impact of random and systematic recall error and selection bias on odds ratios (ORs) by performing simulations based on real data from an ongoing case--control study of mobile phones and brain tumor risk in children and adolescents (CEFALO study). Simulations were conducted for two mobile phone exposure categories: regular and heavy use. Our choice of levels of recall error was guided by a validation study that compared objective network operator data with the self-reported amount of mobile phone use in CEFALO. In our validation study, cases overestimated their number of calls by 9% on average and controls by 34%. Cases also overestimated their duration of calls by 52% on average and controls by 163%. The participation rates in CEFALO were 83% for cases and 71% for controls. In a variety of scenarios, the combined impact of recall error and selection bias on the estimated ORs was complex. These simulations are useful for the interpretation of previous case-control studies on brain tumor and mobile phone use in adults as well as for the interpretation of future studies on adolescents.
Resumo:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^
Resumo:
This paper decomposes the conventional measure of selection bias in observational studies into three components. The first two components are due to differences in the distributions of characteristics between participant and nonparticipant (comparison) group members: the first arises from differences in the supports, and the second from differences in densities over the region of common support. The third component arises from selection bias precisely defined. Using data from a recent social experiment, we find that the component due to selection bias, precisely defined, is smaller than the first two components. However, selection bias still represents a substantial fraction of the experimental impact estimate. The empirical performance of matching methods of program evaluation is also examined. We find that matching based on the propensity score eliminates some but not all of the measured selection bias, with the remaining bias still a substantial fraction of the estimated impact. We find that the support of the distribution of propensity scores for the comparison group is typically only a small portion of the support for the participant group. For values outside the common support, it is impossible to reliably estimate the effect of program participation using matching methods. If the impact of participation depends on the propensity score, as we find in our data, the failure of the common support condition severely limits matching compared with random assignment as an evaluation estimator.
Resumo:
Thesis (Master's)--University of Washington, 2016-08