5 results for Confirmation bias
in DigitalCommons@The Texas Medical Center
Abstract:
Strategies are compared for developing a linear regression model with stochastic (multivariate normal) regressor variables and for subsequently assessing its predictive ability. The bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with either an 'all possible subsets' or a 'forward selection' search over the variables. The estimators of performance include parametric (MSEP$_m$) and non-parametric (PRESS) assessments in the entire sample, and two data-splitting estimates restricted to a random or a balanced (Snee's DUPLEX) 'validation' half-sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.
The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either half of the split sample exceed those obtained using all predictors and the entire sample.
Only the random split estimator is conditionally (on $\beta$) unbiased; however, MSEP$_m$ is unbiased on average, and PRESS is nearly so, in unselected (fixed-form) models. When subset selection techniques are used, MSEP$_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent smaller than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and appears to be of little value when the regressor variables are stochastic.
To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development and that a leave-one-out statistic (e.g. PRESS) be used for assessment.
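PRESS is the sum of squared leave-one-out prediction errors. For ordinary least squares it does not require refitting the model n times, because the i-th deleted residual equals $e_i/(1-h_{ii})$, where $e_i$ is the ordinary residual and $h_{ii}$ the i-th diagonal element of the hat matrix. The minimal sketch below, assuming NumPy and a design matrix that already contains an intercept column, computes PRESS both ways on synthetic data. The function names `press_hat` and `press_brute_force` and the simulated data are illustrative assumptions, not the study's code or data.

```python
import numpy as np

def press_hat(X, y):
    """PRESS via the hat-matrix shortcut: sum of (e_i / (1 - h_ii))^2."""
    # Hat matrix H = X (X'X)^{-1} X'; solve() avoids forming the inverse explicitly.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    residuals = y - H @ y
    leverage = np.diag(H)
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))

def press_brute_force(X, y):
    """PRESS by refitting the model n times, each time leaving one observation out."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)
    return total

# Synthetic example: 30 observations, an intercept and 3 correlated normal regressors.
rng = np.random.default_rng(0)
n, p = 30, 3
cov = np.eye(p) * 0.5 + 0.5  # unit variances, pairwise correlation 0.5
Z = rng.multivariate_normal(np.zeros(p), cov, size=n)
X = np.column_stack([np.ones(n), Z])
y = X @ np.array([1.0, 0.5, -0.25, 0.0]) + rng.normal(size=n)

print(press_hat(X, y), press_brute_force(X, y))  # agree up to floating-point rounding
```

The shortcut is what makes a leave-one-out assessment cheap enough to apply to the full-sample model; it holds for a fixed-form least-squares fit, so it does not by itself account for any variable selection performed on the same data.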
Abstract:
Additive and multiplicative models of relative risk were used to measure the effect of cancer misclassification and DS86 random errors on lifetime risk projections in the Life Span Study (LSS) of Hiroshima and Nagasaki atomic bomb survivors. The true number of cancer deaths in each stratum of the cancer mortality cross-classification was estimated using sufficient statistics from the EM algorithm. Average survivor doses in the strata were corrected for DS86 random error ($\sigma$ = 0.45) by use of reduction factors. Poisson regression was used to model the corrected and uncorrected mortality rates, with covariates for age at the time of bombing, age at the time of death, and gender. Excess risks were in good agreement with those in RERF Report 11 (Part 2) and the BEIR-V report. Bias due to DS86 random error typically ranged from -15% to -30% for both sexes and for all sites and models. The total bias, including diagnostic misclassification, of the excess risk of nonleukemia cancer for exposure to 1 Sv from age 18 to 65 under the non-constant relative projection model was -37.1% for males and -23.3% for females. Total excess risks of leukemia under the relative projection model were biased by -27.1% for males and -43.4% for females. Thus, nonleukemia risks for 1 Sv from ages 18 to 85 (DRREF = 2) increased from 1.91%/Sv to 2.68%/Sv among males and from 3.23%/Sv to 4.02%/Sv among females. Leukemia excess risks increased from 0.87%/Sv to 1.10%/Sv among males and from 0.73%/Sv to 1.04%/Sv among females. Bias depended on the gender, site, correction method, exposure profile, and projection model considered. Future studies that apply LSS data to U.S. nuclear workers may produce downwardly biased risk estimates if lifetime risk projections are not adjusted for random and systematic errors. (Supported by U.S. NRC Grant NRC-04-091-02.)
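The Poisson regression step can be illustrated with a deliberately small sketch. It assumes a linear excess relative risk (multiplicative) model, $E[y_i] = \lambda_i T_i (1 + \beta d_i)$, with known baseline rates $\lambda_i$, person-years $T_i$, and stratum doses $d_i$, fitted by Newton-Raphson on the Poisson log-likelihood. It omits the age and gender covariates, the additive model, and the EM misclassification step, and it applies a single hypothetical reduction factor of 0.8 to every dose; the study's actual reduction factors are not reproduced. The function names `fit_linear_err` and `relative_bias` and all data are illustrative assumptions.

```python
def fit_linear_err(cases, person_years, baseline_rates, doses, tol=1e-12, max_iter=100):
    """Maximum-likelihood beta in the hypothetical model
    E[cases_i] = baseline_rates_i * person_years_i * (1 + beta * doses_i)."""
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0        # first derivative of the Poisson log-likelihood in beta
        information = 0.0  # minus the second derivative (observed information)
        for y, py, lam, d in zip(cases, person_years, baseline_rates, doses):
            rr = 1.0 + beta * d  # relative risk at dose d
            score += y * d / rr - lam * py * d
            information += y * d * d / (rr * rr)
        step = score / information  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta

def relative_bias(estimate, reference):
    """Relative bias of an estimate against a reference value (e.g. -0.2 means -20%)."""
    return (estimate - reference) / reference

# Hypothetical strata: expected (non-integer) counts generated from beta = 0.5
# and a baseline rate of 0.01 per person-year, so the fit recovers beta exactly.
doses = [0.0, 0.5, 1.0, 2.0]  # nominal stratum-average doses (Sv)
person_years = [1000.0] * 4
baseline = [0.01] * 4
cases = [lam * py * (1.0 + 0.5 * d) for lam, py, d in zip(baseline, person_years, doses)]

beta_uncorrected = fit_linear_err(cases, person_years, baseline, doses)
corrected_doses = [0.8 * d for d in doses]  # hypothetical uniform reduction factor
beta_corrected = fit_linear_err(cases, person_years, baseline, corrected_doses)
print(beta_uncorrected, beta_corrected, relative_bias(beta_uncorrected, beta_corrected))
# approximately 0.5, 0.625, -0.2
```

Because this model depends on dose only through $\beta d$, shrinking every dose by 0.8 inflates the fitted slope by 1/0.8, so the uncorrected estimate is biased by 0.8 - 1 = -20% relative to the corrected one. With stratum-specific reduction factors the bias no longer reduces to a single constant.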
Abstract:
This study establishes the extent and relevance of bias in population estimates of the prevalence, incidence, and intensity of infection with Schistosoma mansoni caused by the relative sensitivity of stool examination techniques. The population studied was Parcelas de Boqueron in Las Piedras, Puerto Rico, where the Centers for Disease Control had undertaken a prospective community-based study of infection with S. mansoni in 1972. Each January of the succeeding years, stool specimens from this population were processed according to the modified Ritchie concentration (MRC) technique. In January 1979, additional stool specimens were collected from 30 individuals selected on the basis of their mean S. mansoni egg output during previous years. Each specimen was divided into ten 1-g aliquots and three 42-mg aliquots. The relationship between egg counts obtained with the Kato-Katz (KK) thick smear technique and the mean of ten counts obtained with the MRC technique was established by regression analysis. Additionally, the effect of fecal sample size and egg excretion level on technique sensitivity was evaluated in a blind assessment of single stool specimens, examined with both methods, from 125 residents with documented S. mansoni infections. The regression equation was $\ln \mathrm{KK} = 2.3324 + 0.6319 \ln \mathrm{MRC}$, with a coefficient of determination ($r^2$) of 0.73. The regression equation was then used to correct the term $m$ for sample size in the expression $P(\geq 1 \text{ egg}) = 1 - e^{-ms}$, which estimates the probability $P$ of finding at least one egg as a function of the mean S. mansoni egg output $m$ of the population and the effective stool sample size $s$ used by the coprological technique. This algorithm closely approximated the observed sensitivity of the KK and MRC tests when they were used to blindly screen a population of known parasitologic status for infection with S. mansoni.
In addition, the algorithm was used to adjust the apparent prevalence of infection for the functional sensitivity of the diagnostic test. This permitted estimation of the true prevalence of infection and, hence, provided a means of correcting estimates of the incidence of infection.
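The pieces of this algorithm can be written out directly. $1 - e^{-ms}$ is the probability that a Poisson-distributed egg count with mean $ms$ is nonzero, and dividing apparent prevalence by that sensitivity gives a corrected prevalence if one further assumes perfect specificity (no false-positive egg findings), an assumption the abstract does not state. The abstract also does not spell out exactly how the KK-on-MRC regression enters $m$, so the minimal sketch below keeps the reported fit as a separate helper. The function names, the mean egg output of 10 eggs per gram, the apparent prevalence of 0.2, and taking $s$ at face value as the aliquot weight are all hypothetical.

```python
import math

def predicted_kk_count(mrc_count):
    """Kato-Katz count predicted from an MRC count using the reported fit
    ln KK = 2.3324 + 0.6319 ln MRC (defined only for positive counts)."""
    return math.exp(2.3324 + 0.6319 * math.log(mrc_count))

def detection_probability(m, s):
    """P(at least one egg) = 1 - exp(-m * s): chance that a Poisson egg count
    with mean m * s is nonzero (m: eggs per gram, s: grams examined)."""
    return 1.0 - math.exp(-m * s)

def corrected_prevalence(apparent_prevalence, sensitivity):
    """Divides apparent prevalence by sensitivity, assuming perfect
    specificity; the result is capped at 1."""
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    return min(1.0, apparent_prevalence / sensitivity)

m = 10.0  # hypothetical mean egg output (eggs per gram)
for label, s in [("one 42-mg KK smear", 0.042), ("one 1-g MRC aliquot", 1.0)]:
    sens = detection_probability(m, s)
    print(label, round(sens, 3), round(corrected_prevalence(0.2, sens), 3))
```

With these hypothetical inputs the small KK sample has a much lower detection probability than the 1-g aliquot, so the same apparent prevalence implies a much larger true prevalence when it comes from KK.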
Abstract:
None of the large clinical trials evaluating the efficacy of screening mammography included women aged 75 and older. Recommendations on an upper age limit at which to discontinue screening are based on indirect evidence and are not consistent. Screening mammography is evaluated using observational data from the SEER-Medicare linked database. Measuring the benefit of screening mammography is difficult because of lead-time bias, length bias, and over-detection. The underlying conceptual model divides the disease into two stages: pre-clinical ($T_0$) and symptomatic ($T_1$) breast cancer. The times spent in these phases are treated as a pair of dependent bivariate observations, $(t_0, t_1)$, and estimates are derived to describe the distribution of this random vector. To quantify the effect of screening mammography, statistical inference is made about the mammography parameters that correspond to the marginal distribution of the symptomatic phase duration ($T_1$). This analysis shows that the hazard ratio of death from breast cancer, comparing women with screen-detected tumors with those whose tumors were detected at symptom onset, is 0.36 (0.30, 0.42), indicating a benefit among the screen-detected cases.
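Lead-time bias can be made concrete with a small simulation in which screening, by construction, has no benefit: it moves diagnosis earlier within the pre-clinical phase but leaves the time of death unchanged. The minimal sketch below draws $T_0$ and $T_1$ independently from exponential distributions, a simplification of the dependent $(t_0, t_1)$ pair in the abstract, and places the screen at a uniformly random point in $T_0$. The function name `simulate_no_benefit`, the distributions, and the mean durations of 2 and 5 years are hypothetical; length bias and over-detection are not modeled.

```python
import random
import statistics

def simulate_no_benefit(n, mean_preclinical, mean_symptomatic, seed=0):
    """Hypothetical scenario where screening advances diagnosis but not death.
    Returns lead times, survival clocked from symptom onset, and survival
    clocked from screen detection, for the same n simulated women."""
    rng = random.Random(seed)
    lead_times, from_symptoms, from_screen = [], [], []
    for _ in range(n):
        t0 = rng.expovariate(1.0 / mean_preclinical)  # pre-clinical duration
        t1 = rng.expovariate(1.0 / mean_symptomatic)  # symptom onset to death
        lead = rng.uniform(0.0, t0)  # time from a random screen in T0 to symptom onset
        lead_times.append(lead)
        from_symptoms.append(t1)       # survival measured from symptom onset
        from_screen.append(lead + t1)  # same death time, measured from screen detection
    return lead_times, from_symptoms, from_screen

leads, clinical, screened = simulate_no_benefit(10_000, 2.0, 5.0)
gain = statistics.fmean(screened) - statistics.fmean(clinical)
print(gain, statistics.fmean(leads))  # equal up to rounding: the apparent gain is pure lead time
```

Even though no death is postponed, survival clocked from diagnosis looks longer in the screened group by the mean lead time, which illustrates why a naive comparison of survival from diagnosis overstates the benefit of screening.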