907 results for rank regression
Abstract:
Consider a nonparametric regression model Y=mu*(X) + e, where the explanatory variables X are endogenous and e satisfies the conditional moment restriction E[e|W]=0 w.p.1 for instrumental variables W. It is well known that in these models the structural parameter mu* is 'ill-posed' in the sense that the function mapping the data to mu* is not continuous. In this paper, we derive the efficiency bounds for estimating linear functionals E[p(X)mu*(X)] and int_{supp(X)}p(x)mu*(x)dx, where p is a known weight function and supp(X) the support of X, without assuming mu* to be well-posed or even identified.
Abstract:
The Data Envelopment Analysis (DEA) efficiency score obtained for an individual firm is a point estimate without any confidence interval around it. In recent years, researchers have resorted to bootstrapping in order to generate empirical distributions of efficiency scores. This procedure assumes that all firms have the same probability of getting an efficiency score from any specified interval within the [0,1] range. We propose a bootstrap procedure that empirically generates the conditional distribution of efficiency for each individual firm given systematic factors that influence its efficiency. Instead of resampling directly from the pooled DEA scores, we first regress these scores on a set of explanatory variables not included at the DEA stage and bootstrap the residuals from this regression. These pseudo-efficiency scores incorporate the systematic effects of unit-specific factors along with the contribution of the randomly drawn residual. Data from the U.S. airline industry are utilized in an empirical application.
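The two-stage procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the DEA scores and the single covariate are hypothetical, the second-stage regression is a simple least-squares line, and the helper names (`fit_line`, `conditional_bootstrap`, `percentile_interval`) are invented for this example. Pseudo-scores are not truncated to the [0,1] range here.

```python
import random
import statistics


def fit_line(z, y):
    """Least-squares fit of y = a + b*z for a single covariate z."""
    z_bar, y_bar = statistics.fmean(z), statistics.fmean(y)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    b = sxy / sxx
    return y_bar - b * z_bar, b


def conditional_bootstrap(scores, z, n_boot=1000, seed=0):
    """For each firm, draw n_boot pseudo-efficiency scores:
    fitted value (systematic part) + a residual resampled with replacement."""
    rng = random.Random(seed)
    a, b = fit_line(z, scores)
    fitted = [a + b * zi for zi in z]
    residuals = [s - f for s, f in zip(scores, fitted)]
    draws = [[f + rng.choice(residuals) for _ in range(n_boot)] for f in fitted]
    return fitted, residuals, draws


def percentile_interval(draws, level=0.95):
    """Simple percentile interval from a list of bootstrap draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Hypothetical data: DEA scores for six firms and one firm-level covariate.
dea_scores = [0.62, 0.71, 0.80, 0.85, 0.93, 1.00]
covariate = [1.0, 2.0, 2.5, 3.5, 4.0, 5.0]

fitted, residuals, draws = conditional_bootstrap(dea_scores, covariate)
intervals = [percentile_interval(d) for d in draws]
```

Each firm's interval is centered on its own fitted value, which is what distinguishes this scheme from resampling the pooled scores directly.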
Abstract:
Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable.
In this methodological study, four regression models are considered. The first is the multinomial logistic model, the second is the adjacent-category logit model, the third is the proportional odds model, and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from a survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in completing colorectal cancer screening (CRCS).
The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four ordinal logistic models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) a sensitivity analysis by fitting and comparing different models.
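For orientation, the four models named above are commonly written as follows for a response Y with ordered categories 1, ..., J and covariate vector x. These are standard textbook forms and are an assumption here; the abstract does not state the authors' exact parameterization, and sign conventions and the choice of common versus category-specific slopes vary between sources.

```latex
% Multinomial (baseline-category) logit: category-specific slopes
\log \frac{P(Y = j \mid x)}{P(Y = J \mid x)} = \alpha_j + x^{\top}\beta_j,
  \qquad j = 1, \dots, J-1

% Adjacent-category logit: common slope
\log \frac{P(Y = j \mid x)}{P(Y = j+1 \mid x)} = \alpha_j + x^{\top}\beta,
  \qquad j = 1, \dots, J-1

% Continuation-ratio logit (one common form)
\log \frac{P(Y = j \mid x)}{P(Y > j \mid x)} = \alpha_j + x^{\top}\beta,
  \qquad j = 1, \dots, J-1

% Proportional odds (cumulative logit)
\log \frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \alpha_j + x^{\top}\beta,
  \qquad j = 1, \dots, J-1
```

The multinomial model ignores the ordering and spends J-1 slope vectors, whereas the other three use the ordering to share a single slope vector across categories.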
Abstract:
Obesity prevalence among children and adolescents is rising. Obesity is one of the leading attributable causes of hospitalization and death. Overweight and obese children are more likely to suffer from associated conditions such as hypertension, dyslipidemia, chronic inflammation, increased blood clotting tendency, endothelial dysfunction, hyperinsulinemia, and asthma. These children and adolescents are also more likely to be overweight and obese in adulthood. Rates of obesity and overweight are not evenly distributed across racial and ethnic groups. Mexican American youth have higher rates of obesity and are at higher risk of becoming obese than non-Hispanic black and non-Hispanic white children.
Methods. This cross-sectional study describes the association between rates of obesity and physical activity in a sample of 1313 inner-city Mexican American children and adolescents (5-19 years of age) in Houston, Texas. This study is important because it will contribute to our understanding of childhood and adolescent obesity in this at-risk population.
Data from the Mexican American Feasibility Cohort using the Mano a Mano questionnaire are used to describe this population's status of obesity and physical activity. An initial sample taken from 5000 households in inner-city Houston, Texas, was used as the baseline for this prospective cohort. The questionnaire was given in person to the participants to complete (or to parents for younger children) at a home visit by two specially trained bilingual interviewers. Analysis comprised prevalence estimates of obesity represented as percentile rank (<85% = normal weight, >85% = at risk, >95% = obese) by age and gender. The association between light, moderate, and strenuous activity and obesity was also examined using linear regression.
Results. Overall, 46% of this Mexican American Feasibility cohort is overweight or obese. The prevalence for children in the 6-11 age range (53.2%) was significantly greater than that reported from NHANES 1999–2002 data (39.4%). Although the percentage of overweight and obese among the 12-19 year olds was greater than that reported in NHANES (38.5% versus 38.6%), this difference was not statistically significant.
A significant association between BMI and both sit time and moderate physical activity (both p < 0.05) was found in this sample. For males, this association was significant for moderate physical activity (p < 0.01). For females, this association was significant for BMI and sit time (p < 0.05). These results need to be interpreted in the light of design and measurement limitations.
Conclusion. This study supports observations that the inner-city Houston, Texas, Mexican American child and adolescent population is more overweight and obese than nationally reported figures, and that there are positive relationships between BMI, activity levels, and sit time in this population. This study supports the need for public health initiatives within the Houston Hispanic community.
Abstract:
Objectives. Previous studies have shown a survival advantage in ovarian cancer patients with Ashkenazi-Jewish (AJ) BRCA founder mutations, compared to sporadic ovarian cancer patients. The purpose of this study was to determine whether this association exists in ovarian cancer patients with non-Ashkenazi Jewish BRCA mutations. In addition, we sought to account for possible "survival bias" by minimizing any lead time that may exist between diagnosis and genetic testing.
Methods. Patients with stage III/IV ovarian, fallopian tube, or primary peritoneal cancer and a non-Ashkenazi Jewish BRCA1 or 2 mutation, seen for genetic testing January 1996-July 2007, were identified from genetics and institutional databases. Medical records were reviewed for clinical factors, including response to initial chemotherapy. Patients with sporadic (non-hereditary) ovarian, fallopian tube, or primary peritoneal cancer, without family history of breast or ovarian cancer, were compared to similar cases, matched by age, stage, year of diagnosis, and vital status at time interval to BRCA testing. When possible, 2 sporadic patients were matched to each BRCA patient. An additional group of unmatched sporadic ovarian, fallopian tube, and primary peritoneal cancer patients was included for a separate analysis. Progression-free survival (PFS) and overall survival (OS) were calculated by the Kaplan-Meier method. Multivariate Cox proportional hazards models were calculated for variables of interest. Matched pairs were treated as clusters. A stratified log-rank test was used to compare survival in matched pairs using paired event times. Fisher's exact test, chi-square, and univariate logistic regression were also used for analysis.
Results. Forty-five advanced-stage ovarian, fallopian tube, and primary peritoneal cancer patients with non-Ashkenazi Jewish (non-AJ) BRCA mutations, 86 sporadic-matched patients, and 414 sporadic-unmatched patients were analyzed. Compared to the sporadic-matched and sporadic-unmatched ovarian cancer patients, non-AJ BRCA mutation carriers had longer PFS (17.9 and 13.8 mos. vs. 32.0 mos.; HR 1.76 [95% CI 1.13–2.75] and 2.61 [95% CI 1.70–4.00]). In relation to the sporadic-unmatched patients, non-AJ BRCA patients had greater odds of complete response to initial chemotherapy (OR 2.25 [95% CI 1.17–5.41]) and improved OS (37.6 mos. vs. 101.4 mos.; HR 2.64 [95% CI 1.49–4.67]).
Conclusions. This study demonstrates a significant survival advantage in advanced-stage ovarian cancer patients with non-AJ BRCA mutations, confirming the previous studies in the Jewish population. Our efforts to account for "survival bias" by matching will continue with collaborative studies.
Abstract:
Introduction and objective. A number of prognostic factors have been reported for predicting survival in patients with renal cell carcinoma. Yet few studies have analyzed the effects of those factors at different stages of the disease process. In this study, different stages of disease progression were evaluated: from nephrectomy to metastasis, from metastasis to death, and from evaluation to death.
Methods. In this retrospective follow-up study, records of 97 deceased renal cell carcinoma (RCC) patients were reviewed from September 2006 to October 2006. Patients with TNM Stage IV disease before nephrectomy or with cancer diagnoses other than RCC were excluded, leaving 64 records for analysis. Patient TNM staging, Fuhrman grade, age, tumor size, tumor volume, histology, and patient gender were analyzed in relation to time to metastasis. Time from nephrectomy to metastasis, TNM staging, Fuhrman grade, age, tumor size, tumor volume, histology, and patient gender were tested for significance in relation to time from metastasis to death. Finally, laboratory values at time of evaluation, Eastern Cooperative Oncology Group (ECOG) performance status, UCLA Integrated Staging System (UISS), time from nephrectomy to metastasis, TNM staging, Fuhrman grade, age, tumor size, tumor volume, histology, and patient gender were tested for significance in relation to time from evaluation to death. Linear regression and Cox proportional hazards models (univariate and multivariate) were used for testing significance. Kaplan-Meier analysis with the log-rank test was used to detect significant differences between groups at various endpoints.
Results. Compared to negative lymph nodes at time of nephrectomy, a single positive lymph node was associated with significantly shorter time to metastasis (p<0.0001). Compared to other histological types, clear cell histology was significantly associated with metastasis-free survival (p=0.003). Clear cell histology compared to other types (p=0.0002 univariate, p=0.038 multivariate) and time to metastasis with log conversion (p=0.028) significantly affected time from metastasis to death. Metastasis-free intervals of greater than one year and greater than two years, compared to metastasis before one and two years, conferred a statistically significant survival benefit (p=0.004 and p=0.0318). Time from evaluation to death was affected by a metastasis-free interval of greater than one year (p=0.0459), alcohol consumption (p=0.044), LDH (p=0.006), ECOG performance status (p<0.001), and hemoglobin level (p=0.0092). The UISS risk stratified the patient population in a statistically significant manner for survival (p=0.001). No other factors were found to be significant.
Conclusion. Clear cell histology is predictive for both time to metastasis and time from metastasis to death. Nodal status at time of nephrectomy may predict risk of metastasis. The time interval to metastasis significantly predicts time from metastasis to death and time from evaluation to death. ECOG performance status and hemoglobin levels predict survival outcome at evaluation. Finally, UISS appropriately stratifies risk in our population.
Abstract:
Dialysis patients are at high risk for hepatitis B infection, which is a serious but preventable disease. Prevention strategies include the administration of the hepatitis B vaccine. Dialysis patients have been noted to have a poor immune response to the vaccine and to lose immunity more rapidly. The long-term immunogenicity of the hepatitis B vaccine has not been well defined in pediatric dialysis patients, especially if administered during infancy as a routine childhood immunization.
Purpose. The aim of this study was to determine the median duration of hepatitis B immunity and to study the effect of vaccination timing and other cofactors on the duration of hepatitis B immunity in pediatric dialysis patients.
Methods. Duration of hepatitis B immunity was determined by Kaplan-Meier survival analysis. Stratified survival curves were compared using the log-rank test. Multivariate analysis by Cox regression was used to estimate hazard ratios for the effect of timing of vaccine administration and other covariates on the duration of hepatitis B immunity.
Results. 193 patients (163 incident patients) had complete data available for analysis. Mean age was 11.2±5.8 years and mean ESRD duration was 59.3±97.8 months. Kaplan-Meier analysis showed that the median overall duration of immunity (since the time of the primary vaccine series) was 112.7 months (95% CI: 96.6, 124.4) for all patients, whereas the median overall duration of immunity for incident patients was 106.3 months (95% CI: 93.93, 124.44). Among incident patients, the median duration of hepatitis B immunity on dialysis was 37.1 months (95% CI: 24.16, 72.26). Multivariate adjusted analysis showed that there was a significant difference between patients based on the timing of hepatitis B vaccine administration (p<0.001). Patients immunized after the start of dialysis had a hazard ratio of 6.13 (2.87, 13.08) for loss of hepatitis B immunity compared to patients immunized as infants (p<0.001).
Conclusion. This study confirms that patients immunized after dialysis onset have an overall shorter duration of hepatitis B immunity, as measured by hepatitis B antibody titers, and that after the start of dialysis, protective antibody titer levels in pediatric dialysis patients wane rapidly compared to healthy children.
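The Kaplan-Meier estimate of median duration used in the abstract above can be sketched as below. This is a generic product-limit estimator written from the textbook definition, not the study's code; the follow-up times are hypothetical months, and an "event" stands for loss of protective immunity while 0 marks a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up durations; events: 1 if the event was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def median_duration(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up in months; 1 = immunity lost, 0 = censored.
months = [1, 2, 3, 4, 5]
lost = [1, 0, 1, 1, 0]
curve = kaplan_meier(months, lost)
median = median_duration(curve)
```

Censored subjects leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of events.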
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate, or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. This study also estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification were created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. The 0-1 misclassification of X1 was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that the estimates of the odds ratio changed when the analysis included adjustment for misclassification.
Abstract:
U.S. military personnel are more likely to use smokeless tobacco than civilians. The purpose of this study was to describe the relationship between smokeless tobacco use and sociodemographic, behavioral, and occupational variables, using data from the 2005 Department of Defense Survey of Health Related Behaviors among Active Duty Military Personnel. The DoD survey comprised a representative sample of active duty U.S. military members (N=16,146). In adjusted multivariate logistic regression models, this study found smokeless tobacco use to be more prevalent among younger, male, white, and enlisted-rank members. By service, higher rates were reported among members of the Army and Marine Corps than among Air Force and Navy members. Smokeless tobacco use among those who also smoke or drink heavily was also much higher than among those who did not report smoking or heavy alcohol use. Results also showed increased prevalence of smokeless tobacco use among those who reported moderate or high impulsive behavior and among those who recently deployed. These findings contribute to improving the understanding of factors related to smokeless tobacco use in the military and may help design strategies to reduce the use of this potentially toxic substance and improve health for military members.
Abstract:
Bladder cancer is the fourth most common cancer in men in the United States. There is compelling evidence that genetic variations contribute to the risk and outcomes of bladder cancer. The PI3K-AKT-mTOR pathway is a major cellular pathway involved in proliferation, invasion, inflammation, tumorigenesis, and drug response. Somatic aberrations of the PI3K-AKT-mTOR pathway are frequent events in several cancers, including bladder cancer; however, no studies have investigated the role of germline genetic variations in this pathway in bladder cancer. In this project, we used a large case-control study to evaluate the associations of a comprehensive catalogue of SNPs in this pathway with bladder cancer risk and outcomes. Three SNPs in RAPTOR were significantly associated with susceptibility: rs11653499 (OR: 1.79, 95% CI: 1.24–2.60), rs7211818 (OR: 2.13, 95% CI: 1.35–3.36), and rs7212142 (OR: 1.57, 95% CI: 1.19–2.07). Two haplotypes constructed from these 3 SNPs were also associated with bladder cancer risk. In combined analysis, a significant trend was observed for increased risk with an increase in the number of unfavorable genotypes (P for trend<0.001). Classification and regression tree analysis identified potential gene-environment interactions between RPS6KA5 rs11653499 and smoking. In superficial bladder cancer, we found that PTEN rs1234219 and rs11202600, TSC1 rs7040593, RAPTOR rs901065, and PIK3R1 rs251404 were significantly associated with recurrence in patients receiving BCG. In muscle-invasive and metastatic bladder cancer, AKT2 rs3730050, PIK3R1 rs10515074, and RAPTOR rs9906827 were associated with survival. Survival tree analysis revealed potential gene-gene interactions: patients carrying the unfavorable genotypes of PTEN rs1234219 and TSC1 rs704059 exhibited a 5.24-fold (95% CI: 2.44–11.24) increased risk of recurrence. In combined analysis, with the increasing number of unfavorable genotypes, there was a significant trend of higher risk of recurrence and death (P for trend<0.001) in Cox proportional hazards regression analysis, and shorter event-free (recurrence and death) survival in Kaplan-Meier estimates (P log rank<0.001). This study strongly suggests that genetic variations in the PI3K-AKT-mTOR pathway play an important role in bladder cancer development. The identified SNPs, if validated in further studies, may become valuable biomarkers for assessing an individual's cancer risk, predicting prognosis and treatment response, and helping physicians make individualized treatment decisions.
Abstract:
Background. Employee assistance programs (EAPs) for airline pilots in companies with a well-developed recovery management program are known to reduce pilot absenteeism following treatment. Given the costs and safety consequences to society, it is important to identify pilots who may be experiencing an alcohol or drug (AOD) disorder to get them into treatment.
Hypotheses. This study investigated the predictive power of workplace absenteeism in identifying AOD disorders. The first hypothesis was that higher absenteeism in a 12-month period is associated with higher risk that an employee is experiencing an AOD disorder. The second hypothesis was that AOD treatment would reduce subsequent absence rates and the costs of replacing pilots on missed flights.
Methods. A case-control design using eight years of monthly archival absence data (53,000 pay records) was conducted with a sample of employees having an AOD diagnosis (cases, N = 76) matched 1:4 with non-diagnosed employees (controls, N = 304) of the same profession and company (male commercial airline pilots). Cases and controls were matched on age, rank, and date of hire. Absence rate was defined as sick time hours used over the sum of the minimum guarantee pay hours, annualized using the months the pilot worked for the year. Conditional logistic regression was used to determine whether absence predicts employees experiencing an AOD disorder, starting 3 years prior to the cases receiving the AOD diagnosis. A repeated measures ANOVA, t tests, and rate ratios (with 95% confidence intervals) were used to determine differences between cases and controls in absence usage for 3 years pre and 5 years post treatment. Mean replacement costs were calculated for sick leave usage 3 years pre and 5 years post treatment to estimate the cost of sick leave from the perspective of the company.
Results. Sick leave, as measured by absence rate, predicted the risk of being diagnosed with an AOD disorder (OR 1.10, 95% CI = 1.06, 1.15) during the 12 months prior to receiving the diagnosis. Mean absence rates for diagnosed employees increased over the three years before treatment, particularly in the year before treatment, whereas the controls' did not (three years, x̄ = 6.80 vs. 5.52; two years, x̄ = 7.81 vs. 6.30; one year, x̄ = 11.00 for cases vs. 5.51 for controls). In the first year post treatment compared to the year prior to treatment, rate ratios indicated a significant (60%) post-treatment reduction in absence rates (OR = 0.40, CI = 0.28, 0.57). Absence rates for cases remained lower than those of controls for the first three years after completion of treatment. Upon discharge from the FAA and company's three-year AOD monitoring program, cases' absence rates increased slightly during the fourth year (controls, x̄ = 0.09, SD = 0.14; cases, x̄ = 0.12, SD = 0.21). However, the following year, their mean absence rates were again below those of the controls (controls, x̄ = 0.08, SD = 0.12; cases, x̄ = 0.06, SD = 0.07). Costs associated with replacing pilots calling in sick were significantly lower, by 60%, between the year of diagnosis for the cases and the first year after returning to work. A reduction in replacement costs continued over the next two years for the treated employees.
Conclusions. This research demonstrates the potential for workplace absences as an active organizational surveillance mechanism to assist managers and supervisors in identifying employees who may be experiencing or at risk of experiencing an alcohol/drug disorder. Currently, many workplaces use only performance problems and ignore the employee's absence record. A referral to an EAP or alcohol/drug evaluation based on the employee's absence/sick leave record, as incorporated into company policy, can provide another useful indicator that may also carry less stigma, thus reducing barriers to seeking help. This research also confirms two conclusions heretofore based only on cross-sectional studies: (1) higher absence rates are associated with employees experiencing an AOD disorder; (2) treatment is associated with lower costs for replacing absent pilots. Due to the uniqueness of the employee population studied (commercial airline pilots) and the organizational documentation of absence, the generalizability of this study to other professions and occupations should be considered limited.
Transition to Practice. The odds ratios for the relationship between absence rates and an AOD diagnosis are precise; the OR for year of diagnosis indicates that the odds of being diagnosed increase 10% for every additional hour of sick leave taken. In practice, however, a pilot uses approximately 20 hours of sick leave for one trip, because the replacement will have to be paid the guaranteed minimum of 20 hours. Thus, the rate based on hourly changes is precise but not practical.
To provide the organization with practical recommendations, the yearly mean absence rates were used. A pilot flies on average 90 hours a month, 1080 annually. Cases used almost twice the mean rate of sick time in the year prior to diagnosis (T-1) compared to controls (cases, x̄ = .11; controls, x̄ = .06). Cases are expected to use on average 119 hours annually (total annual hours*mean annual absence rate), while controls will use 60 hours. The cases' 60 hours could translate to 3 trips of 20 hours each. Management could use a standard of 80 hours or more of sick time claimed in a year as the threshold for unacceptable absence, a 25% increase over the controls (a cost to the company of approximately $4000). At the 80-hour mark, the Chief Pilot would be able to call the pilot in for a routine check as to the nature of the pilot's excessive absence. This management action would be based on a company standard, rather than a behavioral or performance issue. Using absence data in this fashion would make it an active surveillance mechanism.
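The arithmetic behind the "Transition to Practice" figures can be reproduced with a few lines. This is only a worked check using the rounded means quoted in the abstract (0.11 and 0.06), so the results approximate rather than match the reported hours; the abstract's own figures were presumably computed from unrounded rates. Variable names are illustrative.

```python
# Inputs quoted in the abstract.
HOURS_PER_MONTH = 90          # average monthly flying hours
CASE_RATE = 0.11              # mean absence rate, cases, year before diagnosis
CONTROL_RATE = 0.06           # mean absence rate, controls
TRIP_HOURS = 20               # guaranteed minimum paid to a replacement per trip
THRESHOLD_HOURS = 80          # proposed management threshold

annual_hours = HOURS_PER_MONTH * 12               # 1080 hours per year

# Expected sick hours = total annual hours * mean annual absence rate.
case_hours = annual_hours * CASE_RATE
control_hours = annual_hours * CONTROL_RATE

# Express the threshold in trips and relative to the control mean.
threshold_trips = THRESHOLD_HOURS / TRIP_HOURS
excess_over_controls = THRESHOLD_HOURS / control_hours - 1

print(f"cases: {case_hours:.1f} h, controls: {control_hours:.1f} h")
print(f"threshold: {threshold_trips:.0f} trips, "
      f"{excess_over_controls:.0%} above the control mean")
```

Framing the threshold in whole trips rather than hours follows the abstract's point that sick leave is taken in blocks of roughly 20 hours.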
Abstract:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations.
Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared to linear or categorical models, the fractional polynomial models, with the higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.
Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
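As a toy illustration of the correlation idea described above, the sketch below computes the correlation between a "correct" term and two misspecified versions of it. Everything here is an assumption made for illustration: the quadratic true form, the evenly spaced exposure on [0, 1], the median split, and reading "specification" as the functional form of the predictor. It is not the paper's derivation and does not reproduce its NHANES or simulation results.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical setup: exposure x spread evenly over [0, 1], true effect quadratic.
x = [i / 1000 for i in range(1001)]
true_term = [xi ** 2 for xi in x]

# Two misspecifications of the same predictor.
linear_term = x                                              # x entered linearly
categorical_term = [1.0 if xi >= 0.5 else 0.0 for xi in x]   # split at 0.5

rho_linear = pearson(true_term, linear_term)
rho_categorical = pearson(true_term, categorical_term)

print(f"correlation with linear form:      {rho_linear:.3f}")
print(f"correlation with categorical form: {rho_categorical:.3f}")
```

A correlation closer to 1 indicates that the misspecified form tracks the correct one more closely; in this toy setup the linear form does so better than the two-level split, which need not hold for other true relationships.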
Abstract:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model.
The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background.
Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study.
The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio. In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit.
Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C_p and S_p, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP_m) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.
The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of C_p and S_p. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.
Only the random split estimator is conditionally (on β) unbiased; however, MSEP_m is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP_m and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.
To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
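The leave-one-out PRESS statistic recommended above can be sketched for the simplest case of a single predictor. This is an illustrative reduction, not the study's multivariate setup: the data are hypothetical and the function names are invented. It computes PRESS both by refitting n times and by the standard leverage shortcut, where each ordinary residual is divided by one minus its hat value.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


def press_by_refitting(x, y):
    """PRESS: sum of squared errors when each point is predicted
    from a model fitted without it."""
    total = 0.0
    for i in range(len(x)):
        a, b = ols_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total


def press_by_leverage(x, y):
    """Same quantity from one fit: sum of (residual / (1 - leverage))^2."""
    n = len(x)
    a, b = ols_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        total += ((yi - (a + b * xi)) / (1 - leverage)) ** 2
    return total


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]

press_refit = press_by_refitting(x, y)
press_hat = press_by_leverage(x, y)
a_full, b_full = ols_line(x, y)
residual_ss = sum((yi - (a_full + b_full * xi)) ** 2 for xi, yi in zip(x, y))
```

Because every leverage lies between 0 and 1, PRESS is never smaller than the ordinary residual sum of squares, which is why it gives a less optimistic view of predictive error while still using the whole sample for fitting.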
Abstract:
An extension of k-ratio multiple comparison methods to rank-based analyses is described. The new method is analogous to the Duncan-Godbold approximate k-ratio procedure for unequal sample sizes or correlated means. The close parallel of the new methods to the Duncan-Godbold approach is shown by demonstrating that they are based upon different parameterizations as starting points.
A semi-parametric basis for the new methods is shown by starting from the Cox proportional hazards model, using Wald statistics. From there the log-rank and Gehan-Breslow-Wilcoxon methods may be seen as score statistic based methods.
Simulations and analysis of a published data set are used to show the performance of the new methods.