907 results for Weighted regression
Abstract:
Geographic health planning analyses, such as service area calculations, are hampered by a lack of patient-specific geographic data. Using the limited patient address information in patient management systems, planners analyze patient origin based on home address. But activity space research, done sparingly in public health and extensively in non-health-related arenas, uses multiple addresses per person when analyzing accessibility. Also, health care access research has shown that many non-geographic factors influence choice of provider. Most planning methods, however, overlook these non-geographic factors, and the limited data mean the analyses can only be related to home address. This research attempted to determine to what extent geography plays a part in patient choice of provider and whether activity space data can be used to calculate service areas for primary care providers.

During Spring 2008, a convenience sample of 384 patients of a locally funded Community Health Center in Houston, Texas, completed a survey that asked which factors are important when selecting a health care provider. A subset of this group (336) also completed an activity space log that captured location and time data on the places the patient regularly goes.

Survey results indicate that for this patient population, geography plays a role in choice of health care provider, but it is not the most important reason for choosing a provider. Other factors, such as the provider offering "free or low cost visits", meeting "all of the patient's health care needs", and seeing "the patient quickly", were all ranked higher than geographic reasons.

Analysis of the patient activity locations shows that activity spaces can be used to create service areas for a single primary care provider.
Weighted activity-space-based service areas have the potential to include more patients in the service area since more than one location per patient is used. Further analysis of the logs shows that a reduced set of locations by time and type could be used for this methodology, facilitating ongoing data collection for activity-space-based planning efforts.
Abstract:
Based on asthma prevalence data collected from the 2000 BRFSS survey, approximately 14.7 million U.S. adults had current asthma, accounting for 7.2% of the total U.S. population. In Texas alone, state data extrapolated from the 1999-2003 Texas BRFSS suggested that approximately 1 million Texas adults were reporting current asthma and approximately 11% of the adult population had been diagnosed with the illness during their lifetime. From a public health perspective, the disease is manageable. Comprehensive state-specific asthma surveillance data are necessary to identify disparities in asthma prevalence and asthma-control characteristics among subpopulations and to develop targeted public health interventions. The purpose of this study was to determine the relative importance of various risk factors for asthma and to examine the impact of asthma on health-related quality of life among adult residents of Texas.

The study used a cross-sectional design. All variables related to asthma, along with their associated demographic, socioeconomic, and quality of life variables, were extracted from the 2007 BRFSS data for 17,248 Texas residents aged 18 and older. Chi-square tests and logistic regression were run in SPSS on weighted data, adjusting for the complex sample design of the BRFSS. All chi-square analyses were carried out using SPSS's CSTABULATE command, and logistic regression models were fitted using SPSS's CSLOGISTIC command.

Risk factors significantly associated with reporting current asthma included BMI, race/ethnicity, gender, and income. Holding all other variables constant, obese adults were almost twice as likely to report current asthma as adults of normal weight (odds ratio [OR], 1.78; 95% confidence interval [CI], 1.25 to 2.53).
Other non-Hispanic adults were significantly more likely to report current asthma than non-Hispanic Whites (OR, 2.43; 95% CI, 1.38 to 4.25), while Hispanics were significantly less likely to report current asthma than non-Hispanic Whites (OR, 0.38; 95% CI, 0.25 to 0.60), after controlling for all other variables. After adjusting for all other variables, adult females were almost twice as likely to report current asthma as males (OR, 1.97; 95% CI, 1.49 to 2.60). Adults with an annual household income of less than $15,000 were almost twice as likely to report current asthma as those with an annual household income of $50,000 or more (OR, 1.98; 95% CI, 1.33 to 2.94). With regard to the association between asthma and health-related quality of life, after adjusting for age, race/ethnicity, gender, tobacco use, body mass index (BMI), exercise, education, and income, adults with current asthma were more likely than those without asthma to report more than 15 physically unhealthy days (OR, 1.84; 95% CI, 1.29 to 2.60).

Overall, the findings of this study provide insight into the populations in Texas most adversely affected by asthma and into the health-related consequences of the condition. Further research could build on these findings by replicating this study as closely as possible in other asthma settings and by examining the relationship with hospitalization rates, asthma severity, and mortality.
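The reported intervals can be sanity-checked on the log-odds scale. The sketch below assumes each interval is a symmetric Wald-type interval in log-odds with a normal critical value of 1.96; the abstract does not state this, and design-based software may use a t critical value instead. The function name `log_scale_summary` is illustrative only.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Recover the implied point estimate and log-scale standard error
    from a confidence interval, assuming symmetry in log-odds."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    point = math.exp((log_lower + log_upper) / 2)  # geometric midpoint of the bounds
    std_err = (log_upper - log_lower) / (2 * z)    # implied SE of the log odds ratio
    return point, std_err

# Obese vs. normal weight: reported OR 1.78, 95% CI 1.25 to 2.53
point, std_err = log_scale_summary(1.25, 2.53)
print(round(point, 2), round(std_err, 3))
```

If the geometric midpoint of the bounds lands close to the reported odds ratio, the interval is consistent with a log-scale Wald construction; a large gap would suggest a different interval method or a transcription error.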
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate, or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status, and also in a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data were created based on a set of hypothetical parameters and hypothetical rates of misclassification. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. The 0-to-1 misclassification of X1 was hypothesized at 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that the odds ratio estimates changed when the analysis included adjustment for misclassification.
Abstract:
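One common way to write a likelihood that accounts for outcome misclassification is sketched below. This is an assumed formulation for illustration, not necessarily the one used in the study: the symbols α_j, θ_jk, and Y* are introduced here, and the sketch covers misclassification of the outcome only (misclassification of a predictor such as X1 would require an additional sum over its true values).

```latex
\begin{align}
% Proportional-odds model for the true (unobserved) status Y with J ordered levels
\Pr(Y \le j \mid \mathbf{x}) &= \frac{1}{1 + \exp\{-(\alpha_j - \mathbf{x}^{\top}\boldsymbol{\beta})\}},
  \qquad j = 1, \dots, J-1 \\
% Misclassification rates: \theta_{jk} = \Pr(Y^{*} = j \mid Y = k), with \sum_{j} \theta_{jk} = 1
\Pr(Y^{*} = j \mid \mathbf{x}) &= \sum_{k=1}^{J} \theta_{jk}\, \Pr(Y = k \mid \mathbf{x}) \\
% Observed-data likelihood, maximized jointly over model and misclassification parameters
L(\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\theta}) &= \prod_{i=1}^{n} \Pr(Y^{*} = y_i^{*} \mid \mathbf{x}_i)
\end{align}
```

Because this likelihood has no closed-form maximizer, a derivative-free search such as the Nelder-Mead simplex method mentioned in the abstract is a natural fit.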
The recent hurricanes Katrina, Rita, and Dolly have brought to light the precarious situation populations place themselves in when they are unprepared to face a storm or do not follow official orders to evacuate when a destructive hurricane is poised to hit the area. Three counties in southern Texas lie within 60 miles of the Gulf of Mexico and along the Mexican border. Determining the barriers to hurricane evacuation in this distinct and highly impoverished area of the United States would help local, state, and federal agencies respond more effectively to persons living here.

The aim of this study was to examine intention to comply with mandatory hurricane evacuation orders among persons living in three counties in South Texas by gender, income, education, acculturation, and county of residence. A questionnaire was administered to 3,088 households across the three counties using a two-stage cluster sampling strategy, stratified by county. The door-to-door survey was a 73-item instrument that included demographics, reasons for and against evacuation, and preparedness for a hurricane. Weighted data were used for the analyses.

Chi-square tests were run to determine whether differences between observed and expected frequencies were statistically significant. A logistic regression model was developed based on that univariate analysis. The logistic regression yielded odds ratios and their 95 percent confidence intervals for the independent variables.

Logistic regression results indicate that females were less likely than males to follow an evacuation order. Respondents with more education were more likely to evacuate. Respondents with a higher affiliation with Spanish than English were more likely to follow the evacuation orders. Hidalgo County residents were less likely to evacuate than residents of Cameron or Willacy Counties.
Local officials need to implement communication efforts specifically tailored for females, residents with less of an affiliation with Spanish, and Hidalgo County residents to ensure their successful evacuation prior to a strong hurricane's landfall.
Abstract:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations.

Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.

Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with an unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
Abstract:
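As a small illustration of the correlation idea, the sketch below computes the correlation between a hypothetical "true" quadratic form and a linear misspecification of the same predictor. The quadratic form and the uniform grid on [0, 1] are assumptions chosen for illustration and are not taken from the paper; the squared correlation is the share of variance in the true form that the best-fitting straight line captures.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical setup: predictor spread evenly over [0, 1] (midpoint grid)
n = 10_000
x = [(i + 0.5) / n for i in range(n)]
true_form = [xi ** 2 for xi in x]   # assumed correct specification: quadratic
linear_form = x                     # misspecification: linear in x

rho = pearson(true_form, linear_form)
print(round(rho, 4), round(rho ** 2, 4))
```

For a uniform predictor on [0, 1] the exact value is sqrt(15)/4, so the grid result can be checked analytically, which mirrors the paper's point that simple forms allow the correlation to be derived mathematically.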
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model.

The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background.

Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study.

The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio.
In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit.

Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C_p and S_p, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP_m) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of C_p and S_p. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.

Only the random split estimator is conditionally (on β) unbiased; however, MSEP_m is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP_m and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples.
Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
Abstract:
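The leave-one-out statistic recommended above does not require refitting the model n times. The sketch below shows this for the simplest case, a one-predictor least-squares line, using the identity that the deleted residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage. The data values and function names are made up for illustration; a brute-force version is included only to confirm the shortcut.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def press_shortcut(xs, ys):
    """PRESS from a single fit: sum of (residual / (1 - leverage))^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    intercept, slope = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1 / n + (x - x_bar) ** 2 / sxx
        residual = y - (intercept + slope * x)
        total += (residual / (1 - leverage)) ** 2
    return total

def press_brute_force(xs, ys):
    """PRESS by refitting the line with each observation left out."""
    total = 0.0
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total

# Illustrative data only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
print(press_shortcut(xs, ys), press_brute_force(xs, ys))
```

The same leverage identity holds for multiple regression, with h_ii taken from the diagonal of the hat matrix.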
An analysis of variation in hospital inpatient charges in the greater Houston area is conducted to determine if there are consistent differences among payers. Differences in charges are examined for 59 Composite Diagnosis Related Groups (CDRGs), and two regression equations estimating charges are specified. Simple comparisons of mean charges by diagnostic category show significant differences for 42 (71 percent) of the 59 categories examined. In 41 of the 42 significant categories, charges to Medicaid were less than charges to private insurers. Meta-analytic statistical techniques yielded a weighted average effect size of -0.7198 for the 59 diagnostic categories, indicating that, overall, Medicaid charges were less than private insurance charges. Results of a multiple regression estimating charges showed that private insurance was a significant independent variable, along with age, length of stay, and hospital variables. Results indicated consistent differential charges in the present analysis.
Abstract:
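A weighted average effect size of this kind is often computed with inverse-variance weights. The abstract does not say which weighting scheme was used, so the sketch below is an assumption, and the effect sizes and variances in it are invented purely to show the arithmetic.

```python
import math

def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean_effect = sum(w * d for w, d in zip(weights, effects)) / total_weight
    std_err = math.sqrt(1.0 / total_weight)
    return mean_effect, std_err

# Hypothetical per-category effect sizes (Medicaid minus private) and variances
effects = [-0.5, -1.0]
variances = [0.04, 0.16]
mean_effect, std_err = weighted_mean_effect(effects, variances)
print(mean_effect, std_err)
```

More precisely estimated categories (smaller variance) pull the pooled estimate toward their own effect size, which is why the result here sits closer to -0.5 than to -1.0.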
This dissertation develops and explores the methodology for the use of cubic spline functions in assessing time-by-covariate interactions in Cox proportional hazards regression models. These interactions indicate violations of the proportional hazards assumption of the Cox model. Use of cubic spline functions allows for the investigation of the shape of a possible covariate time-dependence without having to specify a particular functional form. Cubic spline functions yield both a graphical method and a formal test for the proportional hazards assumption as well as a test of the nonlinearity of the time-by-covariate interaction. Five existing methods for assessing violations of the proportional hazards assumption are reviewed and applied along with cubic splines to three well-known two-sample datasets. An additional dataset with three covariates is used to explore the use of cubic spline functions in a more general setting.
Abstract:
A Bayesian approach to estimation of the regression coefficients of a multinomial logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over other Bayesian approaches.
Abstract:
Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias.

Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to average the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of the resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus, for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.
Abstract:
A large number of ridge regression estimators have been proposed and used with little knowledge of their true distributions. Because of this lack of knowledge, these estimators cannot be used to test hypotheses or to form confidence intervals.

This paper presents a basic technique for deriving the exact distribution functions for a class of generalized ridge estimators. The technique is applied to five prominent generalized ridge estimators. Graphs of the resulting distribution functions are presented. The actual behavior of these estimators is found to be considerably different from the behavior which is generally assumed for ridge estimators.

This paper also uses the derived distributions to examine the mean squared error properties of the estimators. A technique for developing confidence intervals based on the generalized ridge estimators is also presented.
Abstract:
The purpose of this study was to understand the role of principal economic, sociodemographic, and health status factors in determining the likelihood and volume of prescription drug use. Econometric demand regression models were developed for this purpose. Ten explanatory variables were examined: family income, coinsurance rate, age, sex, race, household head education level, size of family, health status, number of medical visits, and type of provider seen during medical visits. The economic factors (family income and coinsurance) were given special emphasis in this study.

The National Medical Care Utilization and Expenditure Survey (NMCUES) was the data source. The sample represented the civilian, noninstitutionalized residents of the United States in 1980. The sampling method used in the survey was a stratified four-stage area probability design. The sample comprised 6,600 households (17,123 individuals). The weighted sample provided the population estimates used in the analysis. Five repeated interviews were conducted with each household. The household survey provided detailed information on U.S. residents' health status, patterns of health care utilization, charges for services received, and methods of payment for 1980.

The study provided evidence that economic factors influenced the use of prescription drugs, but the use was not highly responsive to family income and coinsurance for the levels examined. The elasticities for family income ranged from -0.0002 to -0.013 and those for coinsurance ranged from -0.174 to -0.108. Income had a greater influence on the likelihood of prescription drug use, and coinsurance rates had an impact on the amount spent on prescription drugs. The coinsurance effect was not examined for the likelihood of drug use due to limitations in the measurement of coinsurance. Health status appeared to overwhelm any effects which may be attributed to family income or coinsurance.
The likelihood of prescription drug use was highly dependent on visits to medical providers. The volume of prescription drug use was highly dependent on health status, age, and whether or not the individual saw a general practitioner.
Abstract:
The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke.
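For orientation, the polychotomous logistic model and the Newton-Raphson update are commonly written as follows. This is the standard baseline-category form, given here as an assumed notation; the dissertation's own parameterization may differ. Here J is the number of response categories, category J is the reference, and ℓ is the log-likelihood.

```latex
\begin{align}
\Pr(Y = j \mid \mathbf{x}) &= \frac{\exp(\mathbf{x}^{\top}\boldsymbol{\beta}_j)}
  {1 + \sum_{k=1}^{J-1} \exp(\mathbf{x}^{\top}\boldsymbol{\beta}_k)},
  \qquad j = 1, \dots, J-1 \\
\Pr(Y = J \mid \mathbf{x}) &= \frac{1}{1 + \sum_{k=1}^{J-1} \exp(\mathbf{x}^{\top}\boldsymbol{\beta}_k)} \\
% Newton-Raphson step on the stacked coefficient vector \beta
\boldsymbol{\beta}^{(t+1)} &= \boldsymbol{\beta}^{(t)}
  - \left[ \frac{\partial^{2} \ell}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}} \right]^{-1}_{\boldsymbol{\beta}^{(t)}}
  \left. \frac{\partial \ell}{\partial \boldsymbol{\beta}} \right|_{\boldsymbol{\beta}^{(t)}}
\end{align}
```

With J = 2 the first two lines reduce to ordinary binary logistic regression.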