17 results for linear rank regression model
in DigitalCommons@The Texas Medical Center
Abstract:
Objectives. This paper assesses the effect of regression model misspecification on statistical power in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models had higher correlations and provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
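As a minimal sketch of the correlation idea described in this abstract (not the authors' derivation), the following Python computes the Pearson correlation between an assumed true functional form and two misspecified codings of the same predictor. The quadratic truth, the even grid on (0, 1), and the median split are illustrative assumptions that do not come from the abstract.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical setup: predictor spread evenly over (0, 1), true effect quadratic.
n_points = 1000
x = [(i + 0.5) / n_points for i in range(n_points)]
true_form = [v ** 2 for v in x]

# Two misspecified codings of the same predictor.
linear_form = x
categorical_form = [1.0 if v > 0.5 else 0.0 for v in x]  # median split

r_linear = pearson(true_form, linear_form)
r_categorical = pearson(true_form, categorical_form)
print(f"linear: {r_linear:.3f}, categorical: {r_categorical:.3f}")
```

In this toy setup the linear coding tracks the quadratic form more closely than the median split does. Whether that ordering carries over to power in a given data set depends on degrees of freedom and sample size, as the abstract notes.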
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. This study also estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was conducted. First, data were created based on a set of hypothetical parameters and hypothetical rates of misclassification. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters. β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. X1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X1 of having both copies of the APOE 4 allele changed from 1.377 to 1.418, demonstrating that the estimates of the odds ratio changed when the analysis includes adjustment for misclassification.
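The likelihood described in this abstract can be sketched roughly as follows. This is an illustrative Python outline, not the study's actual code: the proportional-odds parameterization, the cutpoints, the covariate values, and the misclassification matrix are all assumptions made for the example. Only the two coefficient values (0.50 and 0.04) are taken from the abstract.

```python
import math

def cumulative_logit_probs(cutpoints, linear_predictor):
    """Category probabilities under an assumed proportional-odds model.

    P(Y <= j) = logistic(cutpoint_j - linear_predictor); cutpoints must increase.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(c - linear_predictor))) for c in cutpoints]
    cumulative.append(1.0)
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

def observed_probs(true_probs, misclass):
    """Pass true-category probabilities through a misclassification matrix.

    misclass[k][j] = P(observed category j | true category k); rows sum to 1.
    """
    n_cat = len(true_probs)
    return [sum(true_probs[k] * misclass[k][j] for k in range(n_cat))
            for j in range(n_cat)]

def log_likelihood(outcomes, linear_predictors, cutpoints, misclass):
    """Log-likelihood of observed (possibly misclassified) ordinal outcomes."""
    total = 0.0
    for y, eta in zip(outcomes, linear_predictors):
        p_obs = observed_probs(cumulative_logit_probs(cutpoints, eta), misclass)
        total += math.log(p_obs[y])
    return total

# Hypothetical values for illustration only.
cutpoints = [-0.5, 1.0]        # mild | moderate | severe (assumed)
beta1, beta2 = 0.50, 0.04      # coefficients quoted in the abstract
eta = beta1 * 1 + beta2 * 70   # assumed covariate values x1 = 1, x2 = 70
misclass = [
    [0.97, 0.03, 0.00],
    [0.02, 0.95, 0.03],
    [0.00, 0.05, 0.95],
]
true_p = cumulative_logit_probs(cutpoints, eta)
obs_p = observed_probs(true_p, misclass)
print(true_p, obs_p)
```

A derivative-free optimizer such as Nelder-Mead would then be run on the negative of `log_likelihood` over the coefficients, cutpoints, and misclassification rates jointly. That step is omitted here.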
Abstract:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model. The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background. Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study. The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio.
In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit. Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed.
Abstract:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) to be included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate while varying the effect of failure rate, sample size, true values of the parameters and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect. A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1 so that the failure times were exponentially distributed in the ith interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4, 1/2, 3/4). The mean of the bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased, whereas the mean of the bias increased as failure rate and true values of the covariates increased. The mean of the bias of the estimator of the coefficient was smallest when all of the updated measurements were used in the model compared with two models that used selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rate of the estimator of the coefficient of the time-dependent covariate was in most cases 90% or more except when the failure rate was high (0.20).
The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented.
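The simulation design described in this abstract (baseline hazard of 1, a hazard that is constant within each interval, and type II censoring) could be set up along the following lines. This is a hedged Python sketch rather than the study's code: the interval boundaries, the coefficient values of 0.5, the random seed, and the function names are assumptions chosen for illustration.

```python
import math
import random

def simulate_failure_time(z_fixed, z_by_interval, beta_fixed, beta_td, cut_starts, rng):
    """Draw one failure time from a piecewise-exponential model, baseline hazard 1.

    cut_starts[i] is the start of interval i; the last interval is open-ended.
    Within interval i the hazard is exp(beta_fixed*z_fixed + beta_td*z_by_interval[i]).
    """
    target = rng.expovariate(1.0)  # cumulative hazard at failure is Exp(1)
    accumulated = 0.0
    last = len(cut_starts) - 1
    for i, start in enumerate(cut_starts):
        hazard = math.exp(beta_fixed * z_fixed + beta_td * z_by_interval[i])
        if i == last:
            return start + (target - accumulated) / hazard
        width = cut_starts[i + 1] - start
        if accumulated + hazard * width >= target:
            return start + (target - accumulated) / hazard
        accumulated += hazard * width

def type_ii_censor(times, failure_rate):
    """Stop follow-up at the r-th failure, with r = round(failure_rate * n)."""
    r = max(1, round(failure_rate * len(times)))
    cutoff = sorted(times)[r - 1]
    return [(min(t, cutoff), t <= cutoff) for t in times]

rng = random.Random(2024)
cut_starts = [0.0, 0.5, 1.0]  # assumed boundaries for k = 3 intervals
times = []
for _ in range(500):
    z_fixed = rng.randint(0, 1)                        # binary fixed covariate
    z_path = [rng.randint(0, 1) for _ in cut_starts]   # binary covariate, updated per interval
    times.append(simulate_failure_time(z_fixed, z_path, 0.5, 0.5, cut_starts, rng))

observed = type_ii_censor(times, 0.10)
n_events = sum(1 for _, event in observed if event)
print(n_events, "events out of", len(observed))
```

Each simulated data set would then be fitted with a Cox model using either all interval measurements or only a subset, and the bias and coverage summarized over replications. That fitting step is left out of this sketch.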
Abstract:
Background. The parents of a sick child likely experience situational anxiety due to their young child being unexpectedly hospitalized. The emotional upheaval may be great enough that their anxiety inhibits them in providing positive support to their hospitalized child. Because anxiety affects psychological distress as well as behavioral distress, identifying parental distress helps parents improve their coping mechanisms. Purpose. The study compared situational anxiety levels between Taiwanese fathers and mothers and focused on differences between parental anxiety levels at the beginning of the child's unplanned hospitalization and at time of discharge. The study also identified factors related to the parents' distress and use of coping mechanisms. Methods. A descriptive, comparative research design was used to determine the difference between the anxiety levels of 62 Taiwanese father-mother dyads during the situational crisis of their child's unexpected hospitalization. The Mandarin version (M) of the Visual Analog Scale (VAS-M), State-Trait Anxiety Inventory (STAI-M), and the Index of Parent Participation/Hospitalized Child (IPP/HC-M) were used to differentiate maternal and paternal anxiety levels and identify factors related to the parents' distress. Questionnaires were completed by parents within 24-36 hours of the child's hospital admission and within 24 hours prior to discharge. A paired t-test, two-sample t-test, and linear mixed regression model were used to test and support the study hypothesis. Results. The findings reveal that the mothers' anxiety levels did not significantly differ from the fathers' anxiety levels when their child had a sudden admission to the hospital. In particular, parental state anxiety levels did not decrease during the child's hospital stay and subsequent discharge. Moreover, anxiety levels did not differ between parents regardless of whether the child's disease was acute or chronic.
The most effective factor related to parental situational anxiety was parental perception of the severity of the child's illness. Conclusions. Parental anxiety was found to be significantly related to changes in their perception of the severity of their child's illness. However, the study was not able to illustrate how parental involvement in the child's hospital care was related to parental perception of the severity of their child's illness. Future studies, using a qualitative approach to garner more information as to what variables influence parental anxiety during a situational crisis, may provide a richer database from which to modify key variables as well as the instruments used to improve the quality of the data obtained.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples.
Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g., PRESS) be used for assessment.
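For readers unfamiliar with the leave-one-out statistic recommended in this abstract, the following Python sketch computes PRESS in two ways for a one-predictor least-squares fit: from a single fit using leverages, and by refitting with each case left out. The data are made up, and the one-predictor setting is a simplification of the multiple-regression setting studied in the abstract.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) for least squares with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press_shortcut(x, y):
    """PRESS from a single fit: sum of (residual_i / (1 - leverage_i))^2."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    intercept, slope = fit_simple_ols(x, y)
    total = 0.0
    for v, w in zip(x, y):
        leverage = 1.0 / n + (v - mean_x) ** 2 / sxx
        residual = w - (intercept + slope * v)
        total += (residual / (1.0 - leverage)) ** 2
    return total

def press_brute_force(x, y):
    """PRESS by refitting n times, each time leaving one case out."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        intercept, slope = fit_simple_ols(x_rest, y_rest)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
print(press_shortcut(x, y), press_brute_force(x, y))
```

The two functions agree up to floating-point error, which is why PRESS can be obtained from one fit of the full sample without holding out a validation half.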
Abstract:
A historical prospective study was designed to assess the mean weight status of subjects who participated in a behavioral weight reduction program in 1983 and to determine whether there was an association between the dependent variable, weight change, and any of 31 independent variables after a 2-year follow-up period. Data were obtained by abstracting the subjects' records and from a follow-up questionnaire administered 2 years following program participation. Five hundred nine subjects (386 females and 123 males) of the 1460 subjects who participated in the program completed and returned the questionnaire. Results showed that mean weight was significantly different (p < 0.001) between the measurement at baseline and after a 2-year follow-up period. The mean weight loss of the group was 5.8 pounds, 10.7 pounds for males and 4.2 pounds for females, after a 2-year follow-up period. A total of 63.9% of the group, 69.9% of males and 61.9% of females, were still below their initial weight after the 2-year follow-up period. Sixteen of the 31 variables assessed utilizing bivariate analyses were found to be significantly (p ≤ 0.05) associated with weight change after a 2-year follow-up period. These variables were then entered into a multivariate linear regression model. A total of 37.9% of the variance of the dependent variable, weight change, was accounted for by all 16 variables. Eight of these variables were found to be significantly (p ≤ 0.05) predictive of weight change in the stepwise multivariate process, accounting for 37.1% of the variance. These variables included two baseline variables (percent over ideal body weight at enrollment and occupation) and six follow-up variables (feeling in control of eating habits, percent of body weight lost during treatment, frequency of weight measurement, physical activity, eating in response to emotions, and number of pounds of weight gain needed to resume a diet).
It was concluded that a greater amount of emphasis should be placed on the six follow-up variables by clinicians involved in the treatment of obesity, and by the subjects themselves, to enhance their chances of success at long-term weight loss.
Abstract:
Background. In over 30 years, the prevalence of overweight for children and adolescents has increased across the United States (Barlow et al., 2007; Ogden, Flegal, Carroll, & Johnson, 2002). Childhood obesity is linked with adverse physiological and psychological issues in youth and affects ethnic/minority populations in disproportionate rates (Barlow et al., 2007; Butte et al., 2006; Butte, Cai, Cole, Wilson, Fisher, Zakeri, Ellis, & Comuzzie, 2007). More importantly, overweight in children and youth tends to track into adulthood (McNaughton, Ball, Mishra, & Crawford, 2008; Ogden et al., 2002). Childhood obesity affects body functions such as the cardiovascular, respiratory, gastrointestinal, and endocrine systems, including emotional health (Barlow et al., 2007, Ogden et al., 2002). Several dietary factors have been associated with the development of obesity in children; however, these factors have not been fully elucidated, especially in ethnic/minority children. In particular, few studies have been done to determine the effects of different meal patterns on the development of obesity in children. Purpose. The purpose of the study is to examine the relationships between daily proportions of energy consumed and energy derived from fat across breakfast, lunch, dinner, and snack, and obesity among Hispanic children and adolescents. Methods. A cross-sectional design was used to evaluate the relationship between dietary patterns and overweight status in Hispanic children and adolescents 4-19 years of age who participated in the Viva La Familia Study. The goal of the Viva La Familia Study was to evaluate genetic and environmental factors affecting childhood obesity and its co-morbidities in the Hispanic population (Butte et al., 2006, 2007). 
The study enrolled 1030 Hispanic children and adolescents from 319 families and examined factors related to increased body weight by focusing on a multilevel analysis of extensive sociodemographic, genetic, metabolic, and behavioral data. Baseline dietary intakes of the children were collected using 24-hour recalls, and body mass index was calculated from measured height and weight, and classified using the CDC standards. Dietary data were analyzed using a GEE population-averaged panel-data model with a cluster variable family identifier to include possible correlations within related data sets. A linear regression model was used to analyze associations of dietary patterns using possible covariates, and to examine the percentage of daily energy coming from breakfast, lunch, dinner, and snack while adjusting for age, sex, and BMI z-score. Random-effects logistic regression models were used to determine the relationship of the dietary variables with obesity status and to understand if the percent energy intake (%EI) derived from fat from all meals (breakfast, lunch, dinner, and snacks) affected obesity. Results. Older children (age 4-19 years) consumed a higher percent of energy at lunch and dinner and less percent energy from snacks compared to younger children. Age was significantly associated with percentage of total energy intake (%TEI) for lunch, as well as dinner, while no association was found by gender. Percent of energy consumed from dinner significantly differed by obesity status, with obese children consuming more energy at dinner (p = 0.03), but no associations were found between percent energy from fat and obesity across all meals. Conclusions. Information from this study can be used to develop interventions that target dietary intake patterns in obesity prevention programs for Hispanic children and adolescents. 
In particular, intervention programs for children should target dietary patterns with energy intake that is spread throughout the day and earlier in the day. These results indicate that a longitudinal study should be used to further explore the relationship of dietary patterns and BMI in this and other populations (Dubois et al., 2008; Rodriquez & Moreno, 2006; Thompson et al., 2005; Wilson et al., in review, 2008).
Whence a healthy mind: Correlation of physical fitness and academic performance among schoolchildren
Abstract:
Background. Public schools are a key forum in the fight for child health because of the opportunities they present for physical activity and fitness surveillance. However, because schools are evaluated and funded on the basis of standardized academic performance rather than physical activity, empirical research evaluating the connections between fitness and academic performance is needed to justify curriculum allocations to physical activity. Methods. Analyses were based on a convenience sample of 315,092 individually-matched standardized academic (TAKS™) and fitness (FITNESSGRAM®) test records collected by 13 Texas school districts under state mandates. We categorized each fitness result in quintiles by age and gender and used a mixed effects regression model to compare the academic performance of the top and bottom fitness groups for each fitness test and grade level combination. Results. All fitness variables except BMI showed significant, positive associations with academic performance after sociodemographic covariate adjustments, with effect sizes ranging from 0.07 (95% CI: 0.05, 0.08) in girls' trunklift-TAKS reading to 0.34 (0.32, 0.35) in boys' cardiovascular-TAKS math. Cardiovascular fitness showed the largest inter-quintile difference in TAKS score (32-75 points), followed by curl-ups. After an additional adjustment for BMI and curl-ups, cardiovascular associations peaked in 8th-9th grades (maximum inter-quintile difference 142 TAKS points; effect size 0.75 (0.69, 0.82) for 8th grade girls' math) and showed dose-response characteristics across quintiles (p < 0.001 for both genders and outcomes). BMI analysis demonstrated limited, non-linear association with academic performance after adjustment for sociodemographic, cardiovascular fitness and curl-up variables. Low-BMI Hispanic high school boys showed significantly lower TAKS scores than the moderate (but not high) BMI group.
High-BMI non-Hispanic white high school girls showed significantly lower scores than the moderate (but not low) BMI group. Conclusions. In this study, fitness was strongly and significantly related to academic performance. Cardiovascular fitness showed a distinct dose-response association with academic performance independent of other sociodemographic and fitness variables. The association peaked in late middle to early high school. The independent association of BMI to academic performance was only found in two sub-groups and was non-linear, with both low and high BMI posing risk relative to moderate BMI but not to each other. In light of our findings, we recommend that policymakers consider PE mandates in middle-high school and require linkage of academic and fitness records to facilitate longitudinal surveillance. School administrators should consider increasing PE time in pursuit of higher academic test scores, and PE practitioners should emphasize cardiovascular fitness over BMI reduction.
Abstract:
This study described the relationship of sexual maturation and blood pressure in a sample (n = 361) of white females, ages seven through 18, attending public schools in a defined area of Central Texas during October through December, 1984. Other correlates of blood pressure were also described for this sample. A survey was performed to obtain the data on height, weight, body mass, pulse rate, upper arm circumference and length, and blood pressure. Each subject self-assessed her secondary sex characteristics (breast and pubic hair) according to drawings of the Tanner stages of maturation. The subjects were interviewed to obtain data on personal health habits and menstrual status. Student age, ethnic group and place of residence were abstracted from school records. Parents or guardians of the subjects responded to a questionnaire pertaining to parental and subject health history and parents' occupation and educational attainment. In the simple linear regression analysis, sexual maturation and variables of body size were significantly (p < 0.001) and positively associated with systolic and fourth- and fifth-phase diastolic blood pressure. The demographic and socioeconomic variables were not sufficiently variant in this population to have differential effects on the relation between blood pressure and maturation. Stepwise multiple regression was used to assess the contribution of sexual maturation to the variance of blood pressure after accounting for the variables of body size. Sexual maturation (breast stage) along with weight, height and body mass remained in the multiple regression models for fourth- and fifth-phase diastolic blood pressure. Only height and body mass remained in the regression model for systolic blood pressure; sexual maturation did not contribute more to the explanation of the systolic blood pressure variance. The association of sexual maturation with blood pressure level was established in this sample of young white females.
More research is needed, first, to determine if this relationship prevails in other populations of young females, and second, to determine the relationship of sexual maturation sequence and change with the change of blood pressure during childhood and adolescence.
Abstract:
Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology. The age-specific mortality ratio (MR) for the covariates of interest can be written as $\mathrm{MR}_{ij\ldots m} = \mu_{ij\ldots m}/\theta_{ij\ldots m} = r \cdot \exp(Z'_{ij\ldots m}\beta)$, where $\mu_{ij\ldots m}$ and $\theta_{ij\ldots m}$ denote the force of mortality in the study and chosen standard populations in the $ij\ldots m$th stratum, respectively, $r$ is the intercept, $Z_{ij\ldots m}$ is the vector of covariables associated with the $i$th age interval, and $\beta$ is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients. This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable.
The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate. This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when an interaction effect between confounding variables needs to be evaluated.
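As a worked note on the special case mentioned in this abstract, suppose (an added assumption, not stated in the abstract) that the observed deaths $d_s$ in stratum $s$ are Poisson with mean $r\,e_s$, where $e_s$ is the number of deaths expected from the standard population and $s$ abbreviates the stratum index $ij\ldots m$. The symbols $d_s$ and $e_s$ are introduced here only for illustration. Setting $\beta = 0$ then gives a constant mortality ratio whose maximum likelihood estimate is the familiar SMR:

```latex
\begin{align*}
\mathrm{MR}_{s} &= r \exp(Z_s'\beta)\,\Big|_{\beta = 0} = r, \\
\ell(r) &= \sum_s \bigl[\, d_s \log(r e_s) - r e_s - \log d_s! \,\bigr], \\
\frac{\partial \ell}{\partial r} &= \sum_s \Bigl( \frac{d_s}{r} - e_s \Bigr) = 0
\;\Longrightarrow\;
\hat{r} = \frac{\sum_s d_s}{\sum_s e_s} = \mathrm{SMR}.
\end{align*}
```

With covariates included, the score equations no longer have a closed form, which is consistent with the iterative Newton-Raphson procedure the abstract describes.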
Abstract:
In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this study, we extended the work of Chan et al. (2008) on recovering the latent slope in a simple regression model to a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of the latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces an estimator that is efficient in the multiple regression model, especially when the measurement error variance of the surrogate variable is large.
Abstract:
Numerous harmful occupational exposures affect working teens in the United States. Teens working in agriculture and other heavy-labor industries may be at risk for occupational exposures to pesticides and solvents. The neurotoxicity of pesticides and solvents at high doses is well-known; however, the long-term effects of these substances at low doses on occupationally exposed adolescents have not been well-studied. To address this research gap, a secondary analysis of cross-sectional data was completed in order to estimate the prevalence of self-reported symptoms of neurotoxicity among a cohort of high school students from Starr County, Texas, a rural area along the Texas-Mexico border. Multivariable linear regression was used to estimate the association between work status (i.e., no work, farm work, and non-farm work) and symptoms of neurotoxicity, while controlling for age, gender, Spanish speaking preference, inhalant use, tobacco use, and alcohol use. The sample included 1,208 students. Of these, the majority (85.84%) did not report having worked during the prior nine months compared to 4.80% who did only farm work, 6.21% who did only non-farm work, and 3.15% who did both types of work. On average, students reported 3.26 symptoms with a range from 0-16. The most commonly endorsed items across work status were those related to memory impairment. Adolescents employed in non-farm work jobs reported more neurotoxicity symptoms than those who reported that they did not work (Mean 4.31; SD 3.97). In the adjusted multivariable regression model, adolescents reporting non-farm work status reported an average of 0.77 more neurotoxicity symptoms on the Q16 than those who did not work (P = 0.031). The confounding variables included in the final model were all found to be factors significantly associated with report of neurotoxicity symptoms. Future research should examine the relationship between these variables and self-report of symptoms of neurotoxicity.
Abstract:
The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an obvious carcinogen for lung cancer. Since the cytokinesis-blocked micronucleus (CBMN) assay has been found to be extremely sensitive to NNK-induced genetic damage, it is a potentially important factor for predicting lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by the CBMN assay has not been rigorously examined. This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed for very long due to laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model, including transition probabilities of this chain as covariates, was established. Maximum likelihood estimation was applied to carry out the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated that the extent of DNA damage/non-damage measured using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.