19 resultados para general regression model
em DigitalCommons@The Texas Medical Center
Resumo:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model.^ The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background.^ Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study.^ The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio. In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit.^ Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed. ^
Resumo:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point using the status of mild, moderate or severe disease as outcome measures. As in many other outcome oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. Also, this study estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification was created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters. β1 was hypothesized at 0.50 and the mean estimate was 0.488, β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as β1 and β2, they validate this method. X 1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54% and, in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X 1 of having both copies of the APOE 4 allele changed from an estimate of 1.377 to an estimate 1.418, demonstrating that the estimates of the odds ratio changed when the analysis includes adjustment for misclassification. ^
Resumo:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. ^ Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010).In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that comparing to linear or categorical models, the fractional polynomial models, with the higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate overall misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of fractional polynomial model was close to that of linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.^ Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.^
Resumo:
With most clinical trials, missing data presents a statistical problem in evaluating a treatment's efficacy. There are many methods commonly used to assess missing data; however, these methods leave room for bias to enter the study. This thesis was a secondary analysis on data taken from TIME, a phase 2 randomized clinical trial conducted to evaluate the safety and effect of the administration timing of bone marrow mononuclear cells (BMMNC) for subjects with acute myocardial infarction (AMI).^ We evaluated the effect of missing data by comparing the variance inflation factor (VIF) of the effect of therapy between all subjects and only subjects with complete data. Through the general linear model, an unbiased solution was made for the VIF of the treatment's efficacy using the weighted least squares method to incorporate missing data. Two groups were identified from the TIME data: 1) all subjects and 2) subjects with complete data (baseline and follow-up measurements). After the general solution was found for the VIF, it was migrated Excel 2010 to evaluate data from TIME. The resulting numerical value from the two groups was compared to assess the effect of missing data.^ The VIF values from the TIME study were considerably less in the group with missing data. By design, we varied the correlation factor in order to evaluate the VIFs of both groups. As the correlation factor increased, the VIF values increased at a faster rate in the group with only complete data. Furthermore, while varying the correlation factor, the number of subjects with missing data was also varied to see how missing data affects the VIF. When subjects with only baseline data was increased, we saw a significant rate increase in VIF values in the group with only complete data while the group with missing data saw a steady and consistent increase in the VIF. The same was seen when we varied the group with follow-up only data. This essentially showed that the VIFs steadily increased when missing data is not ignored. When missing data is ignored as with our comparison group, the VIF values sharply increase as correlation increases.^
Resumo:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) to be included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate while varying the effect of failure rate, sample size, true values of the parameters and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect.^ A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1 so that the failure times were exponentially distributed in the ith interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4,1/2,3/4).^ The mean of the bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased whereas the mean of the bias increased as failure rate and true values of the covariates increased. The mean of the bias of the estimator of the coefficient was smallest when all of the updated measurements were used in the model compared with two models that used selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate was in most cases 90% or more except when the failure rate was high (0.20). The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented. ^
Resumo:
While clinical studies have shown a negative relationship between obesity and mental health in women, population studies have not shown a consistent association. However, many of these studies can be criticized regarding fatness level criteria, lack of control variables, and validity of the psychological variables.^ The purpose of this research was to elucidate the relationship between fatness level and mental health in United States women using data from the First National Health and Nutrition Examination Survey (NHANES I), which was conducted on a national probability sample from 1971 to 1974. Mental health was measured by the General Well-Being Schedule (GWB), and fatness level was determined by the sum of the triceps and subscapular skinfolds. Women were categorized as lean (15th percentile or less), normal (16th to 84th percentiles), or obese (85th percentile or greater).^ A conceptual framework was developed which identified the variables of age, race, marital status, socioeconomic status (education), employment status, number of births, physical health, weight history, and perception of body image as important to the fatness level-GWB relationship. Multiple regression analyses were performed separately for whites and blacks with GWB as the response variable, and fatness level, age, education, employment status, number of births, marital status, and health perception as predictor variables. In addition, 2- and 3-way interaction terms for leanness, obesity and age were included as predictor variables. Variables related to weight history and perception of body image were not collected in NHANES I, and thus were not included in this study.^ The results indicated that obesity was a statistically significant predictor of lower GWB in white women even when the other predictor variables were controlled. The full regression model identified the young, more educated, obese female as a subgroup with lower GWB, especially in blacks. These findings were not consistent with the previous non-clinical studies which found that obesity was associated with better mental health. The social stigma of being obese and the preoccupation of women with being lean may have contributed to the lower GWB in these women. ^
Resumo:
The infant mortality rate (IMR) is considered to be one of the most important indices of a country's well-being. Countries around the world and other health organizations like the World Health Organization are dedicating their resources, knowledge and energy to reduce the infant mortality rates. The well-known Millennium Development Goal 4 (MDG 4), whose aim is to archive a two thirds reduction of the under-five mortality rate between 1990 and 2015, is an example of the commitment. ^ In this study our goal is to model the trends of IMR between the 1950s to 2010s for selected countries. We would like to know how the IMR is changing overtime and how it differs across countries. ^ IMR data collected over time forms a time series. The repeated observations of IMR time series are not statistically independent. So in modeling the trend of IMR, it is necessary to account for these correlations. We proposed to use the generalized least squares method in general linear models setting to deal with the variance-covariance structure in our model. In order to estimate the variance-covariance matrix, we referred to the time-series models, especially the autoregressive and moving average models. Furthermore, we will compared results from general linear model with correlation structure to that from ordinary least squares method without taking into account the correlation structure to check how significantly the estimates change.^
Resumo:
It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze the data. No one solution has been satisfactory. The purpose of this study is to provide another analytic method by extending the Cox's lifetable regression model with time-dependent covariates. The new approach contains the following features: (1) It is based on the conditional maximum likelihood procedure using a proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard to estimate the parameters for the cohort and period factors. (2) The model is flexible so that both the cohort and period factors can be treated as dummy or continuous variables, and the parameter estimations can be obtained for numerous combinations of variables as in a regression analysis. (3) The model is applicable even when the time period is unequally spaced.^ Two specific models are considered to illustrate the new approach and applied to the U.S. prostate cancer data. We find that there are significant differences between all cohorts and there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age indicating that old people have much higher risk than young people. A log transformation of relative risk shows that the prostate cancer risk declined in recent cohorts for both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to the previous study by Holford (1983).^ The new approach offers a general method to analyze the age-period-cohort data without using any arbitrary constraint in the model. ^
Resumo:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. ^ For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power. ^
Resumo:
This cross-sectional study was undertaken to evaluate the impact in terms of HIV/STD knowledge and sexual behavior that the City of Houston HIV/STD prevention program in HISD high schools has had on students who have participated in it by comparing them with their peers who have not, based on self reports. The study further evaluated the program cost-effectiveness for averting future HIV infections by computing Cost-Utility Ratios based on reported sexual behavior. ^ Mixed results were obtained, indicating a statistically significant difference in knowledge with the intervention group having scored higher (p-value 0.001) but not for any of the behaviors assessed. The knowledge score outcome's overall p-value after adjusting for each stratifying variable (age, grade, gender and ethnicity) was statistically significant. The Odds Ratio of intervention group participants aged 15 years or more scoring 70% or higher was 1.86 times; that of intervention group female participants was 2.29 times; and that of intervention group Black/African American participants was 2.47 times relative to their comparison group counterparts. The knowledge score results remained statistically significant in the logistic regression model, which controlled for age, grade level, gender and ethnicity. The Odds Ratio in this case was 1.74. ^ Three scenarios based on the difference in the risk of HIV infection between the intervention and comparison group were used for computation of Cost-Utility Ratios: Base, worst and best-case scenario. The best-case scenario yielded cost-effective results for male participants and cost-saving results for female participants when using ethnicity-adjusted HIV prevalence. The scenario remained cost-effective for female participants when using the unadjusted HIV prevalence. ^ The challenge to the program is to devise approaches that can enhance benefits for male participants. If it is a threshold problem implying that male participants require more intensive programs for behavioral change, then programs should first be piloted among boys before being implemented across the board. If it is a reflection of gender differences, then we might have to go back to the drawing board and engage boys in focus group discussions that will help formulate more effective programs. Gender-blind approaches currently in vogue do not seem to be working. ^
Resumo:
Background. The purpose of this study was to describe the risk factors and demographics of persons with salmonellosis and shigellosis and to investigate both seasonal and spatial variations in the occurrence of these infections in Texas from 2000 to 2004, utilizing time series analyses and the geographic information system digital mapping methods. ^ Methods. Spatial Analysis: MapInfo software was used to map the distribution of age-adjusted rates of reported shigellosis and salmonellosis in Texas from 2000–2004 by zip codes. Census data on above or below poverty level, household income, highest level of educational attainment, race, ethnicity, and urban/rural community status was obtained from the 2000 Decennial Census for each zip code. The zip codes with the upper 10% and lower 10% were compared using t-tests and logistic regression to determine whether there were any potential risk factors. ^ Temporal analysis. Seasonal patterns in the prevalence of infections in Texas from 2000 to 2003 were determined by performing time-series analysis on the numbers of cases of salmonellosis and shigellosis. A linear regression was also performed to assess for trends in the incidence of each disease, along with auto-correlation and multi-component cosinor analysis. ^ Results. Spatial analysis: Analysis by general linear model showed a significant association between infection rates and age, with young children aged less than 5 and those aged 5–9 years having increased risk of infection for both disease conditions. The data demonstrated that those populations with high percentages of people who attained a higher than high school education were less likely to be represented in zip codes with high rates of shigellosis. However, for salmonellosis, logistic regression models indicated that when compared to populations with high percentages of non-high school graduates, having a high school diploma or equivalent increased the odds of having a high rate of infection. ^ Temporal analysis. For shigellosis, multi-component cosinor analyses were used to determine the approximated cosine curve which represented a statistically significant representation of the time series data for all age groups by sex. The shigellosis results show 2 peaks, with a major peak occurring in June and a secondary peak appearing around October. Salmonellosis results showed a single peak and trough in all age groups with the peak occurring in August and the trough occurring in February. ^ Conclusion. The results from this study can be used by public health agencies to determine the timing of public health awareness programs and interventions in order to prevent salmonellosis and shigellosis from occurring. Because young children depend on adults for their meals, it is important to increase the awareness of day-care workers and new parents about modes of transmission and hygienic methods of food preparation and storage. ^
Resumo:
Background. The parents of a sick child likely experience situational anxiety due to their young child being unexpectedly hospitalized. The emotional upheaval may be great enough that their anxiety inhibits them in providing positive support to their hospitalized child. Because anxiety affects psychological distress as well as behavioral distress, identifying parental distress helps parents improving their coping mechanisms. ^ Purpose. The study compared situational anxiety levels between Taiwanese fathers and mothers and focused on differences between parental anxiety levels at the beginning of the child's unplanned hospitalization and at time of discharge. The study also identified factors related to the parents' distress and use of coping mechanisms. ^ Methods. A descriptive, comparative research design was used to determine the difference between the anxiety levels of 62 Taiwanese father-mother dyads during the situational crisis of their child's unexpected hospitalization. The Mandarin version (M) of Visual Analog Scale (VAS-M), State-Trait Anxiety Inventory (STAI-M), and the Index of Parent Participation/Hospitalized Child (IPP/HC-M) were used to differentiate maternal and paternal anxiety levels and identify factors related to the parents' distress. Questionnaires were completed by parents within 24-36 hours of the child's hospital admission and within 24 hours prior to discharge. A paired t-test, two sample t-test, and linear mixed regression model were used to test and support the study hypothesis. ^ Results. The findings reveal that the mothers' anxiety levels did not significantly differ from the fathers' anxiety level when their child had a sudden admission to the hospital. In particular, parental state anxiety levels did not decrease during the child's hospital stay and subsequent discharge. Moreover, anxiety levels did not differ between parents regardless of whether the child's disease was acute or chronic. The most effective factor related to parental situational anxiety was parental perception of the severity of the child's illness. ^ Conclusions. Parental anxiety was found to be significantly related to changes in their perception of the severity of their child's illness. However, the study was not able to illustrate how parental involvement in the child's hospital care was related to parental perception of the severity of their child's illness. Future studies, using a qualitative approach to gamer more information as to what variables influence parental anxiety during a situational crisis, may provide a richer database from which to modify key variables as well as the instruments used to improve the quality of the data obtained. ^
Resumo:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^
Resumo:
Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology.^ The age-specific mortality ratio (MR) for the covariates of interest can be written as MR(,ij...m) = ((mu)(,ij...m)/(theta)(,ij...m)) = r(.)exp (Z('')(,ij...m)(beta)) where (mu)(,ij...m) and (theta)(,ij...m) denote the force of mortality in the study and chosen standard populations in the ij...m('th) stratum, respectively, r is the intercept, Z(,ij...m) is the vector of covariables associated with the i('th) age interval, and (beta) is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients.^ This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable. The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate.^ This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when there is an interaction effect between confounding variables needs to be evaluated. ^