44 results for Stepwise regression
Abstract:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model.

The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. Informative censoring may nevertheless occur. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background.

Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study.

The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio. In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit.

Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.

Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
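The leave-one-out PRESS statistic recommended above can be illustrated with a small sketch. This is not the dissertation's simulation code: the function names and the single-predictor model are assumptions made here to keep the example free of dependencies. It computes PRESS twice, once by refitting the line with each observation held out and once by the standard leverage shortcut, in which each ordinary residual is divided by one minus its leverage.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def press_refit(xs, ys):
    """PRESS by explicit leave-one-out refitting."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


def press_shortcut(xs, ys):
    """PRESS from one full-sample fit, using residual / (1 - leverage)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - xbar) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - leverage)) ** 2
    return total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
    print(press_refit(xs, ys), press_shortcut(xs, ys))
```

The two functions should agree up to rounding, which is why PRESS is usually computed from a single fit. Because every leverage lies between 0 and 1, PRESS is never smaller than the residual sum of squares of the full-sample fit.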
Abstract:
This dissertation develops and explores the methodology for the use of cubic spline functions in assessing time-by-covariate interactions in Cox proportional hazards regression models. These interactions indicate violations of the proportional hazards assumption of the Cox model. Use of cubic spline functions allows for the investigation of the shape of a possible covariate time-dependence without having to specify a particular functional form. Cubic spline functions yield both a graphical method and a formal test for the proportional hazards assumption as well as a test of the nonlinearity of the time-by-covariate interaction. Five existing methods for assessing violations of the proportional hazards assumption are reviewed and applied along with cubic splines to three well-known two-sample datasets. An additional dataset with three covariates is used to explore the use of cubic spline functions in a more general setting.
Abstract:
A Bayesian approach to estimation of the regression coefficients of a multinomial logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over other Bayesian approaches.
Abstract:
Dropout from obesity treatment has been a major factor associated with weight control failure, with few reliable predictors of dropouts or completers. Previous studies have tended to treat obese people as a homogeneous group with standard behavior modification-based interventions. Current research indicates there may be subgroups within the obese population, binge eaters and nonbinge eaters, who have different dropout rates. Current studies also recommend focusing on the subset of this subgroup that does not engage in purging (vomiting, laxative abuse, or excessive exercise) to compensate for binge eating. This research uses a secondary dataset (N = 156) from a prospective study in which participants were randomized to a Food Dependency (FD) or a Behavioral Self-Management (BSM) group for weight reduction. Criteria for subjects in the original study included (1) scoring higher on the existing Binge Eating Scale (BES) in order to ensure enrollment of more binge eaters and (2) no compensatory purging behavior for binge eating. Subjects were then reclassified in this study as binge eaters or nonbinge eaters using the more stringent proposed 1994 DSM-IV criteria for Binge Eating Disorder (BED). Subjects were followed for dropout. Variables studied were binge status, age at obesity onset, age at study baseline, class instructor, number of previous weight loss attempts, race, marital status, body mass index (BMI, kg/m$^2$), type of intervention, work status, educational level, and social support. Stepwise backward Cox regression survival analysis indicated binge status had a consistent, statistically significant protective effect on dropout, in which binge eaters were half as likely to drop out as nonbinge eaters (p = 0.04). Cox proportional hazards analysis indicated no statistical difference in dropout by type of intervention (FD, p = 0.13; BSM, p = 0.80) when controlling for binge status. None of the other variables reached significance, which is consistent with the literature. These findings suggest that (1) the proposed 1994 DSM-IV criteria for BED are a more useful classification than the existing DSM-III-R criteria, and (2) the identification of subgroups among obese subjects is an important step in dropout and weight loss intervention research. Future research can confirm this finding.
Abstract:
Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias.

Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to average the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.
Abstract:
A large number of ridge regression estimators have been proposed and used with little knowledge of their true distributions. Because of this lack of knowledge, these estimators cannot be used to test hypotheses or to form confidence intervals.

This paper presents a basic technique for deriving the exact distribution functions for a class of generalized ridge estimators. The technique is applied to five prominent generalized ridge estimators. Graphs of the resulting distribution functions are presented. The actual behavior of these estimators is found to be considerably different from the behavior which is generally assumed for ridge estimators.

This paper also uses the derived distributions to examine the mean squared error properties of the estimators. A technique for developing confidence intervals based on the generalized ridge estimators is also presented.
Abstract:
The main objective of this study was to develop indicators for measuring the food safety status of a country. A conceptual model was put forth by the investigator. The assumption was that food safety status is influenced by multiple factors: medico-health levels, food-nutrition programs, and consumer protection activities. All of these, in turn, depend upon the socio-economic status of the country.

Twenty-six indicators were reviewed and examined. Seventeen were first screened and three were finally selected, by stepwise multiple regression analysis, to reflect the food safety status. Sixty-one countries/areas were included in this study.

The three indicators were life expectancy at birth (squared multiple correlation coefficient $R^2$ = 34.62%), adult literacy rate ($R^2$ = 29.66%), and child mortality rate for ages 1-4 ($R^2$ = 9.99%). They showed a cumulative $R^2$ of 57.79%.
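A stepwise selection of this general kind can be sketched as follows. The abstract does not state the entry criterion or how each indicator's $R^2$ was computed, so this sketch makes its own assumptions: forward selection only, entry of the candidate with the largest gain in $R^2$, and a hypothetical stopping threshold `min_gain`. All names and data are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of the least-squares fit of y on an intercept plus the columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Return [(name, gain in R^2, cumulative R^2)] in order of entry."""
    selected, path, current = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        best_name, best_r2 = None, current
        for name, col in remaining.items():
            r2 = r_squared([predictors[s] for s in selected] + [col], y)
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        if best_name is None or best_r2 - current < min_gain:
            break  # no candidate adds enough explained variance
        selected.append(best_name)
        path.append((best_name, best_r2 - current, best_r2))
        del remaining[best_name]
        current = best_r2
    return path


if __name__ == "__main__":
    # Invented data: y depends on x1 and x2; x3 is unrelated noise.
    predictors = {
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        "x2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
        "x3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    }
    y = [4.2, 4.9, 10.1, 10.8, 16.3, 16.9, 21.8, 23.1]
    for name, gain, total in forward_select(predictors, y):
        print(name, round(gain, 4), round(total, 4))
```

In this sketch the gains are increments over the model already selected, so they sum to the final cumulative $R^2$. A full stepwise procedure would also test whether variables already in the model should be removed after each entry, which this forward-only version omits.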
Abstract:
Few, if any, studies have attempted to identify the specific environmental factors associated with the incidence of diarrheal disease and to rank these by their contribution to the total incidence of diarrheal illness. Potentially, the factors with the greatest contribution are the variables on which intervention could be expected to have the greatest impact on the incidence of diarrhea.

In 317 rural Egyptian households participating in a longitudinal study of diarrheal disease, selected environmental characteristics were observed and recorded on a questionnaire. Characteristics of the environment were classified into seven categories: water usage, proximity of animals to the house, waste management, food preparation area, toilet area, the household structure, and hygiene. The variables from each of the seven major groupings most associated with the incidence of diarrhea in infants were selected through the application of stepwise multiple regression. Each area was then ranked by the proportion of the incidence of diarrhea in infants that each composite group of area-specific variables alone would explain. The groups of household structure and water usage variables were found to be more associated with the incidence of diarrhea in infants than variables describing the toilet area, proximity to animals or others. It was also found that 24.7% of the total variance in incidence of diarrheal illness was explained by environmental variables.
Abstract:
An investigation was undertaken to determine the chemical characterization of inhalable particulate matter in the Houston area, with special emphasis on source identification and apportionment of outdoor and indoor atmospheric aerosols using multivariate statistical analyses.

Fine (<2.5 μm) particle aerosol samples were collected by means of dichotomous samplers at two fixed-site (Clear Lake and Sunnyside) ambient monitoring stations and one mobile monitoring van in the Houston area during June-October 1981 as part of the Houston Asthma Study. The mobile van allowed particulate sampling to take place both inside and outside of twelve homes.

The samples, collected over 12-h periods on a 7 AM-7 PM and 7 PM-7 AM (CDT) schedule, were analyzed for mass, trace elements, and two anions. Mass was determined gravimetrically. An energy-dispersive X-ray fluorescence (XRF) spectrometer was used for determination of elemental composition. Ion chromatography (IC) was used to determine sulfate and nitrate.

Average chemical compositions of fine aerosol at each site were presented. Sulfate was found to be the largest single component in the fine fraction mass, comprising approximately 30% of the fine mass outdoors and 12% indoors.

Principal components analysis (PCA) was applied to identify sources of aerosols and to assess the role of meteorological factors on the variation in particulate samples. The results suggested that meteorological parameters were not associated with sources of aerosol samples collected at these Houston sites.

Source factor contributions to fine mass were calculated using a combination of PCA and stepwise multivariate regression analysis. It was found that much of the total fine mass was apparently contributed by sulfate-related aerosols. The average contributions to the fine mass coming from the sulfate-related aerosols were 56% of the Houston outdoor ambient fine particulate matter and 26% of the indoor fine particulate matter.

Characterization of indoor aerosol in residential environments was compared with the results for outdoor aerosols. It was suggested that much of the indoor aerosol may be due to outdoor sources, but there may be important contributions from common indoor sources in the home environment such as smoking and gas cooking.
Abstract:
This study described the relationship of sexual maturation and blood pressure in a sample (n = 361) of white females, ages seven through 18, attending public schools in a defined area of Central Texas during October through December, 1984. Other correlates of blood pressure were also described for this sample.

A survey was performed to obtain the data on height, weight, body mass, pulse rate, upper arm circumference and length, and blood pressure. Each subject self-assessed her secondary sex characteristics (breast and pubic hair) according to drawings of the Tanner stages of maturation. The subjects were interviewed to obtain data on personal health habits and menstrual status. Student age, ethnic group and place of residence were abstracted from school records. Parents or guardians of the subjects responded to a questionnaire pertaining to parental and subject health history and parents' occupation and educational attainment.

In the simple linear regression analysis, sexual maturation and variables of body size were significantly (p < 0.001) and positively associated with systolic and fourth- and fifth-phase diastolic blood pressure. The demographic and socioeconomic variables were not sufficiently variant in this population to have differential effects on the relation between blood pressure and maturation. Stepwise multiple regression was used to assess the contribution of sexual maturation to the variance of blood pressure after accounting for the variables of body size. Sexual maturation (breast stage) along with weight, height and body mass remained in the multiple regression models for fourth- and fifth-phase diastolic blood pressure. Only height and body mass remained in the regression model for systolic blood pressure; sexual maturation did not contribute more to the explanation of the systolic blood pressure variance.

The association of sexual maturation with blood pressure level was established in this sample of young white females. More research is needed, first to determine whether this relationship prevails in other populations of young females, and second to determine the relationship of sexual maturation sequence and change with the change of blood pressure during childhood and adolescence.
Abstract:
The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke.
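A Newton-Raphson iteration for logistic maximum likelihood can be sketched as below. This is a simplification assumed here, not the dissertation's procedure: it handles a binary response with one covariate rather than the polychotomous model, and the function names, data, and tolerance are hypothetical. Each step solves the 2x2 system formed by the score vector and the observed information.

```python
import math


def probability(b0, b1, x):
    """Logistic probability of y = 1 given covariate x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def score(b0, b1, xs, ys):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    g0 = sum(y - probability(b0, b1, x) for x, y in zip(xs, ys))
    g1 = sum((y - probability(b0, b1, x)) * x for x, y in zip(xs, ys))
    return g0, g1


def newton_logistic(xs, ys, iterations=25, tol=1e-10):
    """Maximum likelihood estimates (b0, b1) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0, g1 = score(b0, b1, xs, ys)
        # Observed information matrix [[h00, h01], [h01, h11]].
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = probability(b0, b1, x)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: information matrix inverse times the score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Invented data with overlapping outcomes, so finite estimates exist.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [0, 0, 1, 0, 1, 1, 0, 1]
    print(newton_logistic(xs, ys))
```

At the maximum the score is zero, which gives a simple convergence check. The inverse of the information matrix at the final iterate also supplies the usual large-sample variance estimates for the coefficients.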
Abstract:
The research project is an extension of a series of administrative science and health care research projects evaluating the influence of external context, organizational strategy, and organizational structure upon organizational success or performance. The research will rely on the assumption that there is not one single best approach to the management of organizations (the contingency theory). As organizational effectiveness is dependent on an appropriate mix of factors, organizations may be equally effective based on differing combinations of factors. The external context of the organization is expected to influence internal organizational strategy and structure and, in turn, the internal measures affect performance (discriminant theory). The research considers the relationship of external context and organization performance.

The unit of study for the research will be the health maintenance organization (HMO): an organization that accepts, in exchange for a fixed, advance capitation payment, contractual responsibility to assure the delivery of a stated range of health services to a voluntarily enrolled population. With the current Federal resurgence of interest in the HMO as a major component in the health care system, attention must be directed at maximizing development of HMOs from the limited resources available. Increased skills are needed in both Federal and private evaluation of HMO feasibility in order to prevent resource investment in projects that will fail while concurrently identifying potentially successful projects that will not be considered using current standards.

The research considers 192 factors measuring contextual milieu (social, educational, economic, legal, demographic, health and technological factors). Through intercorrelation and principal components data reduction techniques these were reduced to 12 variables. Two measures of HMO performance were identified: (1) HMO status (operational or defunct), and (2) a principal components factor score considering eight measures of performance. The relationship between HMO context and performance was analyzed using correlation and stepwise multiple regression methods. In each case it has been concluded that the external contextual variables are not predictive of success or failure of the study HMOs. This suggests that performance of an HMO may rely on internal organizational factors. These findings have policy implications, as contextual measures are used as a major determinant in HMO feasibility analysis and as a factor in the allocation of limited Federal funds.
Abstract:
Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology.

The age-specific mortality ratio (MR) for the covariates of interest can be written as $\mathrm{MR}_{ij\ldots m} = \mu_{ij\ldots m}/\theta_{ij\ldots m} = r \cdot \exp(Z'_{ij\ldots m}\beta)$, where $\mu_{ij\ldots m}$ and $\theta_{ij\ldots m}$ denote the force of mortality in the study and chosen standard populations in the $ij\ldots m$th stratum, respectively, $r$ is the intercept, $Z_{ij\ldots m}$ is the vector of covariables associated with the $i$th age interval, and $\beta$ is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients.

This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable. The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate.

This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when an interaction effect between confounding variables needs to be evaluated.
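The contrast between a single SMR and age-specific mortality ratios can be illustrated with a short sketch. The counts below are invented, the function names are hypothetical, and the covariate is reduced to a single age-interval index z, which is an assumption made here rather than the thesis's specification. The model ratio follows the form r · exp(zβ) given in the abstract.

```python
import math


def smr(observed, expected):
    """Standardized mortality ratio: total observed over total expected deaths."""
    return sum(observed) / sum(expected)


def age_specific_ratios(observed, expected):
    """Mortality ratio within each age stratum."""
    return [o / e for o, e in zip(observed, expected)]


def model_ratio(r, beta, z):
    """Mortality ratio r * exp(beta * z) for a scalar covariate z."""
    return r * math.exp(beta * z)


if __name__ == "__main__":
    # Invented deaths in three age intervals (z = 0, 1, 2).
    observed = [2, 6, 18]
    expected = [4.0, 6.0, 9.0]
    print(smr(observed, expected))
    print(age_specific_ratios(observed, expected))
    # With r = 0.5 and beta = ln 2 the ratio doubles per age interval.
    print([model_ratio(0.5, math.log(2), z) for z in range(3)])
```

In this toy example the stratum ratios rise from a deficit to an excess with age, a pattern that a single SMR cannot show. Setting beta to zero makes the model ratio constant across strata, which corresponds to the homogeneity assumption behind the traditional SMR comparison.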