Biblioteca Digital

8 resultados para Mean Squared Error

em DigitalCommons@The Texas Medical Center

DISTRIBUTION FUNCTIONS, MEAN SQUARED ERRORS, AND CONFIDENCE LIMITS FOR RIDGE REGRESSION ESTIMATORS

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A large number of ridge regression estimators have been proposed and used with little knowledge of their true distributions. Because of this lack of knowledge, these estimators cannot be used to test hypotheses or to form confidence intervals.^ This paper presents a basic technique for deriving the exact distribution functions for a class of generalized ridge estimators. The technique is applied to five prominent generalized ridge estimators. Graphs of the resulting distribution functions are presented. The actual behavior of these estimators is found to be considerably different than the behavior which is generally assumed for ridge estimators.^ This paper also uses the derived distributions to examine the mean squared error properties of the estimators. A technique for developing confidence intervals based on the generalized ridge estimators is also presented. ^

Veja mais

Effect of nonnormal random components on level and power in group-randomized trials

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of group-randomized trials is particularly widespread in the evaluation of health care, educational, and screening strategies. Group-randomized trials represent a subset of a larger class of designs often labeled nested, hierarchical, or multilevel and are characterized by the randomization of intact social units or groups, rather than individuals. The application of random effects models to group-randomized trials requires the specification of fixed and random components of the model. The underlying assumption is usually that these random components are normally distributed. This research is intended to determine if the Type I error rate and power are affected when the assumption of normality for the random component representing the group effect is violated. ^ In this study, simulated data are used to examine the Type I error rate, power, bias and mean squared error of the estimates of the fixed effect and the observed intraclass correlation coefficient (ICC) when the random component representing the group effect possess distributions with non-normal characteristics, such as heavy tails or severe skewness. The simulated data are generated with various characteristics (e.g. number of schools per condition, number of students per school, and several within school ICCs) observed in most small, school-based, group-randomized trials. The analysis is carried out using SAS PROC MIXED, Version 6.12, with random effects specified in a random statement and restricted maximum likelihood (REML) estimation specified. The results from the non-normally distributed data are compared to the results obtained from the analysis of data with similar design characteristics but normally distributed random effects. ^ The results suggest that the violation of the normality assumption for the group component by a skewed or heavy-tailed distribution does not appear to influence the estimation of the fixed effect, Type I error, and power. Negative biases were detected when estimating the sample ICC and dramatically increased in magnitude as the true ICC increased. These biases were not as pronounced when the true ICC was within the range observed in most group-randomized trials (i.e. 0.00 to 0.05). The normally distributed group effect also resulted in bias ICC estimates when the true ICC was greater than 0.05. However, this may be a result of higher correlation within the data. ^

Veja mais

Selection bias and the cross-validation of regression models for prediction

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^

Veja mais

THE IMPROVEMENT REGION IN RIDGE REGRESSION

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the difficulties in the practical application of ridge regression is that, for a given data set, it is unknown whether a selected ridge estimator has smaller squared error than the least squares estimator. The concept of the improvement region is defined, and a technique is developed which obtains approximate confidence intervals for the value of ridge k which produces the maximum reduction in mean squared error. Two simulation experiments were conducted to investigate how accurate these approximate confidence intervals might be. ^

Veja mais

Changes in serum lipids with change and fluctuation in body mass index and body fat distribution in Mexican Americans with type 2 diabetes

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation was written in the format of three journal articles. Paper 1 examined the influence of change and fluctuation in body mass index (BMI) over an eleven-year period, on changes in serum lipid levels (total, HDL, and LDL cholesterol, triglyceride) in a population of Mexican Americans with type 2 diabetes. Linear regression models containing initial lipid value, BMI and age, BMI change (slope of BMI), and BMI fluctuation (root mean square error) were used to investigate associations of these variables with change in lipids over time. Increasing BMI over time was associated with gains in total and LDL cholesterol and triglyceride levels in women. Fluctuation of BMI was not associated with detrimental lipid profiles. These effects were independent of age and were not statistically significant in men. In Mexican-American women with type 2 diabetes, weight reduction is likely to result in more favorable levels of total and LDL cholesterol and triglyceride, without concern for possible detrimental effects of weight fluctuation. Weight reduction may not be as effective in men, but does not appear to be harmful either. ^ Paper 2 examined the associations of upper and total body fat with total cholesterol, HDL and LDL cholesterol, and triglyceride levels in the same population. Multilevel analysis was used to predict serum lipid levels from total body fat (BMI and triceps skinfold) and upper body fat (subscapular skinfold), while controlling for the effects of sex, age and self-correlations across time. Body fat was not strikingly associated with trends in serum lipid levels. However, upper body fat was strongly associated with triglyceride levels. This suggests that loss of upper body fat may be more important than weight loss in management of the hypertriglyceridemia commonly seen in type 2 diabetes. ^ Paper 3 was a review of the literature reporting associations between weight fluctuation and lipid levels. Few studies have reported associations between weight fluctuation and total, LDL, and HDL cholesterol and triglyceride levels. The body of evidence to date suggests that weight fluctuation does not strongly influence levels of total, LDL and HDL cholesterol and triglyceride. ^

Veja mais

Estimating the mean and standard deviation of concentration distributions using censored samples

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Environmental data sets of pollutant concentrations in air, water, and soil frequently include unquantified sample values reported only as being below the analytical method detection limit. These values, referred to as censored values, should be considered in the estimation of distribution parameters as each represents some value of pollutant concentration between zero and the detection limit. Most of the currently accepted methods for estimating the population parameters of environmental data sets containing censored values rely upon the assumption of an underlying normal (or transformed normal) distribution. This assumption can result in unacceptable levels of error in parameter estimation due to the unbounded left tail of the normal distribution. With the beta distribution, which is bounded by the same range of a distribution of concentrations, $\rm\lbrack0\le x\le1\rbrack,$ parameter estimation errors resulting from improper distribution bounds are avoided. This work developed a method that uses the beta distribution to estimate population parameters from censored environmental data sets and evaluated its performance in comparison to currently accepted methods that rely upon an underlying normal (or transformed normal) distribution. Data sets were generated assuming typical values encountered in environmental pollutant evaluation for mean, standard deviation, and number of variates. For each set of model values, data sets were generated assuming that the data was distributed either normally, lognormally, or according to a beta distribution. For varying levels of censoring, two established methods of parameter estimation, regression on normal ordered statistics, and regression on lognormal ordered statistics, were used to estimate the known mean and standard deviation of each data set. The method developed for this study, employing a beta distribution assumption, was also used to estimate parameters and the relative accuracy of all three methods were compared. For data sets of all three distribution types, and for censoring levels up to 50%, the performance of the new method equaled, if not exceeded, the performance of the two established methods. Because of its robustness in parameter estimation regardless of distribution type or censoring level, the method employing the beta distribution should be considered for full development in estimating parameters for censored environmental data sets. ^

Veja mais

Multiple imputation for missing continuous and dichotomous data: The effects of patterns of missingness and variable correlations on type I error with small samples

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. ^ For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power. ^

Veja mais

Evaluation of risk factors for long term changes in refractive error among adult patients at a Dallas Texas private setting A retrospective chart-review study

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background. Over 39.9% of the adult population forty or older in the United States has refractive error, little is known about the etiology of this condition and associated risk factors and their entailed mechanism due to the paucity of data regarding the changes of refractive error for the adult population over time.^ Aim. To evaluate risk factors over a long term, 5-year period, in refractive error changes among persons 43 or older by testing the hypothesis that age, gender, systemic diseases, nuclear sclerosis and baseline refractive errors are all significantly associated with refractive errors changes in patients at a Dallas, Texas private optometric office.^ Methods. A retrospective chart review of subjective refraction, eye health, and self-report health history was done on patients at a private optometric office who were 43 or older in 2000 who had eye examinations both in 2000 and 2005. Aphakic and pseudophakic eyes were excluded as well as eyes with best corrected Snellen visual acuity of 20/40 and worse. After exclusions, refraction was obtained on 114 right eyes and 114 left eyes. Spherical equivalent (sum of sphere + ½ cylinder) was used as the measure of refractive error.^ Results. Similar changes in refractive error were observed for the two eyes. The 5-year change in spherical power was in a hyperopic direction for younger age groups and in a myopic direction for older subjects, P<0.0001. The gender-adjusted mean change in refractive error in right eyes of persons aged 43 to 54, 55 to 64, 65 to 74, and 75 or older at baseline was +0.43D, +0.46 D, -0.09 D, and -0.23D, respectively. Refractive change was strongly related to baseline nuclear cataract severity; grades 4 to 5 were associated with a myopic shift (-0.38 D, P< 0.0001). The mean age-adjusted change in refraction was +0.27 D for hyperopic eyes, +0.56 D for emmetropic eyes, and +0.26 D for myopic eyes.^ Conclusions. This report has documented refractive error changes in an older population and confirmed reported trends of a hyperopic shift before age 65 and a myopic shift thereafter associated with the development of nuclear cataract.^

Veja mais

8 resultados para Mean Squared Error

em DigitalCommons@The Texas Medical Center

Filtro por publicador