5 resultados para Asymptotic Mean Squared Errors

em DigitalCommons@The Texas Medical Center


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A large number of ridge regression estimators have been proposed and used with little knowledge of their true distributions. Because of this lack of knowledge, these estimators cannot be used to test hypotheses or to form confidence intervals.^ This paper presents a basic technique for deriving the exact distribution functions for a class of generalized ridge estimators. The technique is applied to five prominent generalized ridge estimators. Graphs of the resulting distribution functions are presented. The actual behavior of these estimators is found to be considerably different than the behavior which is generally assumed for ridge estimators.^ This paper also uses the derived distributions to examine the mean squared error properties of the estimators. A technique for developing confidence intervals based on the generalized ridge estimators is also presented. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of group-randomized trials is particularly widespread in the evaluation of health care, educational, and screening strategies. Group-randomized trials represent a subset of a larger class of designs often labeled nested, hierarchical, or multilevel and are characterized by the randomization of intact social units or groups, rather than individuals. The application of random effects models to group-randomized trials requires the specification of fixed and random components of the model. The underlying assumption is usually that these random components are normally distributed. This research is intended to determine if the Type I error rate and power are affected when the assumption of normality for the random component representing the group effect is violated. ^ In this study, simulated data are used to examine the Type I error rate, power, bias and mean squared error of the estimates of the fixed effect and the observed intraclass correlation coefficient (ICC) when the random component representing the group effect possess distributions with non-normal characteristics, such as heavy tails or severe skewness. The simulated data are generated with various characteristics (e.g. number of schools per condition, number of students per school, and several within school ICCs) observed in most small, school-based, group-randomized trials. The analysis is carried out using SAS PROC MIXED, Version 6.12, with random effects specified in a random statement and restricted maximum likelihood (REML) estimation specified. The results from the non-normally distributed data are compared to the results obtained from the analysis of data with similar design characteristics but normally distributed random effects. ^ The results suggest that the violation of the normality assumption for the group component by a skewed or heavy-tailed distribution does not appear to influence the estimation of the fixed effect, Type I error, and power. Negative biases were detected when estimating the sample ICC and dramatically increased in magnitude as the true ICC increased. These biases were not as pronounced when the true ICC was within the range observed in most group-randomized trials (i.e. 0.00 to 0.05). The normally distributed group effect also resulted in bias ICC estimates when the true ICC was greater than 0.05. However, this may be a result of higher correlation within the data. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the difficulties in the practical application of ridge regression is that, for a given data set, it is unknown whether a selected ridge estimator has smaller squared error than the least squares estimator. The concept of the improvement region is defined, and a technique is developed which obtains approximate confidence intervals for the value of ridge k which produces the maximum reduction in mean squared error. Two simulation experiments were conducted to investigate how accurate these approximate confidence intervals might be. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Environmental data sets of pollutant concentrations in air, water, and soil frequently include unquantified sample values reported only as being below the analytical method detection limit. These values, referred to as censored values, should be considered in the estimation of distribution parameters as each represents some value of pollutant concentration between zero and the detection limit. Most of the currently accepted methods for estimating the population parameters of environmental data sets containing censored values rely upon the assumption of an underlying normal (or transformed normal) distribution. This assumption can result in unacceptable levels of error in parameter estimation due to the unbounded left tail of the normal distribution. With the beta distribution, which is bounded by the same range of a distribution of concentrations, $\rm\lbrack0\le x\le1\rbrack,$ parameter estimation errors resulting from improper distribution bounds are avoided. This work developed a method that uses the beta distribution to estimate population parameters from censored environmental data sets and evaluated its performance in comparison to currently accepted methods that rely upon an underlying normal (or transformed normal) distribution. Data sets were generated assuming typical values encountered in environmental pollutant evaluation for mean, standard deviation, and number of variates. For each set of model values, data sets were generated assuming that the data was distributed either normally, lognormally, or according to a beta distribution. For varying levels of censoring, two established methods of parameter estimation, regression on normal ordered statistics, and regression on lognormal ordered statistics, were used to estimate the known mean and standard deviation of each data set. The method developed for this study, employing a beta distribution assumption, was also used to estimate parameters and the relative accuracy of all three methods were compared. For data sets of all three distribution types, and for censoring levels up to 50%, the performance of the new method equaled, if not exceeded, the performance of the two established methods. Because of its robustness in parameter estimation regardless of distribution type or censoring level, the method employing the beta distribution should be considered for full development in estimating parameters for censored environmental data sets. ^