3 results for assessment evaluation
in DigitalCommons@University of Nebraska - Lincoln
Abstract:
The 3PL model is a flexible and widely used tool in assessment. However, it suffers from limitations due to its need for large sample sizes. This study introduces and evaluates the efficacy of a new sample size augmentation technique called Duplicate, Erase, and Replace (DupER) Augmentation through a simulation study. Data are augmented using several variations of DupER Augmentation (based on different imputation methodologies, deletion rates, and duplication rates), analyzed in BILOG-MG 3, and results are compared to those obtained from analyzing the raw data. Additional manipulated variables include test length and sample size. Estimates are compared using seven different evaluative criteria. Results are mixed and inconclusive. DupER augmented data tend to result in larger root mean squared errors (RMSEs) and lower correlations between estimates and parameters for both item and ability parameters. However, some DupER variations produce estimates that are much less biased than those obtained from the raw data alone. One DupER variation produced better results for low-ability simulees and worse results for high-ability simulees. Findings, limitations, and recommendations for future studies are discussed. Specific recommendations for future studies include (1) applying DupER Augmentation to empirical data, (2) using additional IRT models, and (3) analyzing the efficacy of the procedure for different item and ability parameter distributions.
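The abstract names the three steps of the procedure (duplicate, erase, replace via imputation) and its tuning knobs (duplication rate, deletion rate) but does not spell out an algorithm. The sketch below is one plausible reading, not the study's implementation: the function name, the default rates, and the Bernoulli item-proportion imputation are all assumptions made for illustration.

```python
import numpy as np

def duper_augment(responses, dup_rate=1.0, erase_rate=0.2, rng=None):
    """Hypothetical sketch of Duplicate, Erase, and Replace (DupER) augmentation.

    responses  : 2-D array of 0/1 item responses (examinees x items).
    dup_rate   : fraction of the sample to duplicate (1.0 doubles it).
    erase_rate : fraction of cells in each duplicate to erase and re-impute.
    """
    rng = np.random.default_rng(rng)
    n, k = responses.shape
    n_dup = int(round(n * dup_rate))
    # Duplicate: sample rows (with replacement) from the original data
    dup = responses[rng.integers(0, n, size=n_dup)].astype(float)
    # Erase: blank out a random fraction of the duplicated cells
    mask = rng.random(dup.shape) < erase_rate
    dup[mask] = np.nan
    # Replace: simple imputation assumed here -- draw each erased cell from a
    # Bernoulli with p equal to that item's observed proportion correct
    p_item = responses.mean(axis=0)
    for j in range(k):
        miss = np.isnan(dup[:, j])
        dup[miss, j] = rng.random(miss.sum()) < p_item[j]
    # Augmented data set = original rows plus the imputed duplicates
    return np.vstack([responses, dup.astype(int)])
```

The study's DupER variations differ precisely in the imputation step; the item-proportion draw above stands in for whichever imputation methodology a given variation uses.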
Abstract:
This mixed methods concurrent triangulation design study was predicated upon two models that advocated a connection between teaching presence and perceived learning: the Community of Inquiry Model of Online Learning developed by Garrison, Anderson, and Archer (2000); and the Online Interaction Learning Model by Benbunan-Fich, Hiltz, and Harasim (2005). The objective was to learn how teaching presence impacted students’ perceptions of learning and sense of community in intensive online distance education courses developed and taught by instructors at a regional comprehensive university. In the quantitative phase online surveys collected relevant data from participating students (N = 397) and selected instructional faculty (N = 32) during the second week of a three-week Winter Term. Student information included: demographics such as age, gender, employment status, and distance from campus; perceptions of teaching presence; sense of community; perceived learning; course length; and course type. The students claimed having positive relationships between teaching presence, perceived learning, and sense of community. The instructors showed similar positive relationships with no significant differences when the student and instructor data were compared. The qualitative phase consisted of interviews with 12 instructors who had completed the online survey and replied to all of the open-response questions. The two phases were integrated using a matrix generation, and the analysis allowed for conclusions regarding teaching presence, perceived learning, and sense of community. The findings were equivocal with regard to satisfaction with course length and the relative importance of the teaching presence components. A model was provided depicting relationships between and among teaching presence components, perceived learning, and sense of community in intensive online courses.
Abstract:
Evaluations of measurement invariance provide essential construct validity evidence. However, the quality of such evidence is partly dependent upon the validity of the resulting statistical conclusions. The presence of Type I or Type II errors can render measurement invariance conclusions meaningless. The purpose of this study was to determine the effects of categorization and censoring on the behavior of the chi-square/likelihood ratio test statistic and two alternative fit indices (CFI and RMSEA) in the context of evaluating measurement invariance. Monte Carlo simulation was used to examine Type I error and power rates for (a) the overall test statistic/fit indices and (b) the change in test statistic/fit indices. Data were generated according to a multiple-group single-factor CFA model across 40 conditions that varied by sample size, strength of item factor loadings, and categorization thresholds. Seven different combinations of model estimators (ML, Yuan-Bentler scaled ML, and WLSMV) and specified measurement scales (continuous, censored, and categorical) were used to analyze each of the simulation conditions. As hypothesized, non-normality increased Type I error rates for the continuous scale of measurement and did not affect error rates for the categorical scale of measurement. Maximum likelihood estimation combined with a categorical scale of measurement resulted in more correct statistical conclusions than the other analysis combinations. For the continuous and censored scales of measurement, the Yuan-Bentler scaled ML resulted in more correct conclusions than normal-theory ML. The censored measurement scale did not offer any advantages over the continuous measurement scale. Comparing across fit statistics and indices, the chi-square-based test statistics were preferred over the alternative fit indices, and ΔRMSEA was preferred over ΔCFI. Results from this study should be used to inform the modeling decisions of applied researchers. However, no single analysis combination can be recommended for all situations. Therefore, it is essential that researchers consider the context and purpose of their analyses.
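The abstract compares the chi-square test statistic with the alternative fit indices CFI and RMSEA, both of which are standard functions of a model's chi-square. As background (not the study's own software pipeline), a minimal sketch of the common formulations, using N - 1 in the RMSEA denominator as one conventional choice:

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation from a model chi-square.
    Uses the common N - 1 denominator; some software uses N instead."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index: model fit relative to a baseline (null) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # guard against a better-fitting baseline
    return 1.0 - d_m / d_b if d_b > 0 else 1.0
```

The change statistics the study evaluates (ΔRMSEA, ΔCFI) are simply the differences in these quantities between nested invariance models, e.g. configural versus metric.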