2 results for degenerate test set
in DRUM (Digital Repository at the University of Maryland)
Abstract:
Previous research found personality test scores to be inflated on average among individuals who were motivated to present themselves in a desirable fashion in high-stakes situations, such as during the employee selection process. One apparently effective way to reduce this undesirable test score inflation was to warn participants against faking. This research set out to investigate whether warning against faking would indeed affect personality test scores in the theoretically expected fashion. Contrary to expectations, the results did not support the hypothesized causal chain. Results across three studies show that while a warning may lower test scores in participants motivated to respond desirably (i.e., to fake), the effect of the warning on test scores was not fully mediated by a reduction in motivation to do well or by self-reported exaggeration of responses on the personality test. Theoretical and practical implications are discussed.
Abstract:
Modern software application testing, such as the testing of software driven by graphical user interfaces (GUIs) or leveraging event-driven architectures in general, requires paying careful attention to context. Model-based testing (MBT) approaches first acquire a model of an application, then use the model to construct test cases covering relevant contexts. A major shortcoming of state-of-the-art automated model-based testing is that many test cases proposed by the model are not actually executable. These "infeasible" test cases threaten the integrity of the entire model-based suite, and any coverage of contexts the suite aims to provide. In this research, I develop and evaluate a novel approach for classifying the feasibility of test cases. I identify a set of pertinent features for the classifier, and develop novel methods for extracting these features from the outputs of MBT tools. I use a supervised logistic regression approach to obtain a model of test case feasibility from a randomly selected training suite of test cases. I evaluate this approach with a set of experiments. The outcomes of this investigation are as follows: I confirm that infeasibility is prevalent in MBT, even for test suites designed to cover a relatively small number of unique contexts. I confirm that the frequency of infeasibility varies widely across applications. I develop and train a binary classifier for feasibility with average overall error, false positive, and false negative rates under 5%. I find that unique event IDs are key features of the feasibility classifier, while model-specific event types are not. I construct three types of features from the event IDs associated with test cases, and evaluate the relative effectiveness of each within the classifier.
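The general shape of such a classifier can be sketched as follows. This is an illustrative reconstruction, not the dissertation's actual method or data: the bag-of-event-IDs encoding, the toy test cases, and the plain gradient-descent training loop below are all assumptions standing in for the author's feature extraction and tooling.

```python
# Hypothetical sketch: a logistic-regression feasibility classifier whose
# features are the presence/absence of unique event IDs in each test case.
# Event names and labels below are invented toy data for illustration.
import math

def encode(test_cases, vocab):
    """Map each test case (a sequence of event IDs) to a binary
    presence/absence feature vector over the event-ID vocabulary."""
    return [[1.0 if ev in tc else 0.0 for ev in vocab] for tc in test_cases]

def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic-gradient-descent logistic regression (no libraries)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - yi                      # gradient of log-loss w.r.t. z
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, xi):
    """1 = predicted feasible, 0 = predicted infeasible."""
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z > 0 else 0

# Toy test cases as event-ID sequences; label 1 = feasible.
cases = [["open", "edit", "save"], ["open", "save"],
         ["delete", "save"], ["delete", "edit"]]
labels = [1, 1, 0, 0]  # in this toy data, cases touching "delete" are infeasible
vocab = sorted({ev for tc in cases for ev in tc})
X = encode(cases, vocab)
w, b = train_logistic(X, labels)
preds = [predict(w, b, xi) for xi in X]
```

In practice one would hold out a test suite rather than evaluate on the training cases, and derive richer event-ID features (the abstract mentions three feature types); this sketch shows only the basic encode-train-predict loop.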
To support this study, I also develop a number of tools and infrastructure components for scalable execution of automated jobs, which use state-of-the-art container and continuous integration technologies to enable parallel test execution and the persistence of all experimental artifacts.
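The parallel-execution infrastructure might look, in spirit, like the following. This is a deliberately simplified stand-in: `run_job` is a hypothetical placeholder for launching one containerized test run, and none of the names here come from the author's actual tooling.

```python
# Illustrative sketch only: running independent test jobs in parallel,
# in the spirit of container-based parallel test execution.
# run_job is a hypothetical stand-in; a real implementation would start
# a container, execute the test case inside it, and persist all artifacts.
from concurrent.futures import ThreadPoolExecutor

def run_job(test_case_id):
    """Placeholder for one isolated test-case execution.
    Here we just report completion instead of running anything."""
    return (test_case_id, "completed")

job_ids = [f"tc-{i:03d}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Jobs are independent, so they can run concurrently; results are
    # collected into a dict keyed by test-case ID.
    results = dict(pool.map(run_job, job_ids))
```

The key property this models is independence of jobs: because each test case executes in its own isolated environment, the suite parallelizes trivially across workers (or containers) and every job's artifacts can be archived separately.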