7 results for text and data mining
at Brock University, Canada
Abstract:
Feature selection plays an important role in knowledge discovery and data mining. In traditional rough set theory, feature selection using reducts (minimal discerning sets of attributes) is an important area. Nevertheless, the original definition of a reduct is restrictive, so previous research proposed considering not only the horizontal reduction of information through feature selection but also a vertical reduction that selects suitable subsets of the original set of objects. Following that work, a new approach to generating bireducts using a multi-objective genetic algorithm was proposed. Although genetic algorithms have been used to compute reducts in some previous works, we did not find any work in which genetic algorithms were used to compute bireducts. Compared with earlier work in this area, the proposed method involves less randomness in generating bireducts. As evolution progressed, the genetic algorithm estimated the quality of each bireduct by the values of two objective functions, yielding a set of bireducts with optimized values of these objectives. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied, and the resulting prediction accuracies were compared. Five datasets were used to test the proposed method, and two datasets were used for a comparison study. A one-way ANOVA test was performed to determine whether the differences between the results were statistically significant. The experiments showed that the proposed method reduced the number of bireducts needed to achieve good prediction accuracy. The influence of different genetic operators and fitness evaluation strategies on prediction accuracy was also analyzed. The prediction accuracies of the proposed method were shown to be comparable with the best results in the machine learning literature, and some of them exceeded those results.
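To make the bireduct idea concrete, the following minimal sketch follows one common formulation: a pair (B, X) in which the attribute subset B discerns every pair of objects in X that have different decisions. It checks that consistency condition, greedily extends X for a fixed B, and compares candidates by Pareto dominance. The two objectives used here (fewer attributes, more objects), the table layout, and the helper names `is_consistent`, `extend_objects`, `objectives`, and `dominates` are assumptions for illustration only; the abstract does not specify the objective functions or the chromosome encoding of the proposed method.

```python
def is_consistent(table, decisions, attrs, objs):
    """True if the attributes in `attrs` discern every pair of objects in
    `objs` that have different decision values."""
    seen = {}  # attribute-value signature -> decision first seen for it
    for o in objs:
        key = tuple(table[o][a] for a in attrs)
        if seen.setdefault(key, decisions[o]) != decisions[o]:
            return False  # same signature, different decision: not discerned
    return True


def extend_objects(table, decisions, attrs, order):
    """Greedily add objects in the given order while `attrs` stays consistent.
    Every rejected object conflicts with an accepted one, so the result is
    maximal for `attrs` (no remaining object can be added)."""
    chosen = []
    for o in order:
        if is_consistent(table, decisions, attrs, chosen + [o]):
            chosen.append(o)
    return chosen


def objectives(attrs, objs):
    """Hypothetical objective pair: minimise |B|, maximise |X|."""
    return len(attrs), len(objs)


def dominates(a, b):
    """Pareto dominance for (|B| to minimise, |X| to maximise)."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


# Toy decision table: 4 objects, 3 attributes (hypothetical data).
table = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (0, 0, 1)]
decisions = [0, 1, 0, 1]

x_a = extend_objects(table, decisions, (0,), range(4))  # [0, 2]
x_b = extend_objects(table, decisions, (2,), range(4))  # [0, 1, 2, 3]
print(dominates(objectives((2,), x_b), objectives((0,), x_a)))  # True
```

In a multi-objective genetic algorithm, each individual would encode some candidate (B, X), and non-dominated individuals would be kept as the evolving set of bireducts; how the actual method encodes and scores individuals is not stated in the abstract.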
Abstract:
This study examined the effects of providing students with explicit instruction in how to use a repertoire of reading comprehension strategies and test-taking skills when reading and responding to three types of questions (direct, inferential, and critical). Specifically, the study examined whether providing students with a "model" of how to read and respond to the text and to the comprehension questions improved their reading comprehension relative to providing them with implicit instruction in reading comprehension strategies and test-taking skills. Students' reading comprehension and test-taking performance scores were compared as a function of instructional condition. Students from two Grade 8 classes participated in this study. The reading component of the Canadian Achievement Tests, Third Edition (CAT/3) was used to identify students' level of reading comprehension before the formal instructional sessions. Students received either explicit instruction, which involved modelling, or implicit instruction, which consisted of review and discussion of the strategies to be used. Comprehension was measured by formative tests administered after each instructional session. The formative tests consisted of reading comprehension questions about a specific form of text (narrative, informational, or graphic). In addition, students completed three summative tests and a delayed comprehension test, which consisted of the alternate version of the CAT/3 standardized reading assessment. These data served as a posttest measure to determine whether students' reading comprehension skills had improved as a result of the program. There were significant differences in students' Canadian Achievement Test scores before the study began: students in the implicit group attained significantly higher comprehension scores than students in the explicit group.
The results from the program sessions indicated no significant differences in reading comprehension between the implicit and explicit conditions, with the exception of the sixth session, which involved reading and interpreting graphic text. In that session, students in the explicit group performed significantly better than those in the implicit group. No significant differences were evident between the two conditions across the three summative tests. At the end of the study, the results from the Canadian Achievement Test indicated no significant differences in performance between the two conditions. Given that the explicit group began the study with significantly lower scores yet finished with no significant difference from the implicit group, the findings point to the effectiveness of providing students with explicit strategy instruction when reading and responding to various forms of text. Modelling the appropriate reading comprehension strategies and test-taking skills enabled students to apply the same thought processes to their own independent work. This form of instruction enabled students in the explicit group to improve their ability to comprehend and respond to text, and it should therefore be incorporated into classroom teaching as an effective instructional practice.
Abstract:
Spatial data representation and compression have become a central issue in computer graphics and image processing applications. Quadtrees, hierarchical data structures based on the principle of recursive decomposition of space, typically offer a compact and efficient representation of an image. For a given image, the choice of the quadtree root node plays an important role in its quadtree representation and in the final data compression. The goal of this thesis is to present a heuristic algorithm for finding the root node of a region quadtree that reduces the number of leaf nodes compared with the standard quadtree decomposition. The empirical results indicate that the proposed algorithm improves quadtree representation and data compression compared with the traditional method.
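A region quadtree recursively splits a square block into four quadrants until each block is uniform, and each uniform block becomes a leaf. The sketch below counts those leaves for a binary image and shows that where the image sits inside the root square can change the count substantially. Interpreting "choice of root node" as the placement offset of the image within a fixed-size root, and searching all offsets exhaustively, are assumptions made for illustration; this is not the heuristic from the thesis, which the abstract does not describe. The helper names `count_leaves` and `best_offset` are hypothetical.

```python
def count_leaves(img, row, col, size):
    """Number of leaves in the region quadtree of the size x size block whose
    top-left corner is (row, col). `size` must be a power of two. Pixels
    outside the image are treated as background (0)."""
    def px(r, c):
        inside = 0 <= r < len(img) and 0 <= c < len(img[0])
        return img[r][c] if inside else 0

    first = px(row, col)
    uniform = all(px(r, c) == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if size == 1 or uniform:
        return 1  # a homogeneous block is stored as a single leaf
    half = size // 2
    return sum(count_leaves(img, row + dr, col + dc, half)
               for dr in (0, half) for dc in (0, half))


def best_offset(img, size):
    """Try every placement of the image inside a size x size root
    (exhaustive search, standing in for a heuristic) and return
    (leaf_count, (row_offset, col_offset)) with the fewest leaves."""
    rows, cols = len(img), len(img[0])
    return min((count_leaves(img, -dr, -dc, size), (dr, dc))
               for dr in range(size - rows + 1)
               for dc in range(size - cols + 1))


img = [[0, 0, 0],
       [0, 1, 1],
       [0, 1, 1]]
print(count_leaves(img, 0, 0, 4))  # 16 leaves with the standard placement
print(best_offset(img, 4))         # (4, (1, 1)): the 2x2 object aligns with a quadrant
```

Exhaustive search grows quickly with the root size, which is the kind of cost a heuristic for choosing the root is meant to avoid.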
Abstract:
Mobile augmented reality applications are increasingly used as a medium for enhancing learning and engagement in history education. Although these digital devices facilitate learning through immersive and appealing experiences, their design should be driven by theories of learning and instruction. We provide an overview of an evidence-based approach to optimizing the development of mobile augmented reality applications that teach students about history. Our research aims to evaluate and model the impact of design parameters on learning and engagement. The research program is interdisciplinary in that we apply techniques derived from design-based experiments and educational data mining. We outline the methodological and analytical techniques and discuss the implications of the anticipated findings.
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining, and knowledge discovery. Exhaustive search for the optimal subset of relevant features in a high-dimensional dataset is NP-hard. Sub-optimal, population-based stochastic algorithms such as genetic programming (GP) and genetic algorithms (GA) are good choices for searching large search spaces and are usually more feasible than exhaustive, deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming premature convergence in evolutionary algorithms and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy, called Feature Selection ALPS (FSALPS), for feature subset selection and classification across varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection when constructing GP trees and subtrees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors features with high class-label discrimination. We investigated and compared the performance of canonical GP, ALPS, and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. ANOVA with Tukey's HSD test at the 95% confidence level showed that ALPS and FSALPS dominated canonical GP, evolving smaller but efficient trees with less bloat.
FSALPS significantly outperformed canonical GP, ALPS, and some feature selection strategies reported in the related literature on dimensionality reduction.
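The frequency-count loop described for FSALPS (count features in the population, rank them, turn ranks into probabilities, then use those probabilities to pick terminals for new GP trees) can be sketched as follows. This is a minimal illustration under stated assumptions: each individual is reduced to the list of feature indices at its leaves, and ranks are mapped to probabilities with a simple linear ranking scheme. The representation, the rank-to-probability mapping, and the helper names `feature_frequencies`, `rank_probabilities`, and `pick_terminal` are hypothetical; the abstract does not give the exact FSALPS formulas.

```python
import random
from collections import Counter


def feature_frequencies(population, n_features):
    """Count how often each feature index appears as a terminal across the
    population. Each individual is (hypothetically) the list of feature
    indices found at its leaves."""
    counts = Counter(f for terminals in population for f in terminals)
    return [counts[i] for i in range(n_features)]


def rank_probabilities(freqs):
    """Linear ranking (an assumption): the most frequent feature gets weight n,
    the least frequent gets weight 1; ties keep index order."""
    n = len(freqs)
    order = sorted(range(n), key=lambda i: freqs[i], reverse=True)
    weights = [0] * n
    for rank, i in enumerate(order):
        weights[i] = n - rank
    total = sum(weights)
    return [w / total for w in weights]


def pick_terminal(probs, rng):
    """Sample a feature index for a new GP leaf according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


population = [[0, 2, 2], [2, 1], [2, 0]]
freqs = feature_frequencies(population, 4)  # [2, 1, 4, 0]
probs = rank_probabilities(freqs)           # [0.3, 0.2, 0.4, 0.1]
rng = random.Random(42)
new_leaves = [pick_terminal(probs, rng) for _ in range(5)]
```

Because every feature keeps a weight of at least 1 under this scheme, features that have dropped out of the population can still be reintroduced, which is one way to keep the search from converging on a fixed feature subset.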