5 resultados para Learning algorithm


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this paper is to examine the promising contributions of the Concept Maps for Learning (CMfL) website to assessment for learning practices. The CMfL website generates concept maps from relatedness degree of concepts pairs through the Pathfinder Scaling Algorithm. This website also confirms the established principles of effective assessment for learning, for it is capable of automatically assessing students' higher order knowledge, simultaneously identifying strengths and weaknesses, immediately providing useful feedback and being user-friendly. According to the default assessment plan, students first create concept maps on a particular subject and then they are given individualized visual feedback followed by associated instructional material (e.g., videos, website links, examples, problems, etc.) based on a comparison of their concept map and a subject matter expert's map. After studying the feedback and instructional material, teachers can monitor their students' progress by having them create revised concept maps. Therefore, we claim that the CMfL website may reduce the workload of teachers as well as provide immediate and delayed feedback on the weaknesses of students in different forms such as graphical and multimedia. For the following study, we will examine whether these promising contributions to assessment for learning are valid in a variety of subjects.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we describe how the pathfinder algorithm converts relatedness ratings of concept pairs to concept maps; we also present how this algorithm has been used to develop the Concept Maps for Learning website (www.conceptmapsforlearning.com) based on the principles of effective formative assessment. The pathfinder networks, one of the network representation tools, claim to help more students memorize and recall the relations between concepts than spatial representation tools (such as Multi- Dimensional Scaling). Therefore, the pathfinder networks have been used in various studies on knowledge structures, including identifying students’ misconceptions. To accomplish this, each student’s knowledge map and the expert knowledge map are compared via the pathfinder software, and the differences between these maps are highlighted. After misconceptions are identified, the pathfinder software fails to provide any feedback on these misconceptions. To overcome this weakness, we have been developing a mobile-based concept mapping tool providing visual, textual and remedial feedback (ex. videos, website links and applets) on the concept relations. This information is then placed on the expert concept map, but not on the student’s concept map. Additionally, students are asked to note what they understand from given feedback, and given the opportunity to revise their knowledge maps after receiving various types of feedback.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods \cite{korhonen2exact, nie2014advances} tackle the problem by using $k$-trees to learn the optimal Bayesian network with tree-width up to $k$. Finding the best $k$-tree, however, is computationally intractable. In this paper, we propose a sampling method to efficiently find representative $k$-trees by introducing an informative score function to characterize the quality of a $k$-tree. To further improve the quality of the $k$-trees, we propose a probabilistic hill climbing approach that locally refines the sampled $k$-trees. The proposed algorithm can efficiently learn a quality Bayesian network with tree-width at most $k$. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. Our novel algorithm accomplishes this task, scaling both to large domains and to large treewidths. Our novel approach consistently outperforms the state of the art on experiments with up to thousands of variables.