853 results for active learning
Abstract:
Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, the effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in terms of stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as a small variation in performance under small variations in the training data or the parameters, is a major issue for machine learning models, and even more so in the active learning framework, which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological features, syntactic features, or semantic features) and c) the Gaussian prior variance, one of the important CRF parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographical, morphological and contextual features, as a group of basic features, play an important role in learning effective models across all iterations.
Abstract:
Objective: This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, it determines (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of the incremental active learning framework across different selection criteria and datasets.
Materials and methods: The performance of an active learning framework was compared with that of a fully supervised approach to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields were used as the supervised method, and least confidence and information density as the two selection criteria for the active learning framework. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab.
Results: The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the random sampling baseline, the saving is at least doubled.
Discussion: Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction of annotation effort is always above that of the random sampling and longest sequence baselines.
Conclusion: Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.
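The two selection criteria named in this abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions: least confidence is taken as one minus the probability of the model's most likely labelling of a sequence, and information density as that uncertainty weighted by the sequence's average similarity to the rest of the unlabelled pool. All function names, identifiers and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def least_confidence(best_path_prob: float) -> float:
    """Uncertainty of a sequence: 1 - P(most likely labelling | sequence)."""
    return 1.0 - best_path_prob


def information_density(uncertainty: float,
                        similarities: Sequence[float],
                        beta: float = 1.0) -> float:
    """Weight an uncertainty score by average similarity to the pool.

    `similarities` holds the similarity of this sequence to the other
    unlabelled sequences; `beta` controls how much density matters.
    Both the weighting scheme and `beta` are assumptions of this sketch.
    """
    avg_sim = sum(similarities) / len(similarities)
    return uncertainty * avg_sim ** beta


def select_batch(best_path_probs: Dict[str, float], batch_size: int) -> List[str]:
    """Return the ids of the `batch_size` least confidently labelled sequences."""
    ranked = sorted(best_path_probs,
                    key=lambda sid: least_confidence(best_path_probs[sid]),
                    reverse=True)
    return ranked[:batch_size]


# Hypothetical pool: sequence id -> probability of the model's best labelling.
pool = {"report_a": 0.95, "report_b": 0.40, "report_c": 0.70}
queried = select_batch(pool, batch_size=2)
```

With this toy pool, `report_b` and `report_c` are sent for annotation first because the model is least sure of their labellings. In a CRF setting the probabilities would come from the trained model rather than being supplied by hand.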
Abstract:
This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy makes possible, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources, which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrate that 1) amongst existing query strategies, the ones based on the classification model’s confidence are a better choice for clinical data, as they perform equally well with a much lighter computational load, and 2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% fewer tokens and concepts to manually annotate than with state-of-the-art query strategies.
Abstract:
Perceiving students, science students especially, as mere consumers of facts and information belies the importance of engaging them with the principles underlying those facts and runs counter to the facilitation of knowledge and understanding. Traditional didactic lecture approaches need a re-think if student classroom engagement and active learning are to be valued over fact memorisation and fact recall. In our undergraduate biomedical science programs across Years 1, 2 and 3 in the Faculty of Health at QUT, we have developed an authentic learning model with an embedded suite of pedagogical strategies that foster classroom engagement and allow for active learning in the sub-discipline area of medical bacteriology. The suite of pedagogical tools we have developed has been designed to enable its translation, with appropriate fine-tuning, to most biomedical and allied health discipline teaching and learning contexts. Indeed, aspects of the pedagogy have been successfully translated to the nursing microbiology study stream at QUT. The aims underpinning the pedagogy are for our students to: (1) Connect scientific theory with scientific practice in a more direct and authentic way, (2) Construct factual knowledge and facilitate a deeper understanding, and (3) Develop and refine their higher order flexible thinking and problem solving skills, both semi-independently and independently. The mindset and role of the teaching staff are critical to this approach: for the strategy to be successful, tertiary teachers need to abandon traditional instructional modalities based on one-way information delivery. Face-to-face classroom interactions between students and lecturer enable realisation of pedagogical aims (1), (2) and (3). The strategy we have adopted encourages teachers to view themselves more as expert guides in what is very much a student-focused process of scientific exploration and learning.
Specific pedagogical strategies embedded in the authentic learning model we have developed include: (i) interactive lecture-tutorial hybrids or lectorials featuring teacher role-plays as well as class-level question-and-answer sessions, (ii) inclusion of “dry” laboratory activities during lectorials to prepare students for the wet laboratory to follow, (iii) real-world problem-solving exercises conducted during both lectorials and wet laboratory sessions, and (iv) class activities and formative assessments designed to probe a student’s higher order flexible thinking skills. Flexible thinking in this context encompasses analytical, critical, deductive, scientific and professional thinking modes. The strategic approach outlined above is designed to provide multiple opportunities for students to apply principles flexibly according to a given situation or context, to adapt methods of inquiry strategically, to go beyond mechanical application of formulaic approaches, and, as much as possible, to self-appraise their own thinking and problem solving. The pedagogical tools have been developed within both workplace (real world) and theoretical frameworks. The philosophical core of the pedagogy is a coherent pathway of teaching and learning which we, and many of our students, believe is more conducive to student engagement and active learning in the classroom. Qualitative and quantitative data derived from online and hardcopy evaluations, solicited and unsolicited student and graduate feedback, anecdotal evidence and peer review indicate that: (i) our students are engaging with the pedagogy, (ii) a constructivist, authentic-learning approach promotes active learning, and (iii) students are better prepared for workplace transition.
Abstract:
Web-based technology is particularly well-suited to promoting active student involvement in the processes of learning. All students enrolled in a first-year educational psychology unit were required to complete ten weekly online quizzes, ten weekly student-generated questions and ten weekly student answers to those questions. Results of an online survey of participating students strongly support the viability and perceived benefits of such an instructional approach. Although students reported that the 30 assessments were useful and reasonable, the most common theme to emerge from the professional reflections of participating lecturers was that the marking of questions and answers was unmanageable.
Abstract:
- Objectives: To explore whether active learning principles can be applied to nursing bioscience assessments, and whether this will influence students' perception of their confidence in applying theory to practice.
- Design and Data Sources: A review of the literature utilising searches of various databases including CINAHL, PUBMED, Google Scholar and Mosby's Journal Index.
- Methods: The literature search identified research from twenty-six original articles, two electronic books, one published book and one conference proceedings paper.
- Results: Bioscience has been identified as an area that nurses struggle to learn in tertiary institutions and then apply to clinical practice. A number of problems have been identified and explored that may contribute to this poor understanding and retention. University academics need to be knowledgeable about innovative teaching and assessing modalities that focus on enhancing student learning and address the integration issues associated with the theory-practice gap. Increased bioscience education is associated with improved patient outcomes; therefore, by addressing this “bioscience problem” and improving the integration of bioscience in clinical practice, there will subsequently be an improvement in health care outcomes.
- Conclusion: Several themes were identified from the literature. First, there are many problems with teaching bioscience to nursing students. These include class sizes, motivation, concentration, delivery mode, lecturer perspectives, students' previous knowledge, anxiety, and a lack of confidence. Among these influences, the type of assessment employed by the educator has not been explored or identified as a contributor to student learning specifically in nursing bioscience instruction. Second, educating could be achieved more effectively if active learning principles were applied and the needs and expectations of the student were met. Lastly, assessment influences student retention and the student experience, and as such assessment should be congruent with the subject content, align with the learning objectives and be used as a stimulus tool for learning.
Abstract:
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration, where the system lazily discovers associations enabling it to join together matches to keywords, and return ranked results. The user is expected to understand the data domain and provide feedback about the quality of the answers. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few "top-k" results: this result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
Abstract:
Therapy employing epidural electrostimulation holds great potential for improving therapy for patients with spinal cord injury (SCI) (Harkema et al., 2011). Further promising results from combined therapies using electrostimulation have also been recently obtained (e.g., van den Brand et al., 2012). The devices being developed to deliver the stimulation are highly flexible, capable of delivering any individual stimulus among a combinatorially large set of stimuli (Gad et al., 2013). While this extreme flexibility is very useful for ensuring that the device can deliver an appropriate stimulus, the challenge of choosing good stimuli is quite substantial, even for expert human experimenters. To develop a fully implantable, autonomous device which can provide useful therapy, it is necessary to design an algorithmic method for choosing the stimulus parameters. Such a method can be used in a clinical setting, by caregivers who are not experts in the neurostimulator's use, and to allow the system to adapt autonomously between visits to the clinic. To create such an algorithm, this dissertation pursues the general class of active learning algorithms that includes Gaussian Process Upper Confidence Bound (GP-UCB, Srinivas et al., 2010), developing the Gaussian Process Batch Upper Confidence Bound (GP-BUCB, Desautels et al., 2012) and Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB) algorithms. This dissertation develops new theoretical bounds for the performance of these and similar algorithms, empirically assesses these algorithms against a number of competitors in simulation, and applies a variant of the GP-BUCB algorithm in closed-loop to control SCI therapy via epidural electrostimulation in four live rats. The algorithm was tasked with maximizing the amplitude of evoked potentials in the rats' left tibialis anterior muscle. 
These experiments show that the algorithm is capable of directing these experiments sensibly, finding effective stimuli in all four animals. Further, in direct competition with an expert human experimenter, the algorithm produced superior performance in terms of average reward and comparable or superior performance in terms of maximum reward. These results indicate that variants of GP-BUCB may be suitable for autonomously directing SCI therapy.
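The upper-confidence-bound selection rule behind this family of algorithms can be sketched as follows. This is an illustrative toy, not the dissertation's implementation: the RBF kernel, the zero-mean unit-variance prior, the noise level, the fixed `beta`, and all function names are assumptions. The batch variant reflects the general idea that pending selections shrink the posterior variance while the mean waits for real feedback; the published algorithm also adjusts the confidence parameter for batches, which is omitted here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_cand, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_cand."""
    if len(x_obs) == 0:
        return np.zeros(len(x_cand)), np.ones(len(x_cand))
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_cand)
    solved = np.linalg.solve(K, K_s)           # K^{-1} K_s
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(K_s * solved, axis=0)   # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def gp_ucb_step(x_obs, y_obs, x_cand, beta=4.0):
    """Index of the candidate maximising mean + sqrt(beta) * std."""
    mean, std = gp_posterior(x_obs, y_obs, x_cand)
    return int(np.argmax(mean + np.sqrt(beta) * std))


def gp_bucb_batch(x_obs, y_obs, x_cand, batch_size, beta=4.0):
    """Pick a batch: the mean uses real feedback only, while the
    variance also accounts for selections still awaiting feedback."""
    mean, _ = gp_posterior(x_obs, y_obs, x_cand)
    pending = []
    for _ in range(batch_size):
        x_all = np.concatenate([x_obs, x_cand[np.array(pending, dtype=int)]])
        # Posterior variance does not depend on observed values,
        # so placeholder zeros stand in for the missing feedback.
        _, std = gp_posterior(x_all, np.zeros(len(x_all)), x_cand)
        pending.append(int(np.argmax(mean + np.sqrt(beta) * std)))
    return pending


# Hypothetical one-dimensional stimulus parameter with one observed response.
x_obs = np.array([0.5])
y_obs = np.array([1.0])
x_cand = np.array([0.5, 3.0, -2.0])
next_index = gp_ucb_step(x_obs, y_obs, x_cand)
batch = gp_bucb_batch(x_obs, y_obs, x_cand, batch_size=2)
```

In this toy setting the rule prefers the unexplored candidates over the already-observed one, because their large posterior uncertainty outweighs the known reward. The batch variant avoids picking the same unexplored point twice, since the first pick reduces the uncertainty there.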
Abstract:
Information theoretic active learning has been widely studied for probabilistic models. For simple regression an optimal myopic policy is easily tractable. However, for other tasks and with more complex models, such as classification with nonparametric models, the optimal solution is harder to compute. Current approaches make approximations to achieve tractability. We propose an approach that expresses information gain in terms of predictive entropies, and apply this method to the Gaussian Process Classifier (GPC). Our approach makes minimal approximations to the full information theoretic objective. In experiments, our approach compares favourably to many popular active learning algorithms, with equal or lower computational complexity. It also compares well to decision theoretic approaches, which are privy to more information and require much more computational time. In addition, by further developing a reformulation of binary preference learning as a classification problem, we extend our algorithm to Gaussian Process preference learning.
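Expressing information gain through predictive entropies can be illustrated for binary classification: the gain from labelling an input is the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below estimates both terms from sampled model predictions. Using samples is an assumption made for illustration and is not necessarily how the paper computes these quantities for the GPC; the function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def information_gain(sampled_probs: Sequence[float]) -> float:
    """Entropy of the mean prediction minus the mean entropy of predictions.

    `sampled_probs` are P(y = 1 | x) under different plausible models,
    e.g. draws from a posterior (an assumption of this sketch).
    """
    mean_p = sum(sampled_probs) / len(sampled_probs)
    expected_entropy = sum(binary_entropy(p) for p in sampled_probs) / len(sampled_probs)
    return binary_entropy(mean_p) - expected_entropy


# Plausible models disagree confidently: a label here is informative.
disagreement = information_gain([0.0, 1.0])
# Every model predicts a coin flip: a label here teaches nothing about the model.
pure_noise = information_gain([0.5, 0.5])
```

Both toy inputs have an averaged prediction of 0.5, so a criterion based only on predictive uncertainty would rank them equally. The entropy difference separates them: it is one bit for the first input and zero for the second.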