External knowledge and query strategies in active learning: A study in clinical information extraction


Author(s): Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony
Date(s)

2015

Abstract

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular the reduction in annotation effort that the new query strategy achieves, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources, which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrated that (1) amongst existing query strategies, the ones based on the classification model’s confidence are a better choice for clinical data, as they perform equally well with a much lighter computational load, and (2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% fewer tokens and concepts to manually annotate than with state-of-the-art query strategies.
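
As an illustration of what a confidence-based query strategy does in general, the following is a minimal sketch of least-confidence sampling. It is not the paper's DKI strategy, nor the authors' implementation. The function names, the sample ids, and the probability values are all hypothetical, and the sketch assumes one label distribution per sample, whereas a sequence model such as a conditional random field would score whole label sequences.

```python
from typing import Dict, List, Sequence


def least_confidence(probabilities: Sequence[float]) -> float:
    """Informativeness score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probabilities)


def select_batch(pool: Dict[str, Sequence[float]], batch_size: int) -> List[str]:
    """Return the ids of the batch_size samples the model is least confident about."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs: sample id -> predicted label distribution.
pool = {
    "s1": [0.95, 0.03, 0.02],
    "s2": [0.40, 0.35, 0.25],
    "s3": [0.70, 0.20, 0.10],
}

# The selected samples would be sent to an annotator, added to the
# training set, and the model retrained before the next iteration.
queried = select_batch(pool, batch_size=2)
print(queried)
```

In this toy pool the model is least sure about s2, then s3, so those two would be queried first while the confidently labelled s1 stays unannotated.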

Format

application/pdf

Identifier

http://eprints.qut.edu.au/85674/

Publisher

ACM

Relation

http://eprints.qut.edu.au/85674/3/85674.pdf

http://dl.acm.org/citation.cfm?doid=2806416.2806550

DOI:10.1145/2806416.2806550

Kholghi, Mahnoosh, Sitbon, Laurianne, Zuccon, Guido, & Nguyen, Anthony (2015) External knowledge and query strategies in active learning: A study in clinical information extraction. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, ACM, Melbourne, Australia, pp. 143-152.

Rights

Copyright is held by the owner/author(s). Publication rights licensed to ACM.

ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords

#Domain Knowledge #Active Learning #Concept Extraction #Clinical Free Text #Conditional Random Fields

Type

Conference Paper