Improving Recall of Regular Expressions for Information Extraction


Author(s): Murthy, Karin; Padmanabhan, Deepak; Deshpande, Prasad
Date(s)

2012

Abstract

Learning or writing regular expressions to identify instances of a specific concept within text documents with high precision and recall is challenging. It is relatively easy to improve the precision of an initial regular expression by identifying the false positives it covers and tweaking the expression to avoid them. However, modifying the expression to improve recall is difficult, since false negatives can only be identified by manually analyzing all documents, in the absence of any tools to identify the missing instances. We focus on partially automating the discovery of missing instances by soliciting minimal user feedback. We present a technique to identify good generalizations of a regular expression that have improved recall while retaining high precision. We empirically demonstrate the effectiveness of the proposed technique as compared to existing methods and show results for a variety of tasks such as identification of dates, phone numbers, product names, and course numbers on real-world datasets.
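
The precision/recall trade-off described in the abstract can be illustrated with a small sketch. This is not the technique proposed in the paper; it only shows, on a hypothetical toy text with hypothetical gold annotations, how generalizing a regular expression can raise recall and how over-generalizing it can lower precision. The names `precision_recall`, `TEXT`, `GOLD`, and the three patterns are assumptions made for illustration.

```python
import re


def precision_recall(pattern, text, gold):
    """Return (precision, recall) of `pattern` on `text` against the gold instances."""
    found = set(re.findall(pattern, text))
    true_positives = found & gold
    precision = len(true_positives) / len(found) if found else 1.0
    recall = len(true_positives) / len(gold) if gold else 1.0
    return precision, recall


# Hypothetical toy text and gold annotations (phone numbers only).
TEXT = "Call 555-1234 or 555.9876 today; order no. 12-3456 ships 2012-11-28."
GOLD = {"555-1234", "555.9876"}

# Initial expression: precise, but misses the dot-separated number.
INITIAL = r"\b\d{3}-\d{4}\b"
# Mild generalization: also accepts a dot as separator.
GENERALIZED = r"\b\d{3}[-.]\d{4}\b"
# Over-generalization: any digit runs joined by a hyphen or dot.
TOO_GENERAL = r"\b\d+[-.]\d+\b"

for name, pattern in [("initial", INITIAL),
                      ("generalized", GENERALIZED),
                      ("too general", TOO_GENERAL)]:
    p, r = precision_recall(pattern, TEXT, GOLD)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy setting the missing instance is visible at a glance; on a real corpus the false negatives are exactly what is hard to find, which is the gap the abstract says the work addresses.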

Identifier

http://pure.qub.ac.uk/portal/en/publications/improving-recall-of-regular-expressions-for-information-extraction(c8c74b46-95b3-4d09-8dd8-47a67ffea0b2).html

Language(s)

eng

Rights

info:eu-repo/semantics/restrictedAccess

Source

Murthy, K, Padmanabhan, D & Deshpande, P 2012, Improving Recall of Regular Expressions for Information Extraction. In Web Information Systems Engineering - WISE 2012 - 13th International Conference, Paphos, Cyprus, November 28-30, 2012. Proceedings. pp. 455-467, WISE 2012, Paphos, Cyprus, 28-30 November.

Type

contributionToPeriodical