Evaluation of information extraction techniques to label extracted data from e-commerce web page


Author(s): Anderson, Neil; Hong, Jun
Date(s)

01/04/2014

Abstract

Automatically determining and assigning shared and meaningful text labels to data extracted from an e-Commerce web page is a challenging problem. An e-Commerce web page can display a list of data records, each of which can contain a combination of data items (e.g. product name and price) and explicit labels, which describe some of these data items. Recent advances in extraction techniques have made it much easier to precisely extract individual data items and labels from a web page; however, two problems remain open: (1) assigning an explicit label to a data item, and (2) determining labels for the remaining data items. Furthermore, improvements in the availability and coverage of vocabularies, especially in the context of e-Commerce web sites, mean that we now have access to a bank of relevant, meaningful and shared labels that can be assigned to extracted data items. However, there is a need for a technique that takes a set of extracted data items as input and automatically assigns to them the most relevant and meaningful labels from a shared vocabulary. We observe that the Information Extraction (IE) community has developed a great number of techniques that solve problems similar to our own. In this work-in-progress paper we propose to evaluate different IE techniques, theoretically and experimentally, to ascertain which is most suitable for solving this problem.

Identifier

http://pure.qub.ac.uk/portal/en/publications/evaluation-of-information-extraction-techniques-to-label-extracted-data-from-ecommerce-web-page(7af75bc1-8d93-4c24-a0d4-73f68a2f375e).html

http://dx.doi.org/10.1145/2567948.2579703

Language(s)

eng

Publisher

ACM

Rights

info:eu-repo/semantics/restrictedAccess

Source

Anderson, N & Hong, J 2014, Evaluation of information extraction techniques to label extracted data from e-commerce web page. in WWW 2014 Companion. ACM, pp. 1275-1278, International World Wide Web Conference, Seoul, Korea, Republic of, 7-11 April. DOI: 10.1145/2567948.2579703

Type

contributionToPeriodical