OWDEAH: Online Web Data Extraction based on Access History


Author(s): Li, Zhao; Ng, Wee-Keong; Ong, Kok-Leong
Date(s)

01/01/2004

Abstract

Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. How to extract structured data from semi-structured documents has been a problem of active research. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, preparing training sets (especially annotating them for supervised methods) is very time-consuming. We propose a framework for Web data extraction that logs users' access history and exploits it to assist automatic training set generation. We cluster accessed Web documents according to their structural details, define criteria to measure the importance of sub-structures, and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.
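
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This record does not state which structural features or similarity measure the paper uses, so the sketch assumes (purely for illustration) that a page is represented by its set of root-to-node tag paths, that pages are compared with Jaccard similarity, and that clusters are formed greedily against a threshold of 0.6. The names `tag_paths`, `jaccard`, and `cluster_by_structure` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are recorded but not pushed.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-node tag paths seen in one HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closes.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Assumed structural signature of a page: its set of tag paths."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_structure(pages, threshold=0.6):
    """Greedy single-pass clustering (an assumption, not the paper's method):
    a page joins the first cluster whose representative is similar enough,
    otherwise it starts a new cluster. Returns lists of page indices."""
    clusters = []  # each entry: (representative tag paths, member indices)
    for index, html in enumerate(pages):
        paths = tag_paths(html)
        for representative, members in clusters:
            if jaccard(paths, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((paths, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    accessed_pages = [
        "<html><body><table><tr><td>A</td></tr></table></body></html>",
        "<html><body><table><tr><td>B</td><td>C</td></tr></table></body></html>",
        "<html><body><div><p>text</p></div></body></html>",
    ]
    print(cluster_by_structure(accessed_pages))
```

In this sketch the two table-based pages share every tag path and fall into one cluster, while the `div`-based page starts its own. Weighting sub-structures by how often users accessed them, as the abstract describes, would be a further step not shown here.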

Identifier

http://hdl.handle.net/10536/DRO/DU:30008664

Language(s)

eng

Publisher

Springer-Verlag

Relation

http://dro.deakin.edu.au/eserv/DU:30008664/n20040143.pdf

http://springerlink.com/content/upqnuq3nxeqmc8jq/fulltext.pdf

Rights

2004, Springer-Verlag

Type

Journal Article