Digital Library

**Author(s):** Li, Zhao; Ng, Wee-Keong; Ong, Kok-Leong
Date(s)	01/01/2004
Abstract	Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. How to extract structured data from semi-structured documents has been a problem of active research. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, preparing training sets (especially annotating them for supervised methods) is very time-consuming. We propose a framework for Web data extraction that logs users' access history and exploits it to assist automatic training set generation. We cluster accessed Web documents according to their structural details, define criteria to measure the importance of sub-structures, and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.
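The record does not state how the paper measures structural similarity or sub-structure importance, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes each accessed page is represented by the set of its root-to-node tag paths (e.g. `html/body/table`), groups pages with a greedy Jaccard-threshold clustering, and scores a sub-structure by the fraction of pages in a cluster that contain it. The names `tag_paths`, `cluster`, `substructure_importance` and the `threshold=0.5` default are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical choice: elements that never get a closing tag, so they are not pushed on the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Collects root-to-node tag paths such as 'html/body/table'."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Assumed structural representation of a page: its set of tag paths."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(path_sets, threshold=0.5):
    """Greedy clustering: a page joins the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of page indices
    for i, paths in enumerate(path_sets):
        for members in clusters:
            if jaccard(paths, path_sets[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def substructure_importance(path_sets):
    """Assumed importance score: fraction of pages in the cluster that contain each tag path."""
    counts = Counter()
    for paths in path_sets:
        counts.update(paths)
    n = len(path_sets)
    return {path: c / n for path, c in counts.items()}
```

For example, two pages that share a `table/tr/td` layout land in one cluster and a `div/p` page lands in another (`cluster(...)` returns `[[0, 1], [2]]` for the pages in the order just described). Within the first cluster, every tag path then scores 1.0. A real system would likely weight paths by content or access frequency from the logged history; plain presence counting is used here only to keep the sketch short.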
Identifier	http://hdl.handle.net/10536/DRO/DU:30008664
Language(s)	eng
Publisher	Springer-Verlag
Relation	http://dro.deakin.edu.au/eserv/DU:30008664/n20040143.pdf http://springerlink.com/content/upqnuq3nxeqmc8jq/fulltext.pdf
Rights	2004, Springer-Verlag
Type	Journal Article
