Digital Library

**Author(s):** Kowalkiewicz, M.; Orlowska, M. E.; Kaczmarek, T.; Abramowicz, W.
Date(s)	2006
Abstract	We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative XPath expressions, although not widely used, should be used in preference to absolute XPath expressions in extracting content from human-created Web documents. Evaluation of robustness covers four thousand queries executed on several hundred webpages. We show that in referencing parts of real world dynamic HTML documents, relative XPath expressions are on average significantly more robust than absolute XPath ones.
Format	application/pdf
Identifier	http://eprints.qut.edu.au/86019/
Publisher	ACM (The Association for Computing Machinery)
Relation	http://eprints.qut.edu.au/86019/1/86019.pdf DOI:10.1145/1135777.1135928 Kowalkiewicz, M., Orlowska, M. E., Kaczmarek, T., & Abramowicz, W. (2006) Robust Web content extraction. In 15th International Conference on World Wide Web, May 22 - 26, 2006, Edinburgh, Scotland UK.
Rights	The authors
Source	Science & Engineering Faculty
Keywords	#Content extraction #Evaluation #Robustness #Wrappers #Content based retrieval #Electronic document exchange #HTML #Robust control #Robustness (control systems) #Websites #Markup languages #XPath expressions #Web services #World Wide Web #Empirical evaluations #HTML documents #Web content #Web document #Web page
Type	Conference Paper
