The influence of pre-processing on the estimation of readability of web documents


Author(s): Palotti, João; Zuccon, Guido; Hanbury, Allan
Date(s)

2015

Abstract

This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Previous work has highlighted readability as an important aspect of web search result personalisation. The most widely used text readability measures rely on surface-level characteristics of text, such as the length of words and sentences. We demonstrate that different tools for extracting text from web pages lead to very different estimations of readability. This has an important implication for search engines, because search result personalisation strategies that consider users' reading ability may fail if incorrect text readability estimations are computed.
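
As a rough illustration of why extraction matters for surface-level measures, the sketch below computes the Automated Readability Index (ARI), which depends only on characters per word and words per sentence. ARI is chosen here purely for illustration; this record does not say which measures or extraction tools the paper evaluates. The two input strings are hypothetical outputs of two extraction tools for the same page: one keeps sentence punctuation, the other drops it and lets navigation text leak in.

```python
import re


def ari(text: str) -> float:
    """Automated Readability Index:
    4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
    """
    # Naive sentence split on terminal punctuation (an assumption of this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenisation: runs of letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words or not sentences:
        raise ValueError("text has no words or sentences")
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Hypothetical extraction outputs for the same page (not data from the paper).
clean = "Drink water often. Rest when you feel tired. Call a doctor if pain persists."
noisy = "Home About Contact Drink water often Rest when you feel tired Call a doctor if pain persists"

print(round(ari(clean), 2), round(ari(noisy), 2))
```

With sentence boundaries lost, the second string is treated as one long sentence, so its score is higher (harder to read) even though the underlying content is the same.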

Format

application/pdf

Identifier

http://eprints.qut.edu.au/91421/

Publisher

ACM

Relation

http://eprints.qut.edu.au/91421/1/cikm2015_readability.pdf

http://dl.acm.org/citation.cfm?id=2806613

DOI:10.1145/2806416.2806613

Palotti, João, Zuccon, Guido, & Hanbury, Allan (2015) The influence of pre-processing on the estimation of readability of web documents. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, ACM, Melbourne, VIC, pp. 1763-1766.

Rights

Copyright 2015 ACM

Source

Faculty of Science and Technology; School of Information Systems

Keywords

Readability; Text pre-processing

Type

Conference Paper