
Indexing without spam

Author(s)	Zuccon, Guido; Nguyen, Anthony; Leelanupab, Teerapong; Azzopardi, Leif
Contributor(s)	Cunningham, Sally Jo; Scholer, Falk; Thomas, Paul
Date(s)	2011
Abstract	The presence of spam in document rankings is a major issue for Web search engines. Common approaches to coping with spam remove pages that are likely to contain spam from the document rankings. These approaches are implemented as post-retrieval processes, which filter out spam pages only after documents have been retrieved in response to a user’s query. In this paper we propose removing spam pages at indexing time, thereby obtaining a pruned index that is virtually “spam-free”. We investigate the benefits of this approach from three points of view: indexing time, index size, and retrieval performance. Not surprisingly, we found that the strategy decreases both the time required by the indexing process and the space required to store the index. More surprisingly, we found that the retrieval performance obtained with a spam-pruned version of a collection’s index does not differ from that obtained with traditional post-retrieval spam filtering approaches.
Format	application/pdf
Identifier	http://eprints.qut.edu.au/69285/
Publisher	RMIT University
Relation	http://eprints.qut.edu.au/69285/1/zuccon2011e.pdf; http://www.cs.rmit.edu.au/adcs2011/pdf/paper11.pdf; Zuccon, Guido, Nguyen, Anthony, Leelanupab, Teerapong, & Azzopardi, Leif (2011) Indexing without spam. In Cunningham, Sally Jo, Scholer, Falk, & Thomas, Paul (Eds.) Proceedings of the 16th Australasian Document Computing Symposium, RMIT University, Australian National University, Canberra, pp. 6-13.
Rights	Copyright 2011 Author(s)
Source	Institute for Future Environments; School of Information Systems; Science & Engineering Faculty
Keywords	#Information Retrieval #Index Pruning #Spam #Web Search #Efficiency
Type	Conference Paper
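The abstract contrasts two ways of handling spam: filtering it out of a ranking after retrieval, or pruning it from the index before any query is run. The Python sketch below illustrates the difference on a hypothetical four-document collection. The spam scores, the `SPAM_THRESHOLD` cut-off, and the term-frequency scorer are illustrative assumptions, not the authors' experimental setup.

```python
from collections import defaultdict

# Hypothetical toy collection: doc_id -> (text, spam score in [0, 1]).
# The scores are assumed to come from some external spam classifier.
DOCS = {
    "d1": ("cheap pills cheap pills buy now", 0.95),
    "d2": ("index pruning for web search", 0.10),
    "d3": ("spam filtering in web search engines", 0.20),
    "d4": ("buy cheap web search ranking now", 0.90),
}
SPAM_THRESHOLD = 0.5  # hypothetical cut-off: score >= threshold counts as spam


def is_spam(doc_id):
    return DOCS[doc_id][1] >= SPAM_THRESHOLD


def build_index(skip_spam):
    """Build an inverted index term -> {doc_id: term frequency}.

    With skip_spam=True, spam pages are dropped at indexing time,
    so they never contribute postings to the index."""
    index = defaultdict(dict)
    for doc_id, (text, _score) in DOCS.items():
        if skip_spam and is_spam(doc_id):
            continue  # index-time pruning
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def retrieve(index, query, k=10):
    """Rank documents by summed term frequency (a deliberately simple scorer)."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Ties are broken by doc_id so the ranking is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def postings_count(index):
    return sum(len(postings) for postings in index.values())


full_index = build_index(skip_spam=False)
pruned_index = build_index(skip_spam=True)

query = "web search"
# Post-retrieval filtering: rank over the full index, then drop spam pages.
post_filtered = [d for d in retrieve(full_index, query) if not is_spam(d)]
# Index-time pruning: spam pages were never indexed in the first place.
pruned_results = retrieve(pruned_index, query)

print(post_filtered, pruned_results)  # ['d2', 'd3'] ['d2', 'd3']
print(postings_count(full_index), postings_count(pruned_index))  # 21 11
```

With this toy scorer both strategies return the same ranking, while the pruned index stores roughly half as many postings. That equivalence is not guaranteed in general: scorers that rely on collection statistics (for example IDF or average document length) see different statistics once spam is pruned, and post-retrieval filtering of a fixed top-k list can return fewer than k results.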
