TOPSIG : Topology Preserving Document Signatures


Author(s): Geva, Shlomo; De Vries, Christopher M.
Date(s)

19/07/2011

Abstract

Performance comparisons between file signatures and inverted files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and they position the file signatures model in the class of Vector Space retrieval models.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/43451/

Relation

http://eprints.qut.edu.au/43451/4/43451.pdf

http://www.cikm2011.org/

Geva, Shlomo & De Vries, Christopher M. (2011) TOPSIG : Topology Preserving Document Signatures. In Conference on Information and Knowledge Management 2011, 24-28 October 2011, Glasgow, Scotland.

Rights

Copyright 2011 Please consult the authors.

Source

Computer Science; Faculty of Science and Technology

Keywords

#080109 Pattern Recognition and Data Mining #080704 Information Retrieval and Web Search #Signature Files, Random Indexing, Topology, Quantisation #Vector Space IR, Search Engines, Document Clustering, Document

Type

Conference Paper