Efficient top-k retrieval with signatures


Autoria(s): Chappell, Timothy; Geva, Shlomo; Nguyen, Anthony; Zuccon, Guido
Contribuinte(s)

Culpepper, J. Shane

Sitbon, Laurianne

Zuccon, Guido

Data(s)

2013

Resumo

This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit their applicability. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.

Identificador

http://eprints.qut.edu.au/66949/

Publicador

ACM

Relação

http://dl.acm.org/citation.cfm?doid=2537734.2537742

DOI:10.1145/2537734.2537742

Chappell, Timothy, Geva, Shlomo, Nguyen, Anthony, & Zuccon, Guido (2013) Efficient top-k retrieval with signatures. In Culpepper, J. Shane, Sitbon, Laurianne, & Zuccon, Guido (Eds.) Proceedings of the 18th Australasian Document Computing Symposium, ACM, Brisbane, Australia, pp. 10-17.

Fonte

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Palavras-Chave #Document Signatures #Near-Duplicate Detection #Hamming Distance #Locality-Sensitive Hashing #Nearest Neighbour #Top-K
Tipo

Conference Paper