Scalable document hashing and retrieval


Author(s): Chappell, Timothy A.
Date(s)

2015

Abstract

This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. The research finds that document signatures can be used effectively and efficiently both to search large collections and to understand relationships between the documents in them, and that the approach scales well enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform, and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
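As a rough illustration of the idea summarised in the abstract, the sketch below builds a fixed-width binary signature for a piece of text and compares two signatures by Hamming distance. It is not taken from the thesis: the simhash-style voting scheme, the 64-bit width, and the names `term_hash`, `signature`, and `hamming_distance` are all assumptions chosen for illustration, and the thesis's own signature construction may differ.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed width, chosen only for this illustration


def term_hash(term: str) -> int:
    """Map a term to a stable 64-bit integer (hypothetical helper)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def signature(text: str) -> int:
    """Build a simhash-style signature: every term votes +1 or -1 on each bit."""
    votes = [0] * SIGNATURE_BITS
    for term in text.lower().split():
        h = term_hash(term)
        for bit in range(SIGNATURE_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    sig = 0
    for bit, vote in enumerate(votes):
        if vote > 0:  # a bit is set only when the positive votes win
            sig |= 1 << bit
    return sig


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")
```

Under these assumptions, documents that share most of their terms would tend to receive signatures that differ in few bits, so a small Hamming distance can serve as a cheap proxy for similarity. Each signature fits in a single machine word, which is the kind of compactness the abstract refers to.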

Format

application/pdf

Identifier

http://eprints.qut.edu.au/90044/

Publisher

Queensland University of Technology

Relation

http://eprints.qut.edu.au/90044/1/Timothy_Chappell_Thesis.pdf

Chappell, Timothy A. (2015) Scalable document hashing and retrieval. PhD thesis, Queensland University of Technology.

Source

Science & Engineering Faculty

Keywords

#Information retrieval #Document signatures #Signature files #Relevance feedback #Superimposed coding #Locality-sensitive hashing #Topological signatures #Dimensionality reduction #Nearest-neighbour #Hamming distance problem

Type

Thesis