Slicing the metric space to provide quick indexing of complex data in the main memory


Autoria(s): CARELO, Caio Cesar Mori; POLA, Ives Rene Venturini; CIFERRI, Ricardo Rodrigues; TRAINA, Agma Juci Machado; TRAINA JR., Caetano; CIFERRI, Cristina Dutra de Aguiar
Contribuinte(s)

UNIVERSIDADE DE SÃO PAULO

Data(s)

20/10/2012

20/10/2012

2011

Resumo

Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.

Identificador

INFORMATION SYSTEMS, v.36, n.1, Special Issue, p.79-98, 2011

0306-4379

http://producao.usp.br/handle/BDPI/28741

10.1016/j.is.2010.06.004

http://dx.doi.org/10.1016/j.is.2010.06.004

Idioma(s)

eng

Publicador

PERGAMON-ELSEVIER SCIENCE LTD

Relação

Information Systems

Direitos

restrictedAccess

Copyright PERGAMON-ELSEVIER SCIENCE LTD

Palavras-Chave #Metric access method #Memory-based indexing #Complex data #Similarity search #ACCESS METHODS #SIMILARITY SEARCH #QUERIES #TREES #Computer Science, Information Systems
Tipo

article

proceedings paper

publishedVersion