Digital Library

Compositional data analysis (CoDA) approaches to distance in information retrieval

**Author(s):** Thomas, P.; Lovell, D. R.
Date(s)	2014
Abstract	Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole; term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
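The record does not say which composition-aware measures the paper uses. Aitchison's distance appears among its keywords, so the sketch below assumes it as an illustrative example; it is not the authors' implementation. The helper names (closure, clr, euclidean, aitchison) are hypothetical. The sketch compares Euclidean distance on proportions with Aitchison's distance (Euclidean distance after a centred log-ratio transform), and it assumes strictly positive parts: real term counts often contain zeros, which need handling that this record does not describe.

```python
import math
from typing import Sequence

# Hypothetical helpers illustrating one composition-aware distance
# (Aitchison's distance); an assumed example, not the paper's code.


def closure(counts: Sequence[float]) -> list[float]:
    """Rescale positive counts so they sum to 1 (proportions of the whole)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(parts: Sequence[float]) -> list[float]:
    """Centred log-ratio transform: log of each part minus the mean log.

    Only ratios between parts matter, so multiplying every part by the
    same constant (e.g. raw counts vs. proportions) gives the same result.
    """
    # Assumption: zeros are not handled here; log(0) is undefined.
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Euclidean distance, naive to the compositional constraint."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison(x: Sequence[float], y: Sequence[float]) -> float:
    """Aitchison distance: Euclidean distance between CLR-transformed vectors."""
    return euclidean(clr(x), clr(y))


if __name__ == "__main__":
    x = closure([1, 49, 50])
    y = closure([2, 48, 50])  # the rare first term doubles
    z = closure([1, 48, 51])  # two common terms shift by the same absolute amount
    print(f"euclidean x-y={euclidean(x, y):.4f}  x-z={euclidean(x, z):.4f}")
    print(f"aitchison x-y={aitchison(x, y):.4f}  x-z={aitchison(x, z):.4f}")
```

Run as a script, the two Euclidean distances come out essentially equal, because each pair differs by 0.01 in two parts. Aitchison's distance rates x-y as far more different than x-z, because doubling a rare term is a large relative change, whereas a one-point shift between two common terms is not. Passing raw counts instead of proportions to aitchison gives the same value, which reflects the point that only relative information matters.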
Identifier	http://eprints.qut.edu.au/79872/
Publisher	Association for Computing Machinery
Relation	DOI:10.1145/2600428.2609492 Thomas, P. & Lovell, D. R. (2014) Compositional data analysis (CoDA) approaches to distance in information retrieval. In SIGIR '14 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, Association for Computing Machinery, Gold Coast, Qld., pp. 991-994.
Rights	ACM
Source	School of Electrical Engineering & Computer Science; Science & Engineering Faculty
Keywords	#Aitchison's distance #Compositions #Distance #Ratio #Similarity #Chemical analysis #Compositional data #Compositional data analysis #Relative information #Through the lens #Information retrieval
Type	Conference Paper