Digital Library

**Author(s):** Li, Cheng; Rana, Santu; Phung, D; Venkatesh, S
Date(s)	01/05/2016
Abstract	Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, the side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of the distance dependent Chinese restaurant process, using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically during Gibbs sampling. A fast inference algorithm is also proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. The types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. Comparison with state-of-the-art approaches on standard performance measures (NMI, F1) shows that our approach performs better.
Identifier	http://hdl.handle.net/10536/DRO/DU:30076873
Language(s)	eng
Publisher	Springer
Relation	http://dro.deakin.edu.au/eserv/DU:30076873/li-dataclustering-2016.pdf http://dro.deakin.edu.au/eserv/DU:30076873/li-dataclustering-inpress-2016.pdf http://www.dx.doi.org/10.1007/s10115-015-0834-7
Rights	2016, Springer
Keywords	#Side information #Similarity #Data clustering #Bayesian nonparametric models
Type	Journal Article
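
The abstract builds on the distance dependent Chinese restaurant process, in which each item links to another item with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter, and clusters are the connected components of the resulting links. The sketch below illustrates only that generic prior. It is not the authors' model or their inference algorithm: the abstract does not give the form of the robust decay function, so a simple window decay with a fixed threshold is assumed here, and all names (`window_decay`, `sample_links`, `clusters_from_links`, `alpha`, `threshold`) and the toy distances are hypothetical.

```python
import random


def window_decay(distance, threshold):
    """Assumed decay: weight 1 within the threshold, 0 beyond it.

    The paper's robust decay function is not specified in the abstract;
    this window form is a stand-in for illustration only.
    """
    return 1.0 if distance < threshold else 0.0


def sample_links(distances, alpha, threshold, rng):
    """Draw one link per item from a distance dependent CRP prior.

    Item i links to itself with weight alpha and to any other item j
    with weight window_decay(distances[i][j], threshold).
    """
    n = len(distances)
    links = []
    for i in range(n):
        weights = [
            alpha if j == i else window_decay(distances[i][j], threshold)
            for j in range(n)
        ]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def clusters_from_links(links):
    """Label items by connected component of the link graph (union-find)."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    label_of_root = {}
    return [
        label_of_root.setdefault(find(i), len(label_of_root))
        for i in range(len(links))
    ]


# Toy side-information distances (made up): items 0 and 1 are close,
# items 2 and 3 are close, and the two pairs are far apart.
distances = [
    [0.0, 0.1, 5.0, 5.0],
    [0.1, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 0.2],
    [5.0, 5.0, 0.2, 0.0],
]
rng = random.Random(0)
links = sample_links(distances, alpha=1.0, threshold=1.0, rng=rng)
labels = clusters_from_links(links)
print(links, labels)
```

With this window decay, items farther apart than the threshold get zero link weight, so the two distant pairs can never share a cluster. Whether the items within a pair are merged depends on the random draw. In the paper the threshold is updated during Gibbs sampling rather than fixed as it is here.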
