Regularizing topic discovery in EMRs with side information by using hierarchical Bayesian models


Author(s): Li, C; Rana, S; Phung, D; Venkatesh, S
Contributor(s)

[Unknown]

Date(s)

01/01/2014

Abstract

We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of multiple patients (documents), and each patient record contains many diagnosis codes (words). We exploit side information in the form of a semantic tree structure over the diagnosis codes to discover semantically coherent disease topics. We introduce novel functions to compute word-to-word distances when the side information is available as a tree structure. We derive an efficient inference method for the wd-dCRF using Markov chain Monte Carlo (MCMC) sampling. We evaluate the model on a real-world medical dataset of about 1,000 patients with polyvascular disease. Compared with a popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior on both qualitative and quantitative measures.
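The abstract does not give the paper's actual distance functions, so the following is only a minimal sketch under stated assumptions of how a tree over diagnosis codes can yield word-to-word distances that then shape a distance-dependent prior. It assumes (1) the distance between two codes is the number of edges on the path through their lowest common ancestor, (2) distances are turned into affinities with an exponential decay, and (3) each word chooses a link target in the style of the distance-dependent Chinese restaurant process (ddCRP), with weight proportional to the decayed distance for other words and to a concentration `alpha` for itself. The toy hierarchy `PARENT` (ICD-10-style codes chosen for illustration) and the names `tree_distance`, `exp_decay`, `link_probabilities`, `alpha` and `scale` are all hypothetical, not the wd-dCRF's definitions.

```python
import math

# Hypothetical toy hierarchy of diagnosis codes (child -> parent).
# ICD-10-style codes are illustrative only; "ROOT" has no parent entry.
PARENT = {
    "I00-I99": "ROOT",     # diseases of the circulatory system
    "I20-I25": "I00-I99",  # ischaemic heart diseases
    "I60-I69": "I00-I99",  # cerebrovascular diseases
    "I21": "I20-I25",
    "I25": "I20-I25",
    "I63": "I60-I69",
}


def ancestors(code):
    """Return the path from `code` up to the root, including both ends."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Edge count of the path between a and b via their lowest common ancestor."""
    depth_from_a = {node: i for i, node in enumerate(ancestors(a))}
    for j, node in enumerate(ancestors(b)):
        if node in depth_from_a:  # first shared ancestor is the LCA
            return depth_from_a[node] + j
    raise ValueError("codes are not in the same tree")


def exp_decay(d, scale=1.0):
    """Assumed decay function: closer codes get larger affinity."""
    return math.exp(-d / scale)


def link_probabilities(i, words, alpha=1.0, scale=1.0):
    """ddCRP-style prior over which word position word i links to.

    Self-link weight is alpha; link to word j has weight exp_decay(d_ij).
    """
    weights = [
        alpha if j == i else exp_decay(tree_distance(words[i], w), scale)
        for j, w in enumerate(words)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    codes = ["I21", "I25", "I63"]
    print(tree_distance("I21", "I25"), tree_distance("I21", "I63"))
    print(link_probabilities(0, codes))
```

With `codes = ["I21", "I25", "I63"]`, code `I21` is two edges from `I25` (siblings under `I20-I25`) and four edges from `I63` (related only through `I00-I99`), so its prior favours linking to `I25`. This is how a tree can encourage semantically related codes to share topics; the paper's own distance functions and franchise-level sampling may differ.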

Identifier

http://hdl.handle.net/10536/DRO/DU:30072477

Language(s)

eng

Publisher

IEEE

Relation

http://dro.deakin.edu.au/eserv/DU:30072477/t121000-Li-c-regularisingtopicdiscovery-.pdf

http://dro.deakin.edu.au/eserv/DU:30072477/t121045-ev-conficprpeerrvwgnrl-2014-ICPR.pdf

http://www.dx.doi.org/10.1109/ICPR.2014.234

Rights

2014, IEEE

Keywords

#Medical application #Readmission #Side information #Topic analysis #Tree structure #Words

Type

Conference Paper