Digital Library

Injury narrative text classification : a preliminary study

**Author(s):** Chen, Lin; Vallmuur, Kirsten; Nayak, Richi
Contributor(s)	Lee, Doheon; Chen, Luonan
Date(s)	2014
Abstract	A description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families, such as decision tree, probabilistic, neural network, instance-based, ensemble-based, and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, the quality of the classification outcome. Records with a null entry in the injury description are removed. Misspellings are corrected by finding each misspelt word and replacing it with a sound-alike word. Meaningful phrases are identified and kept intact, rather than having part of a phrase removed as a stop word. Abbreviations that appear in several forms are manually identified, and a single form is used for each. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduces the number of text features dramatically, from about 28,000 to 5,000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers have been built on this reduced feature space. In the experiments, a set of tests is conducted to determine which classification method is best for medical text classification.
Non-negative Matrix Factorization combined with a Support Vector Machine achieves 93% precision, which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting, which works well for long text classification, is inferior to binary weighting in short document classification. Another finding is that the top-n terms should be removed in consultation with medical experts, as their removal affects classification performance.
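The contrast between binary and TF/IDF term weighting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three sample narratives are invented for illustration, the helper names are hypothetical, and the idf form `log(N / df)` is an assumption, since the record does not state which variant the paper used.

```python
import math
from collections import Counter

# Invented short narratives, for illustration only (not from the paper's dataset).
docs = [
    "fell off ladder",
    "cut finger with knife",
    "fell from bike",
]

def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def build_vocab(tokenized):
    """Sorted list of every distinct term across all documents."""
    return sorted({term for tokens in tokenized for term in tokens})

def binary_vectors(tokenized, vocab):
    """1 if the term occurs in the document, else 0."""
    vectors = []
    for tokens in tokenized:
        present = set(tokens)
        vectors.append([1 if term in present else 0 for term in vocab])
    return vectors

def tfidf_vectors(tokenized, vocab):
    """Raw term frequency times log(N / df); assumed idf variant."""
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vectors

tokenized_docs = [tokenize(d) for d in docs]
vocab = build_vocab(tokenized_docs)
binary = binary_vectors(tokenized_docs, vocab)
tfidf = tfidf_vectors(tokenized_docs, vocab)

for doc, b_row, t_row in zip(docs, binary, tfidf):
    print(doc)
    print("  binary:", b_row)
    print("  tf-idf:", [round(w, 3) for w in t_row])
```

In documents this short almost every term occurs once, so the term-frequency part of TF/IDF carries little information and the weights are driven mainly by idf. That is one plausible reading of why binary weighting can be competitive on short narratives, though the abstract itself does not give the reason.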
Identifier	http://eprints.qut.edu.au/78731/
Publisher	ACM
Relation	DOI:10.1145/2665970.2665976 Chen, Lin, Vallmuur, Kirsten, & Nayak, Richi (2014) Injury narrative text classification : a preliminary study. In Lee, Doheon & Chen, Luonan (Eds.) Proceedings of the ACM 8th International Workshop on Data and Text Mining in Bioinformatics - DTMBIO '14, ACM, Shanghai, China, p. 7.
Rights	Copyright 2014 [please consult the authors]
Source	School of Electrical Engineering & Computer Science; Science & Engineering Faculty
Keywords	#Narrative text classification
Type	Conference Paper
