Supervised latent Dirichlet allocation models for efficient activity representation


Author(s): Umakanthan, Sabanadesan; Denman, Simon; Fookes, Clinton B.; Sridharan, Sridha
Contributor(s)

Bouzerdoum, Abdesselam

Wang, Lei

Ogunbona, Philip

Date(s)

2014

Abstract

Local spatio-temporal features combined with a Bag-of-visual-words model are a popular approach to human action recognition. Bag-of-features methods face several challenges, such as extracting appropriate appearance and motion features from videos, converting the extracted features into a form suitable for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification in order to improve overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features in a form suitable for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poor results compared to the baseline Bag-of-words framework. Supervised LDA techniques, on the other hand, learn the topic structure by taking the class labels into account and improve recognition accuracy significantly. MedLDA jointly maximizes the likelihood and the within-class margins using max-margin techniques, yielding a sparse, highly discriminative topic structure, while css-LDA learns separate class-specific topics instead of a common set of topics across the entire dataset. In our representation, topics are learned first and each video is then represented as a topic proportion vector, comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the effectiveness of these two representation techniques through experiments on two popular datasets. Experimental results show significantly improved performance compared to the baseline Bag-of-features framework, which uses k-means to construct histograms of words from the feature vectors.
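The representation pipeline described in the abstract can be sketched as follows. MedLDA and css-LDA are not available in standard libraries, so this minimal sketch substitutes scikit-learn's unsupervised LatentDirichletAllocation as a stand-in for the supervised topic models; the input arrays (video_descriptors, labels) and all parameter values are hypothetical, not taken from the paper.

    # Minimal sketch: visual-word counts -> topic proportion vectors -> SVM.
    # Uses unsupervised LDA as a placeholder for the paper's MedLDA / css-LDA.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.svm import SVC

    def bag_of_words_counts(video_descriptors, n_words=500):
        # video_descriptors: list of (n_i, d) arrays of local spatio-temporal
        # features, one array per video. Quantise with k-means (the baseline).
        all_desc = np.vstack(video_descriptors)
        codebook = KMeans(n_clusters=n_words, n_init=10).fit(all_desc)
        counts = np.zeros((len(video_descriptors), n_words))
        for i, desc in enumerate(video_descriptors):
            words = codebook.predict(desc)
            counts[i] = np.bincount(words, minlength=n_words)
        return counts

    def topic_proportion_features(counts, n_topics=30):
        # Each video becomes a topic proportion vector, i.e. a histogram of topics.
        lda = LatentDirichletAllocation(n_components=n_topics)
        return lda.fit_transform(counts)

    # counts = bag_of_words_counts(video_descriptors)
    # theta = topic_proportion_features(counts)
    # clf = SVC(kernel='linear').fit(theta, labels)  # SVM on topic proportions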

Identifier

http://eprints.qut.edu.au/82220/

Publisher

IEEE

Relation

DOI:10.1109/DICTA.2014.7008130

Umakanthan, Sabanadesan, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2014) Supervised latent Dirichlet allocation models for efficient activity representation. In Bouzerdoum, Abdesselam, Wang, Lei, & Ogunbona, Philip (Eds.) Proceedings of the 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Wollongong, NSW, pp. 1-6.

Rights

Copyright 2014 by IEEE

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords #Feature extraction #Gesture recognition #Image classification #Image representation #Statistical analysis #Support vector machines #Unsupervised learning
Type

Conference Paper