Visual voice activity detection using frontal versus profile views


Author(s): Navarathna, Rajitha; Dean, David B.; Sridharan, Sridha; Fookes, Clinton B.; Lucey, Patrick J.
Date(s)

17/10/2011

Abstract

Visual activity detection of lip movements can be used to overcome the poor performance of voice activity detection based solely on the audio domain, particularly in noisy acoustic conditions. However, most research on visual voice activity detection (VVAD) has neglected variability in the visual domain, such as viewpoint variation. In this paper we investigate the effectiveness of visual information from the speaker’s frontal and profile views (i.e., left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem. We describe our visual front-end approach and the Gaussian mixture model (GMM) based VVAD framework, and report experimental results using the freely available CUAVE database. The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful in the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker’s face may not always be frontal.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/46513/

Relation

http://eprints.qut.edu.au/46513/1/Rajitha_DICTA2011.pdf

http://itee.uq.edu.au/~dicta2011/

Navarathna, Rajitha, Dean, David B., Sridharan, Sridha, Fookes, Clinton B., & Lucey, Patrick J. (2011) Visual voice activity detection using frontal versus profile views. In The International Conference on Digital Image Computing : Techniques and Applications (DICTA2011), 6-8 December 2011, Sheraton Noosa Resort & Spa, Noosa, QLD.

Rights

Copyright 2011 [please consult the author]

Source

Faculty of Built Environment and Engineering; Information Security Institute; School of Engineering Systems

Keywords #080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #lip movements #voice detection
Type

Conference Paper