Visual voice activity detection using frontal versus profile views
Date(s) | 17/10/2011
---|---
Abstract | Visual activity detection of lip movements can be used to overcome the poor performance of voice activity detection based solely on the audio domain, particularly in noisy acoustic conditions. However, most of the research conducted in visual voice activity detection (VVAD) has neglected addressing variabilities in the visual domain such as viewpoint variation. In this paper we investigate the effectiveness of the visual information from the speaker’s frontal and profile views (i.e., left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem. We describe our visual front-end approach and the Gaussian mixture model (GMM) based VVAD framework, and report experimental results using the freely available CUAVE database. The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful in the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker’s face may not always be frontal.
Format | application/pdf
Identifier |
Relation | http://eprints.qut.edu.au/46513/1/Rajitha_DICTA2011.pdf ; http://itee.uq.edu.au/~dicta2011/ ; Navarathna, Rajitha, Dean, David B., Sridharan, Sridha, Fookes, Clinton B., & Lucey, Patrick J. (2011) Visual voice activity detection using frontal versus profile views. In The International Conference on Digital Image Computing: Techniques and Applications (DICTA 2011), 6-8 December 2011, Sheraton Noosa Resort & Spa, Noosa, QLD.
Rights | Copyright 2011 [please consult the author]
Source | Faculty of Built Environment and Engineering; Information Security Institute; School of Engineering Systems
Keywords | #080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #lip movements #voice detection
Type | Conference Paper
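
The abstract above mentions a Gaussian mixture model (GMM) based VVAD framework. The paper itself provides no code, so the sketch below is only a generic, hypothetical illustration of a frame-level GMM likelihood-ratio decision of that kind; the feature choice, stand-in random data, model sizes, and threshold are all assumptions and do not reproduce the authors' visual front end or their CUAVE experiments.

```python
# Illustrative sketch only: two-class GMM voice-activity decision over per-frame
# visual features (e.g. coefficients describing a mouth region-of-interest).
# All data and parameters here are stand-ins, not the authors' implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in training data: each row is one frame's visual feature vector.
speech_feats = rng.normal(loc=1.0, scale=1.0, size=(500, 20))      # "speaking" frames
nonspeech_feats = rng.normal(loc=-1.0, scale=1.0, size=(500, 20))  # "not speaking" frames

# One GMM per class, as in a typical GMM-based VVAD setup.
speech_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
nonspeech_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
speech_gmm.fit(speech_feats)
nonspeech_gmm.fit(nonspeech_feats)

def vvad_decision(frames: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Label each frame as speech (True) or non-speech (False) via a log-likelihood ratio."""
    llr = speech_gmm.score_samples(frames) - nonspeech_gmm.score_samples(frames)
    return llr > threshold

# Example: classify a short sequence of unseen frames.
test_frames = rng.normal(loc=1.0, scale=1.0, size=(10, 20))
print(vvad_decision(test_frames))
```

In such a setup, frontal and profile views would simply supply different feature streams to the same decision rule, which is what makes a quantitative comparison between viewpoints straightforward.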