Can audio-visual speech recognition outperform acoustically enhanced speech recognition in automotive environment?


Author(s): Navarathna, Rajitha; Kleinschmidt, Tristan; Dean, David B.; Sridharan, Sridha; Lucey, Patrick J.

Date(s)

31/08/2011

Abstract

The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition that incorporates well-known acoustic enhancement techniques such as spectral subtraction or multi-channel beamforming. This question is especially important in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/45770/

Relation

http://eprints.qut.edu.au/45770/1/Rajitha_Interspeech2011.pdf

http://www.interspeech2011.org/

Navarathna, Rajitha, Kleinschmidt, Tristan, Dean, David B., Sridharan, Sridha, & Lucey, Patrick J. (2011) Can audio-visual speech recognition outperform acoustically enhanced speech recognition in automotive environment? In Interspeech 2011, 27-31 August 2011, Firenze Fiera, Florence.

Rights

Copyright 2011 [please consult the author]

Source

Faculty of Built Environment and Engineering; Information Security Institute; School of Engineering Systems

Keywords

090609 Signal Processing; Speech enhancement; robust speech recognition; audio-visual automatic speech recognition; synchronous HMM

Type

Conference Paper