Multiple cameras for audio-visual speech recognition in an automotive environment


Author(s): Navarathna, Rajitha; Dean, David B.; Sridharan, Sridha; Lucey, Patrick J.
Date(s)

2012

Abstract

Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature by showing that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side and central cameras in improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy compared with the acoustic-only approach in the noisiest conditions of the AVICAR database.
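To make the multi-stream formulation concrete, the following Python sketch shows how a synchronous multi-stream HMM typically combines per-stream emission likelihoods at each state with exponent weights. This is the standard multi-stream combination rule, not code from the paper; the Gaussian stream models, feature dimensions, and stream weights below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Per-state emission score of a synchronous multi-stream HMM:
#   log b_j(o_t) = sum_s  lambda_s * log b_{j,s}(o_{t,s})
# where lambda_s are stream weights (typically summing to 1). A real AVSR
# system would model each stream with a GMM; single Gaussians suffice here.

def multistream_log_emission(obs_per_stream, stream_models, stream_weights):
    """obs_per_stream: one feature vector per stream (audio + camera views).
    stream_models: (mean, cov) per stream for this HMM state (hypothetical).
    stream_weights: exponents controlling each stream's influence."""
    score = 0.0
    for obs, (mean, cov), w in zip(obs_per_stream, stream_models, stream_weights):
        score += w * multivariate_normal.logpdf(obs, mean=mean, cov=cov)
    return score

# Example: one audio stream plus four visual streams (a five-stream SHMM).
rng = np.random.default_rng(0)
dims = [39, 20, 20, 20, 20]                  # e.g. MFCC audio + 4 visual features
models = [(rng.normal(size=d), np.eye(d)) for d in dims]
obs = [rng.normal(size=d) for d in dims]
weights = [0.5, 0.125, 0.125, 0.125, 0.125]  # assumed weighting, not from the paper
print(multistream_log_emission(obs, models, weights))
```

Because the streams are synchronous, all five share one state sequence during Viterbi decoding; only the emission score changes, which is what lets additional camera views be folded in without altering the recognizer's search structure.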

Identifier

http://eprints.qut.edu.au/52968/

Publisher

Elsevier

Relation

DOI:10.1016/j.csl.2012.07.005

Navarathna, Rajitha, Dean, David B., Sridharan, Sridha, & Lucey, Patrick J. (2012) Multiple cameras for audio-visual speech recognition in an automotive environment. Computer Speech & Language.

Source

School of Earth, Environmental & Biological Sciences; School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords #080104 Computer Vision #080109 Pattern Recognition and Data Mining #090609 Signal Processing #AVASR #AVICAR database #Speech recognition #Multi-stream HMM #Automotive environment
Type

Journal Article