Multiple cameras for audio-visual speech recognition in an automotive environment
Date(s) |
2012
Abstract |
Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side and centrally oriented cameras in improving visual speech recognition accuracy. Finally, combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy compared to the acoustic-only approach in the noisiest conditions of the AVICAR database. |
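The multi-stream SHMM described in the abstract scores each synchronous observation by fusing per-stream emission log-likelihoods with exponent weights. A minimal sketch of that standard weighted-fusion rule follows; the stream weights and likelihood values are purely hypothetical and not taken from the paper:

```python
import numpy as np

def combined_log_likelihood(stream_loglikes, weights):
    """Weighted fusion of per-stream emission log-likelihoods:
    log b(o_t) = sum_s lambda_s * log b_s(o_t^s), with sum_s lambda_s = 1."""
    stream_loglikes = np.asarray(stream_loglikes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("stream weights should sum to 1")
    return float(np.dot(weights, stream_loglikes))

# Hypothetical example: one audio stream plus four visual (camera) streams.
loglikes = [-12.0, -20.0, -22.0, -19.5, -21.0]  # per-stream log b_s(o_t)
weights = [0.6, 0.1, 0.1, 0.1, 0.1]             # audio up-weighted (illustrative)
score = combined_log_likelihood(loglikes, weights)
```

In noisy conditions the audio weight would typically be lowered relative to the visual streams, which is what makes the audio-visual combination robust in the automotive setting the paper studies.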
Identifier |
Publisher |
Elsevier |
Relation |
Navarathna, Rajitha, Dean, David B., Sridharan, Sridha, & Lucey, Patrick J. (2012). Multiple cameras for audio-visual speech recognition in an automotive environment. Computer Speech and Language. DOI: 10.1016/j.csl.2012.07.005
Source |
School of Earth, Environmental & Biological Sciences; School of Electrical Engineering & Computer Science; Science & Engineering Faculty |
Keywords | #080104 Computer Vision #080109 Pattern Recognition and Data Mining #090609 Signal Processing #AVASR #AVICAR database #Speech recognition #Multi-stream HMM #Automotive environment
Type |
Journal Article |