A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction


Autoria(s): Kleinschmidt, Tristan; Sridharan, Sridha; Mason, Michael
Data(s)

2011

Resumo

Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve these accuracies. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from that present during the calibration session – a serious problem in rapidly changing noise environments out on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semireal-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only.

Identificador

http://eprints.qut.edu.au/79995/

Publicador

Springer Science+Business Media

Relação

DOI:10.1007/978-1-4419-9607-7_10

Kleinschmidt, Tristan, Sridharan, Sridha, & Mason, Michael (2011) A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction. In Digital Signal Processing for In-Vehicle Systems and Safety. Springer Science+Business Media, Germany, pp. 159-174.

Fonte

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Palavras-Chave #080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #Automatic speech recognition (ASR) ; In-car speech recognition #Mel-filterbank noise subtraction (MFNS)
Tipo

Book Chapter