A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction
Date(s) |
2011
|
---|---|
Abstract |
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve it. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from those present during the calibration session, a serious problem in rapidly changing noise environments on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semi-real-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only. |
Identifier | |
Publisher |
Springer Science+Business Media |
Relation |
DOI:10.1007/978-1-4419-9607-7_10 Kleinschmidt, Tristan, Sridharan, Sridha, & Mason, Michael (2011) A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction. In Digital Signal Processing for In-Vehicle Systems and Safety. Springer Science+Business Media, Germany, pp. 159-174. |
Source |
School of Electrical Engineering & Computer Science; Science & Engineering Faculty |
Keywords | #080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #Automatic speech recognition (ASR) ; In-car speech recognition #Mel-filterbank noise subtraction (MFNS) |
Type |
Book Chapter |