Biblioteca Digital

242 resultados para automatic speech recognition

A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve these accuracies. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from that present during the calibration session – a serious problem in rapidly changing noise environments out on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semireal-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only.

Channel selection in the short-time modulation domain for distant speech recognition

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic speech recognition from multiple distant micro- phones poses significant challenges because of noise and reverberations. The quality of speech acquisition may vary between microphones because of movements of speakers and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in the real reverb conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared to the case when the channel is selected randomly using circular microphone arrays.

Automatic Speech Segmentation with HMM

Relevância:

100.00% 100.00%

Publicador:

Using a Free-Parts Representation for Visual Speech Recognition

Relevância:

100.00% 100.00%

Publicador:

Adaptive Parameter Compensation for Robust Hands-Free Speech Recognition Using a Dual Beamforming Microphone Array

Relevância:

100.00% 100.00%

Publicador:

An investigation of HMM classifier combination ctrategies for improved audio-visual speech recognition

Relevância:

100.00% 100.00%

Publicador:

Pseudo-Syntactic Language Modelling for Disfluent Speech Recognition

Relevância:

100.00% 100.00%

Publicador:

A link between cepstral shrinking and the weighted product rule in audio-visual speech recognition

Relevância:

100.00% 100.00%

Publicador:

Microphone Array Sub-Brand Speech Recognition

Relevância:

100.00% 100.00%

Publicador:

Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier

Relevância:

100.00% 100.00%

Publicador:

Likelihood-maximising frameworks for enhanced in-car speech recognition

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters which are used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions which are significantly different from that present during the calibration session - a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.

Visual speech recognition across multiple views

Relevância:

100.00% 100.00%

Publicador:

The effect of dialect mismatch on likelihood-maximising speech enhancement for noise-robust speech recognition

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.

Speaker recognition modelling with artificial neural networks

Relevância:

100.00% 100.00%

Publicador:

Low-cost hardware speech enhancement for improved speech recognition in automotive environments

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Voice recognition is one of the key enablers to reduce driver distraction as in-vehicle systems become more and more complex. With the integration of voice recognition in vehicles, safety and usability are improved as the driver’s eyes and hands are not required to operate system controls. Whilst speaker independent voice recognition is well developed, performance in high noise environments (e.g. vehicles) is still limited. La Trobe University and Queensland University of Technology have developed a low-cost hardware-based speech enhancement system for automotive environments based on spectral subtraction and delay–sum beamforming techniques. The enhancement algorithms have been optimised using authentic Australian English collected under typical driving conditions. Performance tests conducted using speech data collected under variety of vehicle noise conditions demonstrate a word recognition rate improvement in the order of 10% or more under the noisiest conditions. Currently developed to a proof of concept stage there is potential for even greater performance improvement.

«
1
2
3
4
5
6
7
8
...
16
17
»