979 resultados para task recognition


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Model compensation is a standard way of improving the robustness of speech recognition systems to noise. A number of popular schemes are based on vector Taylor series (VTS) compensation, which uses a linear approximation to represent the influence of noise on the clean speech. To compensate the dynamic parameters, the continuous time approximation is often used. This approximation uses a point estimate of the gradient, which fails to take into account that dynamic coefficients are a function of a number of consecutive static coefficients. In this paper, the accuracy of dynamic parameter compensation is improved by representing the dynamic features as a linear transformation of a window of static features. A modified version of VTS compensation is applied to the distribution of the window of static features and, importantly, their correlations. These compensated distributions are then transformed to distributions over standard static and dynamic features. With this improved approximation, it is also possible to obtain full-covariance corrupted speech distributions. This addresses the correlation changes that occur in noise. The proposed scheme outperformed the standard VTS scheme by 10% to 20% relative on a range of tasks. © 2006 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Model based compensation schemes are a powerful approach for noise robust speech recognition. Recently there have been a number of investigations into adaptive training, and estimating the noise models used for model adaptation. This paper examines the use of EM-based schemes for both canonical models and noise estimation, including discriminative adaptive training. One issue that arises when estimating the noise model is a mismatch between the noise estimation approximation and final model compensation scheme. This paper proposes FA-style compensation where this mismatch is eliminated, though at the expense of a sensitivity to the initial noise estimates. EM-based discriminative adaptive training is evaluated on in-car and Aurora4 tasks. FA-style compensation is then evaluated in an incremental mode on the in-car task. © 2011 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For many realistic scenarios, there are multiple factors that affect the clean speech signal. In this work approaches to handling two such factors, speaker and background noise differences, simultaneously are described. A new adaptation scheme is proposed. Here the acoustic models are first adapted to the target speaker via an MLLR transform. This is followed by adaptation to the target noise environment via model-based vector Taylor series (VTS) compensation. These speaker and noise transforms are jointly estimated, using maximum likelihood. Experiments on the AURORA4 task demonstrate that this adaptation scheme provides improved performance over VTS-based noise adaptation. In addition, this framework enables the speech and noise to be factorised, allowing the speaker transform estimated in one noise condition to be successfully used in a different noise condition. © 2011 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One important issue in designing state-of-the-art LVCSR systems is the choice of acoustic units. Context dependent (CD) phones remain the dominant form of acoustic units. They can capture the co-articulatory effect in speech via explicit modelling. However, for other more complicated phonological processes, they rely on the implicit modelling ability of the underlying statistical models. Alternatively, it is possible to construct acoustic models based on higher level linguistic units, for example, syllables, to explicitly capture these complex patterns. When sufficient training data is available, this approach may show an advantage over implicit acoustic modelling. In this paper a wide range of acoustic units are investigated to improve LVCSR system performance. Significant error rate gains up to 7.1% relative (0.8% abs.) were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using word and syllable position dependent triphone and quinphone models. © 2011 IEEE.