944 resultados para Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Model compensation is a standard way of improving the robustness of speech recognition systems to noise. A number of popular schemes are based on vector Taylor series (VTS) compensation, which uses a linear approximation to represent the influence of noise on the clean speech. To compensate the dynamic parameters, the continuous time approximation is often used. This approximation uses a point estimate of the gradient, which fails to take into account that dynamic coefficients are a function of a number of consecutive static coefficients. In this paper, the accuracy of dynamic parameter compensation is improved by representing the dynamic features as a linear transformation of a window of static features. A modified version of VTS compensation is applied to the distribution of the window of static features and, importantly, their correlations. These compensated distributions are then transformed to distributions over standard static and dynamic features. With this improved approximation, it is also possible to obtain full-covariance corrupted speech distributions. This addresses the correlation changes that occur in noise. The proposed scheme outperformed the standard VTS scheme by 10% to 20% relative on a range of tasks. © 2006 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For speech recognition, mismatches between training and testing for speaker and noise are normally handled separately. The work presented in this paper aims at jointly applying speaker adaptation and model-based noise compensation by embedding speaker adaptation as part of the noise mismatch function. The proposed method gives a faster and more optimum adaptation compared to compensating for these two factors separately. It is also more consistent with respect to the basic assumptions of speaker and noise adaptation. Experimental results show significant and consistent gains from the proposed method. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently there has been interest in structured discriminative models for speech recognition. In these models sentence posteriors are directly modelled, given a set of features extracted from the observation sequence, and hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition for continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Structured precision modelling is an important approach to improve the intra-frame correlation modelling of the standard HMM, where Gaussian mixture model with diagonal covariance are used. Previous work has all been focused on direct structured representation of the precision matrices. In this paper, a new framework is proposed, where the structure of the Cholesky square root of the precision matrix is investigated, referred to as Cholesky Basis Superposition (CBS). Each Cholesky matrix associated with a particular Gaussian distribution is represented as a linear combination of a set of Gaussian independent basis upper-triangular matrices. Efficient optimization methods are derived for both combination weights and basis matrices. Experiments on a Chinese dictation task showed that the proposed approach can significantly outperformed the direct structured precision modelling with similar number of parameters as well as full covariance modelling. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Model compensation methods for noise-robust speech recognition have shown good performance. Predictive linear transformations can approximate these methods to balance computational complexity and compensation accuracy. This paper examines both of these approaches from a variational perspective. Using a matched-pair approximation at the component level yields a number of standard forms of model compensation and predictive linear transformations. However, a tighter bound can be obtained by using variational approximations at the state level. Both model-based and predictive linear transform schemes can be implemented in this framework. Preliminary results show that the tighter bound obtained from the state-level variational approach can yield improved performance over standard schemes. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise corrupted AURORA4 task. Both yielded large gains over VTS baseline system, with RVTSJ outperforming the previous RVTS scheme. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As-built models have been proven useful in many project-related applications, such as progress monitoring and quality control. However, they are not widely produced in most projects because a lot of effort is still necessary to manually convert remote sensing data from photogrammetry or laser scanning to an as-built model. In order to automate the generation of as-built models, the first and fundamental step is to automatically recognize infrastructure-related elements from the remote sensing data. This paper outlines a framework for creating visual pattern recognition models that can automate the recognition of infrastructure-related elements based on their visual features. The framework starts with identifying the visual characteristics of infrastructure element types and numerically representing them using image analysis tools. The derived representations, along with their relative topology, are then used to form element visual pattern recognition (VPR) models. So far, the VPR models of four infrastructure-related elements have been created using the framework. The high recognition performance of these models validates the effectiveness of the framework in recognizing infrastructure-related elements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As-built models have been proven useful in many project-related applications, such as progress monitoring and quality control. However, they are not widely produced in most projects because a lot of effort is still necessary to manually convert remote sensing data from photogrammetry or laser scanning to an as-built model. In order to automate the generation of as-built models, the first and fundamental step is to automatically recognize infrastructure-related elements from the remote sensing data. This paper outlines a framework for creating visual pattern recognition models that can automate the recognition of infrastructure-related elements based on their visual features. The framework starts with identifying the visual characteristics of infrastructure element types and numerically representing them using image analysis tools. The derived representations, along with their relative topology, are then used to form element visual pattern recognition (VPR) models. So far, the VPR models of four infrastructure-related elements have been created using the framework. The high recognition performance of these models validates the effectiveness of the framework in recognizing infrastructure-related elements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character level language models (LMs) can be used as a first level approximation to allowed syllable sequences. To test this idea, word and character level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of texts from a wide collection of text sources. Both hypothesis and model based combination techniques were investigated to combine word and character level LMs. Significant character error rate reductions up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history dependent multi-level LM that performs a log-linearly combination of character and word level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.