Digital Library

58 results for automatic speech recognition

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast

Subband Correlation and Robust Speech Recognition

Relevance: 100.00%

Robust Speech recognition using probabilistic union models

Relevance: 100.00%

Speech Recognition with unknown partial feature corruption - a review of the union model

Relevance: 100.00%

Abstract:

This paper provides a summary of our studies on robust speech recognition based on a new statistical approach – the probabilistic union model. We consider speech recognition given that part of the acoustic features may be corrupted by noise. The union model is a method for basing the recognition on the clean part of the features, thereby reducing the effect of the noise on recognition. To this end, the union model is similar to the missing feature method. However, the two methods achieve this end through different routes. The missing feature method usually requires the identity of the noisy data for noise removal, while the union model combines the local features based on the union of random events, to reduce the dependence of the model on information about the noise. We previously investigated the applications of the union model to speech recognition involving unknown partial corruption in frequency band, in time duration, and in feature streams. Additionally, a combination of the union model with conventional noise-reduction techniques was studied, as a means of dealing with a mixture of known or trainable noise and unknown unexpected noise. In this paper, a unified review, in the context of dealing with unknown partial feature corruption, is provided into each of these applications, giving the appropriate theory and implementation algorithms, along with an experimental evaluation.
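
The abstract does not spell out how the local features are combined, so the following is only a minimal sketch of one commonly cited form of the union model, under stated assumptions: with N sub-band likelihoods and an assumed order M (the number of bands that may be corrupted), the score is taken to be proportional to the sum, over every subset of N - M bands, of the product of the likelihoods in that subset. The function name `union_score` and the example likelihood values are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod


def union_score(band_likelihoods, order):
    """Unnormalised union-style score (illustrative assumption, not the paper's code).

    Sums the products of likelihoods over every subset of
    len(band_likelihoods) - order bands, so that some subsets
    always exclude up to `order` corrupted bands.
    """
    n = len(band_likelihoods)
    if not 0 <= order < n:
        raise ValueError("order must satisfy 0 <= order < number of bands")
    keep = n - order
    return sum(prod(subset) for subset in combinations(band_likelihoods, keep))


if __name__ == "__main__":
    # Hypothetical per-band likelihoods; the third band is hit by noise.
    corrupted = [0.9, 0.8, 1e-9, 0.6]
    print("order 0 (plain product):", union_score(corrupted, 0))
    print("order 1 (one band may be bad):", union_score(corrupted, 1))
```

With order 0 the score reduces to the plain product of all bands, so a single corrupted band drives it towards zero. With order 1 the sum contains a term that leaves the corrupted band out, and that term dominates without the model needing to know which band was noisy.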

Noise compensation for speech recognition with arbitrary additive noise

Relevance: 100.00%

Modelling Sub-Band Correlation For Noise-Robust Speech Recognition

Relevance: 100.00%

A New Posterior Based Audio-Visual Integration Method for Robust Speech Recognition

Relevance: 100.00%

Union: a new approach for combining sub-band observations for noisy speech recognition

Relevance: 100.00%

Audio-visual Integration for Robust Speech Recognition Using Maximum Weighted Stream Posteriors

Relevance: 100.00%

Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos

Relevance: 100.00%

Abstract:

We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.

Hidden Conditional Random Fields for Visual Speech Recognition

Relevance: 100.00%

Abstract:

In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.

Combining noise compensation and missing-feature decoding for large vocabulary speech recognition in noise

Relevance: 100.00%

Inter-Frame Contextual Modelling For Visual Speech Recognition

Relevance: 100.00%

Abstract:

In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach to two commonly adopted feature based techniques for incorporating speech dynamics. Results are presented from baseline feature based systems and the combined modelling technique. We illustrate that both of these techniques achieve similar levels of performance when used independently. However, significant improvements in performance can be achieved through a combination of the two. In particular, we report an improvement in excess of 17% relative Word Error Rate in comparison to our best baseline system.
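
The abstract reports its gain as a relative Word Error Rate figure without giving the underlying rates. As a hedged illustration of how such a figure is usually computed, the sketch below uses the standard relative-reduction formula; the function name and the example error rates are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer.

    Both arguments are word error rates in the same units
    (for example, percentages).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical rates chosen only to illustrate the arithmetic.
    baseline, combined = 20.0, 16.0
    reduction = relative_wer_reduction(baseline, combined)
    print(f"relative WER reduction: {reduction:.1%}")
```

A relative figure is measured against the baseline error rate, so in this made-up case a drop of 4 percentage points from a 20% baseline is a 20% relative reduction.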

FPGA Implementation of a Pipelined Gaussian Calculation for HMM-Based Large Vocabulary Speech Recognition

Relevance: 100.00%

Abstract:

A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, both of which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
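
The abstract does not describe the arithmetic performed by the Gaussian calculation. As a rough software sketch of the kind of computation HMM acoustic models typically need, the code below evaluates the log-likelihood of one feature vector under a single Gaussian, assuming a diagonal covariance matrix. That assumption, the function name, and the example values are illustrative only and are not taken from the paper's hardware design.

```python
import math


def diag_gaussian_log_likelihood(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance.

    x, mean and var are equal-length sequences; var holds the
    per-dimension variances (assumed strictly positive).
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    # Normalisation term: sum of log(2 * pi * variance) over dimensions.
    log_norm = sum(math.log(2.0 * math.pi * v) for v in var)
    # Squared distance from the mean, scaled by the variance.
    scaled_dist = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (log_norm + scaled_dist)


if __name__ == "__main__":
    # Hypothetical 3-dimensional feature vector and model parameters.
    frame = [0.2, -1.0, 0.5]
    mean = [0.0, -0.8, 0.4]
    var = [1.0, 0.5, 2.0]
    print(diag_gaussian_log_likelihood(frame, mean, var))
```

In a recogniser, a calculation like this is repeated for every Gaussian in every acoustic model on each frame. The per-dimension multiply-accumulate structure is what makes this workload a natural candidate for pipelining and parallel cores.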

Acceleration Of HMM-Based Speech Recognition System By Parallel FPGA Gaussian Calculation

Relevance: 100.00%

GPU acceleration of Automated Speech Recognition for Mobile Devices

Relevance: 100.00%
