Digital Library

109 results for speech databases

in Queensland University of Technology - ePrints Archive

The effect of dialect mismatch on likelihood-maximising speech enhancement for noise-robust speech recognition

Relevance: 70.00%

Abstract:

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.

The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms

Relevance: 60.00%

Abstract:

The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.
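The mixing step described in this abstract, combining clean speech with background noise at a chosen signal-to-noise ratio, can be illustrated with a minimal sketch. This is not the corpus authors' procedure: the function name `mix_at_snr` is hypothetical, and the sketch assumes that signal power is measured as the plain mean square over the whole signal and that short noise recordings are looped to cover the speech.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR in dB.

    Assumption: power is the mean square over the whole signal. A real
    corpus may measure speech level differently (for example, over
    active speech only).
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise, gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)      # stand-in for a speech signal
    background = rng.standard_normal(4000)  # stand-in for a noise recording
    noisy, g = mix_at_snr(clean, background, snr_db=10.0)
    print(noisy.shape, g)
```

Returning the gain alongside the mixture makes it possible to recover the scaled noise later, which is convenient when checking the SNR actually achieved.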

The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition

Relevance: 60.00%

Abstract:

The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of background noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.

Quantitative EEG Normative Databases: A Comparative Investigation

Relevance: 20.00%

Adaptive Fusion of Speech and Lip Information for Robust Speaker identification

Relevance: 20.00%

Tenancy Databases, Professional Practices and Housing Access among Low-Income Tenants in the Private Rental Sector in Australia

Relevance: 20.00%

Application of the Trended Hidden Markov Model to Speech Synthesis

Relevance: 20.00%

Speech Enhancement by Formant Sharpening in the Cepstral Domain

Relevance: 20.00%

Automatic Speech Segmentation with HMM

Relevance: 20.00%

Multilingual Phone Clustering for Recognition of Spontaneous Indonesian Speech Utilising Pronunciation Modelling Techniques

Relevance: 20.00%

Characterising Learners: Speech-language Difficulties and ESL

Relevance: 20.00%

Using a Free-Parts Representation for Visual Speech Recognition

Relevance: 20.00%

A Hybrid LP-Harmonics Model for Low Bit-Rate Speech Compression with Natural Quality

Relevance: 20.00%

Adaptive Parameter Compensation for Robust Hands-Free Speech Recognition Using a Dual Beamforming Microphone Array

Relevance: 20.00%

An investigation of HMM classifier combination strategies for improved audio-visual speech recognition

Relevance: 20.00%
