Biblioteca Digital

In this paper, we present a new approach to visual speech recognition that improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost when a Hidden Markov Model is used alone. We apply contextual modelling to a large speaker-independent isolated digit recognition task and compare our approach with two commonly adopted feature-based techniques for incorporating speech dynamics. Results are presented for the baseline feature-based systems and for the combined modelling technique. We show that both techniques achieve similar levels of performance when used independently. However, combining the two yields significant improvements in performance. In particular, we report a relative reduction in Word Error Rate of more than 17% compared with our best baseline system.
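A relative Word Error Rate (WER) reduction is the drop in WER divided by the baseline WER. It is not a difference in percentage points. The minimal sketch below illustrates that definition. The function name `relative_wer_reduction` and the WER values are hypothetical and do not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical figures for illustration only (not results from the paper).
baseline = 10.0  # baseline system WER, in percent
combined = 8.0   # combined system WER, in percent
print(f"{relative_wer_reduction(baseline, combined):.1%}")  # → 20.0%
```

In this made-up example, a 2-point absolute drop from a 10% baseline is a 20% relative reduction. The same absolute drop from a 20% baseline would be only 10% relative.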

Prosodic and related features that signify emotional colouring in conversational speech.

Relevance: 20.00%

Emotion in speech: Towards an integration of linguistic, paralinguistic, and psychological analysis

Relevance: 20.00%

On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS system

Relevance: 20.00%

Placing Science in an Age of Oratory: Spaces of Scientific Speech in Mid-Victorian Edinburgh

Relevance: 20.00%

Adapting noisy speech models – extended uncertainty decoding

Relevance: 20.00%

Rise time and formant transition duration in the discrimination of speech sounds: The Ba–Wa distinction in developmental dyslexia

Relevance: 20.00%

Abstract:

Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami , 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the speed of the rate of change of frequency information (formant transition duration) versus the speed of the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes , 2004), and suggest new avenues for the remediation of developmental dyslexia. © 2010 Blackwell Publishing Ltd.

FPGA Implementation of a Pipelined Gaussian Calculation for HMM-Based Large Vocabulary Speech Recognition

Relevance: 20.00%

Abstract:

A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, both of which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex-5 SX95T FPGA. The core has been tested and can calculate a full set of Gaussian results for 3825 acoustic models in 9.03 ms. Coupled with a 5000-word backend search, this has provided an accuracy of over 80%. Parallel implementations with up to 32 cores have been designed and successfully implemented at a clock frequency of 133 MHz.
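The abstract does not describe the arithmetic inside the Gaussian core. The rough sketch below is an illustration only. It assumes diagonal-covariance Gaussians evaluated in the log domain, which is a common choice in HMM recognizers but is not confirmed by the abstract. The function names `gaussian_const` and `diag_gaussian_loglik` and the toy models are hypothetical. The sketch shows why per-frame work reduces to a multiply-accumulate over feature dimensions once the variance-dependent constant and inverse variances are precomputed.

```python
import math
from typing import Sequence


def gaussian_const(variances: Sequence[float]) -> float:
    """Frame-independent term of log N(x; mu, diag(var)):
    -0.5 * (D * log(2*pi) + sum_d log(var_d))."""
    d = len(variances)
    return -0.5 * (d * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances))


def diag_gaussian_loglik(x: Sequence[float], means: Sequence[float],
                         inv_vars: Sequence[float], const: float) -> float:
    """Log-likelihood of frame x, using precomputed inverse variances and constant."""
    # Only this weighted squared distance depends on the frame: one subtract and
    # two multiply-accumulates per dimension, the kind of loop a pipeline streams.
    dist = 0.0
    for xd, md, ivd in zip(x, means, inv_vars):
        diff = xd - md
        dist += diff * diff * ivd
    return const - 0.5 * dist


# Hypothetical 2-D models (mean, variance); real systems use far more dimensions.
models = {
    "model_a": ([0.0, 0.0], [1.0, 1.0]),
    "model_b": ([2.0, 1.0], [0.5, 2.0]),
}
# Precompute per-model terms once, outside any per-frame loop.
precomputed = {
    name: (mean, [1.0 / v for v in var], gaussian_const(var))
    for name, (mean, var) in models.items()
}

frame = [1.8, 0.9]
scores = {name: diag_gaussian_loglik(frame, m, iv, c)
          for name, (m, iv, c) in precomputed.items()}
print(max(scores, key=scores.get))  # → model_b
```

Splitting the work this way means only the inner loop needs to run per frame and per model. That loop maps naturally onto a pipelined datapath.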

Why do they hate us? Making peace between psychology and prisoners: Making peace between prisoners and psychology

Relevance: 20.00%

100 results for Hate Speech
