118 results for speech databases


Relevance: 20.00%

Abstract:

This output is an invited and refereed chapter in the second of two book-length outputs resulting from the EU HUMAINE grant and follow-on grants. The book appears in the OUP Affective Science Series and is intended to provide a theoretically oriented, state-of-the-art model for those working in affective computing. Each chapter synthesises a specific area and presents new data, findings, or approaches developed by the author(s) that take the area further. This chapter sits in the section on ‘Approaches to developing expression corpora and databases.’ It provides a critical synthesis of the issues involved in databases for affective computing and introduces the SEMAINE SAL Database, developed as an integral part of the interdisciplinary EU SEMAINE project (the Sensitive Agent Project, 2008-2011), which aimed to develop a computer interface that allows a human to interact with an artificial agent in an emotional manner.

Relevance: 20.00%

Abstract:

For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user’s affect. However, most emotion recognition systems are based on turn-wise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition, and no statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary, and regression outputs can be produced in real time for every low-level input frame. We also investigate the benefit of including linguistic features at the signal-frame level, obtained by a keyword spotter.
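The abstract gives no implementation details, but the frame-wise scheme can be sketched in outline: a recurrent cell consumes one low-level feature frame at a time, and a read-out layer emits a (valence, activation) pair for every frame, with no higher-level framing or statistical functionals. A minimal plain-Python sketch, assuming 13-dimensional frame features (an MFCC-like choice), an 8-unit hidden state, and untrained random weights — all illustrative assumptions, not the paper’s configuration:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal LSTM cell; processes one low-level feature frame per step."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        self.n_hidden = n_hidden
        # gate order: input, forget, cell candidate, output; each sees [x_t, h_{t-1}]
        self.W = mat(4 * n_hidden, n_in + n_hidden)
        self.b = [0.0] * (4 * n_hidden)

    def step(self, x, h, c):
        z = x + h  # concatenate current frame with previous hidden state
        acts = [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W, self.b)]
        nh = self.n_hidden
        i = [sigmoid(a) for a in acts[0:nh]]
        f = [sigmoid(a) for a in acts[nh:2 * nh]]
        g = [math.tanh(a) for a in acts[2 * nh:3 * nh]]
        o = [sigmoid(a) for a in acts[3 * nh:4 * nh]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

def framewise_affect(frames, cell, out_w):
    """Emit a (valence, activation) regression output for every input frame."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    preds = []
    for x in frames:
        h, c = cell.step(x, h, c)
        # linear read-out squashed into the [-1, 1] valence-activation plane
        preds.append(tuple(
            math.tanh(sum(w * hj for w, hj in zip(row, h))) for row in out_w))
    return preds

rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(13)] for _ in range(50)]
cell = LSTMCell(n_in=13, n_hidden=8)
out_w = [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(2)]
preds = framewise_affect(frames, cell, out_w)  # one (valence, activation) pair per frame
```

Because an estimate is emitted for every input frame, a dialogue system could act on the running output before the user’s turn ends, which is the on-line property the paper targets.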

Relevance: 20.00%

Abstract:

This is a study of free speech and hate speech with reference to international standards and United States jurisprudence. The study, in a comparative and critical fashion, depicts the historical evolution and application of the concept of ‘free speech’ within the context of ‘hate speech.’ The main question of this article is how free speech can be discerned from hate speech, and whether the latter should be restricted. To this end, it examines the regulation of free speech under the First Amendment to the United States Constitution and in light of international standards, particularly the International Convention on the Elimination of All Forms of Racial Discrimination, the International Covenant on Civil and Political Rights, and the European Convention on Human Rights and Fundamental Freedoms. The study not only illustrates how elusive the endeavour of striking a balance between free speech and other vital interests can be, but also discusses whether and how hate speech should be eliminated within the ‘marketplace of ideas.’

Relevance: 20.00%

Abstract:

Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami, 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the rate of change of frequency information (formant transition duration) versus the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes, 2004), and suggest new avenues for the remediation of developmental dyslexia. © 2010 Blackwell Publishing Ltd.
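The rise-time manipulation can be illustrated with a toy stimulus generator. The study’s actual ba/wa continuum would have used synthetic speech with full formant structure; the plain-Python sketch below is an assumption of mine that models only the amplitude-envelope cue — a short onset rise approximating the abrupt stop-like /ba/, a long rise the glide-like /wa/:

```python
import math

def synth_syllable(rise_ms, dur_ms=300, f0=120.0, sr=16000):
    """Sine carrier with a linear amplitude-onset envelope.
    Only the rise-time (envelope) cue is modelled; real ba/wa stimuli
    additionally carry formant-transition information."""
    n = int(sr * dur_ms / 1000)
    rise_n = max(1, int(sr * rise_ms / 1000))
    samples = []
    for i in range(n):
        env = min(1.0, i / rise_n)  # slow rise -> gradual, /wa/-like onset
        samples.append(env * math.sin(2.0 * math.pi * f0 * i / sr))
    return samples

ba_like = synth_syllable(rise_ms=15)    # abrupt onset: stop-like /ba/ cue
wa_like = synth_syllable(rise_ms=130)   # gradual onset: glide-like /wa/ cue
```

Varying `rise_ms` in small steps would give a continuum along the envelope dimension, the cue on which the dyslexic children in the study performed poorly.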

Relevance: 20.00%

Abstract:

A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented at a clock frequency of 133 MHz.
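The quantity the FPGA core evaluates is, in outline, a per-frame Gaussian likelihood for every acoustic model. The paper does not spell out its exact formulation, so treat the details below as assumptions; this reference sketch uses the diagonal-covariance log-likelihood that is standard in HMM systems:

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of a feature vector under one diagonal-covariance
    Gaussian -- the per-model computation a Gaussian core evaluates each frame."""
    return sum(-0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
               for xi, mi, vi in zip(x, mean, var))

def score_all(x, models):
    """Score one frame against every (mean, var) acoustic model, as the
    hardware does for the full model set (3825 Gaussians in the paper)."""
    return [diag_gauss_loglik(x, mean, var) for mean, var in models]

x = [1.0, 2.0]                          # toy 2-dimensional feature frame
models = [([1.0, 2.0], [1.0, 1.0]),     # model matching the frame
          ([5.0, 5.0], [1.0, 1.0])]     # distant model
scores = score_all(x, models)
best = max(range(len(scores)), key=scores.__getitem__)
```

The independence of each model’s score is what makes the workload embarrassingly parallel and hence a natural fit for the multi-core accelerator the paper describes.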

Relevance: 20.00%

Abstract:

There are multiple reasons to expect that recognising the verbal content of emotional speech will be a difficult problem, and recognition rates reported in the literature are in fact low. Including information about prosody improves recognition rate for emotions simulated by actors, but its relevance to the freer patterns of spontaneous speech is unproven. This paper shows that recognition rate for spontaneous emotionally coloured speech can be improved by using a language model based on increased representation of emotional utterances. The models are derived by adapting an already existing corpus, the British National Corpus (BNC). An emotional lexicon is used to identify emotionally coloured words, and sentences containing these words are recombined with the BNC to form a corpus with a raised proportion of emotional material. Using a language model based on that technique improves recognition rate by about 20%. (c) 2005 Elsevier Ltd. All rights reserved.
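The corpus-adaptation step can be sketched directly from the description: find sentences containing words from an emotional lexicon, then recombine extra copies of them with the base corpus before language-model estimation. The function name, the `boost` factor, and the whitespace tokenisation below are illustrative assumptions, not the paper’s exact procedure:

```python
def raise_emotional_proportion(corpus, lexicon, boost=3):
    """Recombine a base corpus with extra copies of its emotionally coloured
    sentences, raising their proportion before language-model estimation."""
    lexicon = {w.lower() for w in lexicon}
    emotional = [s for s in corpus
                 if lexicon & {w.lower() for w in s.split()}]
    return corpus + emotional * (boost - 1)

corpus = [
    "the meeting is at noon",
    "i am so happy today",
    "this is terrible news",
    "the report is on the desk",
]
lexicon = {"happy", "terrible", "sad", "angry"}
augmented = raise_emotional_proportion(corpus, lexicon, boost=3)
# emotional sentences now make up 6 of 8 lines instead of 2 of 4
```

An n-gram model estimated on `augmented` would assign higher probability to emotionally coloured word sequences, which is the mechanism behind the recognition-rate gain the abstract reports.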
