927 results for Expressive speech
Abstract:
In order to obtain more human-like-sounding human-machine interfaces, we must first be able to give them expressive capabilities, in the form of emotional and stylistic features, so as to adapt them closely to the intended task. If we want to replicate those features, it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining 95% recognition rates on styled speaking and 82% on emotional speech. Then we evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a speaking-style separation for Spanish based on prosodic features and check its perceptual significance.
Abstract:
When designing human-machine interfaces, it is important to consider not only the bare-bones functionality but also the ease of use and accessibility they provide. For voice-based interfaces, it has been shown that imbuing synthetic voices with expressiveness significantly increases their perceived naturalness, which in the end is very helpful when building user-friendly interfaces. This paper proposes an adaptation-based expressiveness transplantation system capable of copying the emotions of a source speaker into any desired target speaker with just a few minutes of read speech and without requiring the recording of additional expressive data. This system was evaluated through a perceptual test for 3 speakers, showing up to an average of 52% emotion recognition rates relative to the natural voice recognition rates, while at the same time keeping good scores in similarity and naturalness.
Abstract:
This paper describes a module for the prediction of emotions in text chats in Spanish, oriented to its use in specific-domain text-to-speech systems. A general overview of the system is given, and the results of some evaluations carried out with two corpora of real chat messages are described. These results seem to indicate that this system offers a performance similar to other systems described in the literature, for a more complex task than other systems (identification of emotions and emotional intensity in the chat domain).
Abstract:
In this thesis, the tales of three contemporary Quebec storytellers (Jos Gallant by André Lemelin, Ti Pinge by Joujou Turenne, and L’entrain à vapeur by Fred Pellerin) are first and foremost given a pragmatic reading in order to better understand how a storyteller who works from an outline (canevas) in performance conveys a fiction to an audience or a readership. The study first presents a comparative analysis of each performance against the published version of the same story, highlighting their points of convergence and divergence. According to the hypothesis put forward, analyzing the performances of storytellers who follow an outline would reveal how the performative dimensions and the articulations of fictional discourse manifest themselves. Correspondingly, examining the relationship between the storyteller and the audience then makes it possible to question the status of the narrator and to see in what way and how, during the performance, the fiction is shared with the audience. The analysis of performative utterances, inspired by the work of Kerbrat-Orecchioni, and the dynamics of vectorization proposed by Pavis for the study of gesture, facial expression, and voice are both brought to bear, with the further aim of identifying tools that can serve in the analysis of storytelling performances. At the end of this research, the author demonstrates the advantages of the outline, notably the interactions it fosters with the audience and the freedom it provides, by allowing the storyteller to modify or adapt the discourse and expressive resources at each performance.
Abstract:
The objectives of this research program were, first, to provide a critical understanding of the non-invasive techniques used to localize and/or lateralize language and memory areas, taking into account their advantages, their specific limitations, and their relevance in a clinical context; and second, to deepen our understanding of cerebral language organization in a population of subjects with agenesis of the corpus callosum, using a neuroimaging protocol. To address the first objective, we conducted a critical review of the literature on neuroimaging methods used to lateralize and localize the brain areas underlying language and memory processing in the context of the presurgical evaluation of epileptic patients. This work showed that some of these new techniques, and especially their combination, have real potential in this clinical context. It also brought to light that these methods still need considerable refinement and standardization before they can safely replace the intracarotid amobarbital test in a clinical setting. To address the second objective, we explored language lateralization patterns in six acallosal subjects using a functional magnetic resonance imaging (fMRI) protocol. The results indicate that individuals with agenesis of the corpus callosum show a pattern of brain activation that is just as lateralized as that of our two control groups (IQ-matched and high-IQ) during receptive language processing. Subjects with agenesis of the corpus callosum also show a lateralization pattern comparable to that of their IQ-matched control group for the expressive language task.
When subjects with agenesis of the corpus callosum are compared with the high-IQ control group, they show less marked lateralization only in the frontal region during the expressive language task. In conclusion, the results of this study do not support the claim that the corpus callosum plays an essential inhibitory role in enabling the normal development of hemispheric lateralization for language.
Abstract:
The significance of the body in electronic music parties as a sign for communicating and socializing among participants is the focus of this work. The qualitative research undertaken in this study investigates how sociability happens at raves and nightclubs in Natal/RN. Sociability is understood here as a form of play involving the dimensions of music, dance and party; the body, seen from a transdisciplinary approach, is understood as a symbolic instance with its own meanings, as both a product and a producer of the social, and as a crossing point between the cultural and the biological. The body has communicative potential; it is a primary medium. As an intersection point between nature and culture, it serves as the seat of emotions and sociability, since it is through the body that social relations are made. In electronic music parties, the body is interpreted based on its communication signs: clothing, accessories, body movements, tactile contact, body language, interactions between the public and the DJ and between the DJ and the public, gestures, and expressive speech of emotions. Through such signs, body communication and a sense of community among participants develop sociability in the festive place and change the mood of the dancers. Young attendees of Natal's electronic music parties interact at the parties, adopt cheerful and receptive attitudes towards others, maintain physical contact, value dance as a form of communication, and list happiness as the main feeling aroused at electronic music festivals. To reach this result, a multi-method approach was used, combining several methodological devices and investigation techniques: ethnographic observation, individual and informal interviews, photographic records of the scene, in-depth interviews, and thirty questionnaires administered to patrons of electronic music parties.
Abstract:
Speech exhibits paralinguistic aspects that do not belong to the conventional linguistic code but contribute significantly to the thematic unity of discourse. These realizations constitute non-lexicalized utterances that function as complete speech acts in interpersonal communicative interactions. Regarding these non-verbal emissions, Campbell (2002a, 2002b, 2003 and 2004), Maekawa (2004), Fujie et al. (2004), Hoult (2004), and Key (1958), as cited in Steimberg (1988), postulate that they contribute to the manifestation of expressive speech. For these authors, it is precisely the phenomenon of paralanguage that signals information about the speaker's attitudes, opinions, and emotions toward the interlocutor or the discourse topic. Accordingly, in this work we investigate the paralinguistic manifestations that recur in informal conversations in order to demonstrate their expressive role in spoken language. To that end, we compiled 450 occurrences of paralinguistic elements while transcribing speech samples of Regional Paraense Portuguese (Português Regional Paraense) produced in real conversational situations. Assuming that these non-verbal realizations are characterized by prosodic variation, we submitted them to phonetic analysis using the PRAAT software. From this analysis, we found that two properties, fundamental frequency (F0) and emission duration, contribute to the expressive manifestation of paralinguistic elements in spoken discourse. In addition, we identified syllabification as a property common to the sound realizations under study. After the analysis, we described the use and functioning of these elements in the conversations, as well as their contribution to the manifestation of expressive speech.
The results show that paralinguistic elements, besides contributing to the fluency of spoken discourse, serve to signal comprehension, interest and/or attention, to manage interpersonal relations, and to express emotions, attitudes, and affect.
Abstract:
This essay asks whether there is a relation between action-serving and meaning-serving intentions. The idea that the intentions involved in meaning and action are nominally designated alike as intentionalities does not guarantee any special logical or conceptual connections between the intentionality of referential thoughts and thought-expressive speech acts with the intentionality of doing. The latter category is typified by overt physical actions in order to communicate by engaging in speech acts, but also includes at the origin of all artistic and symbolic expression such cerebral and linguistic doings as thinking propositional thoughts. There are exactly four possibilities by which meaning and action intentionalities might be related to be systematically investigated. Meaning-serving and action-serving intentionalities, topologically speaking, might exclude one another, partially overlap with one another, or subsume one in the other or the other in the one. The theoretical separation of the two ostensible categories of intendings is criticized, as is their partial overlap, in light of the proposal that thinking and artistic and symbolic expression are activities that favor the inclusion of paradigm meaning-serving intentions as among a larger domain of action-serving intentions. The only remaining alternative is then developed, of including action-serving intentions reductively in meaning-serving intentions, and is defended as offering in an unexpected way the most cogent universal reductive ontology in which the intentionality of doing generally relates to the specific intentionality of referring in thought to the objects of predications, and of its artistic and symbolic expression.
Abstract:
This demo concerns a recently developed prototype of an emotionally-sensitive autonomous HiFi Spoken Conversational Agent, called NEMOHIFI. The baseline agent was developed by the Speech Technology Group (GTH) and has recently been integrated with an emotional engine called NEMO (Need-inspired Emotional Model) to enable it to adapt to users' emotions and respond to them using appropriate expressive speech. NEMOHIFI controls and manages the HiFi audio system, and for end users, its functions are equivalent to those of a remote control, except that instead of clicking, the user interacts with the agent using voice. A pairwise comparison between the baseline (non-adaptive) agent and NEMOHIFI is also presented.
Abstract:
Few studies have focused on the language acquisition of higher multiple birth sets. In this study, the communication skills of 51 triplet children are described. The measures used were: mean length of utterance; type-token ratio; conversational nets; phoneme repertoire; and number of different types of phonological processes used. The data gained were used to compare the communication skills of triplets with those of twins, singletons and normative data available in the literature. Siblings within triplet sets were also compared using language samples obtained from adult-child interactions and when the three children were playing together. The results indicated that the triplets' early communication skills were different from those of both singletons and twins. The triplets' difficulties included delayed syntactic development, limited use of different language functions and delayed phonological development. In contrast, twins' communication profile is characterised by disordered phonological development.
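Two of the measures named in this abstract, type-token ratio and mean length of utterance, have simple standard definitions. The following is a minimal sketch of both, not the study's own procedure: the function names and the sample utterances are hypothetical, and utterance length is counted in whitespace-separated words rather than morphemes, which is a simplifying assumption.

```python
def type_token_ratio(utterances):
    """Distinct word types divided by total word tokens (case-insensitive)."""
    tokens = [word.lower() for u in utterances for word in u.split()]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_length_of_utterance(utterances):
    """Average utterance length in words (assumption: words, not morphemes)."""
    lengths = [len(u.split()) for u in utterances if u.split()]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


# Hypothetical transcribed child utterances, for illustration only.
sample = ["more juice", "doggie go", "more doggie"]
ttr = type_token_ratio(sample)          # 4 types over 6 tokens
mlu = mean_length_of_utterance(sample)  # every utterance has 2 words
```

A morpheme-based count would need a morphological segmentation step, which this sketch deliberately leaves out.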
Abstract:
Background: In Portugal, the routine clinical practice of speech and language therapists (SLTs) in treating children with all types of speech sound disorder (SSD) continues to be articulation therapy (AT). There is limited use of phonological therapy (PT) or phonological awareness training in Portugal. Additionally, at an international level there is a focus on collecting information on and differentiating between the effectiveness of PT and AT for children with different types of phonologically based SSD, as well as on the role of phonological awareness in remediating SSD. It is important to collect more evidence for the most effective and efficient type of intervention approach for different SSDs and for these data to be collected from diverse linguistic and cultural perspectives. Aims: To evaluate the effectiveness of a PT and AT approach for treatment of 14 Portuguese children, aged 4.0–6.7 years, with a phonologically based SSD. Methods & Procedures: The children were randomly assigned to one of the two treatment approaches (seven children in each group). All children were treated by the same SLT, blind to the aims of the study, over three blocks of a total of 25 weekly sessions of intervention. Outcome measures of phonological ability (percentage of consonants correct (PCC), percentage occurrence of different phonological processes and phonetic inventory) were taken before and after intervention. A qualitative assessment of intervention effectiveness from the perspective of the parents of participants was included. Outcomes & Results: Both treatments were effective in improving the participants’ speech, with the children receiving PT showing a more significant improvement in PCC score than those receiving the AT. Children in the PT group also showed greater generalization to untreated words than those receiving AT. Parents reported both intervention approaches to be as effective in improving their children’s speech. 
Conclusions & Implications: The PT (combination of expressive phonological tasks, phonological awareness, listening and discrimination activities) proved to be an effective integrated method of improving phonological SSD in children. These findings provide some evidence for Portuguese SLTs to employ PT with children with phonologically based SSD.
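The percentage of consonants correct (PCC) used as an outcome measure above is, in its usual form, the number of correctly produced consonants divided by the number of target consonants, times 100. The sketch below illustrates that ratio only. It assumes the target and produced consonants have already been aligned by hand, and the function name and example data are hypothetical rather than taken from the study.

```python
def percentage_consonants_correct(aligned_pairs):
    """PCC = 100 * correct consonants / target consonants.

    aligned_pairs: list of (target, produced) consonant symbols.
    Use None for `produced` when the target consonant was omitted.
    Assumption: alignment was done beforehand by a transcriber.
    """
    total = len(aligned_pairs)
    if total == 0:
        raise ValueError("no target consonants to score")
    correct = sum(1 for target, produced in aligned_pairs if target == produced)
    return 100.0 * correct / total


# Hypothetical scoring: one substitution (k -> t) and one omission (s).
pairs = [("k", "t"), ("t", "t"), ("s", None), ("p", "p")]
pcc = percentage_consonants_correct(pairs)  # 2 of 4 targets correct
```

Comparing this score before and after intervention gives the kind of pre/post change the abstract reports, though the study's exact scoring rules are not stated here.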
Abstract:
This case study presents corpus data gathered from a Spanish-English bilingual child with expressive language delay. Longitudinal data on the child’s linguistic development was collected from the onset of productive speech at age 1;1 until age 4 over the course of 28 video-taped sessions with the child’s principal caregivers. A literature review focused on the relationship between language delay and persisting disorders—including a discussion of the frequent difficulty in distinguishing between the two at early stages of bilingual development—is followed by an analysis of the child’s productive development in 2 distinct phases. An attempt is made to assess the child’s speech at age 4 for preliminary signs of SLI and to consider techniques for identifying ‘at risk’ bilingual children (that is, those with productive language delay, poor oral fluency, and family history of language problems) based on samples of recorded and transcribed speech.
Abstract:
The present study investigates the predictive value of the early appearance of simultaneous pointing-speech combinations. An experimental task was used to obtain a communicative productive sample from nineteen children at 1;0 and 1;3. Infants’ communicative productions, in combination with gaze joint engagement patterns, were analyzed in relation to different social conditions. The results show a significant effect of age and social condition on infants’ communicative productions. Gesture-speech combinations seem to work as a strong communicative resource to attract the adult’s attention in socially demanding communicative contexts. Gaze joint engagement was used in combination with simultaneous pointing-speech combinations to attract adults’ attention during socially demanding conditions. Finally, the use of simultaneous pointing-speech combinations at 1;0 in demanding conditions predicted greater expressive vocabulary acquisition at 1;3 and 1;6. These results indicate that the use of gesture-speech combinations may be considered a significant step towards the early integration of language components.
Abstract:
Traditional Text-To-Speech (TTS) systems have been developed using especially-designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.
Abstract:
Investigation of the three-generation KE family, half of whose members are affected by a pronounced verbal dyspraxia, has led to identification of their core deficit as one involving sequential articulation and orofacial praxis. A positron emission tomography activation study revealed functional abnormalities in both cortical and subcortical motor-related areas of the frontal lobe, while quantitative analyses of magnetic resonance imaging scans revealed structural abnormalities in several of these same areas, particularly the caudate nucleus, which was found to be abnormally small bilaterally. A recent linkage study [Fisher, S., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P. & Pembry, M. E. (1998) Nat. Genet. 18, 168–170] localized the abnormal gene (SPCH1) to a 5.6-centiMorgan interval in the chromosomal band 7q31. The genetic mutation or deletion in this region has resulted in the abnormal development of several brain areas that appear to be critical for both orofacial movements and sequential articulation, leading to marked disruption of speech and expressive language.