966 results for Speech Communication
Abstract:
The aim of this study was to compare the prosodic profiles of English- and Spanish-speaking children with Williams syndrome (WS), examining cross-linguistic differences. Two groups of children with WS, English and Spanish, of similar chronological and nonverbal mental age, were compared on performance in expressive and receptive prosodic tasks from the Profiling Elements of Prosody in Speech-Communication (PEPS-C) battery in its English or Spanish version. Differences between the English and Spanish WS groups were found in understanding affect through prosodic means, using prosody to make words more prominent, and imitating different prosodic patterns. These differences between the two WS groups on functional prosody tasks mirrored the cross-linguistic differences already reported in typically developing children.
Abstract:
The aim of the present study is to investigate the developmental profile of three aspects of prosody function, i.e. affect, focus, and turn-endings, in children with Williams syndrome and in those with Down's syndrome, compared to typically developing English-speaking children. The tasks used were part of the computer-based battery Profiling Elements of Prosody in Speech-Communication (Peppé, McCann & Gibbon, 2003). Cross-sectional developmental trajectories linking chronological and non-verbal mental age to the affect and turn-ending functions of prosody were constructed. The results showed an atypical profile in both clinical populations. More interestingly, the profiles were atypical for different reasons, suggesting multiple and possibly different developmental pathways to the acquisition of prosody in these two populations.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Speech presents paralinguistic aspects that do not belong to the conventional linguistic code but contribute significantly to the thematic unity of discourse. These realizations constitute non-lexicalized utterances that function as complete speech acts in interpersonal communicative interactions. Regarding these non-verbal emissions, Campbell (2002a, 2002b, 2003, 2004), Maekawa (2004), Fujie et al. (2004), Hoult (2004), and Key (1958, as cited in Steimberg, 1988) postulate that they contribute to the manifestation of expressive speech. For these authors, it is precisely the phenomenon of paralanguage that signals information about the speaker's attitudes, opinions, and emotions toward the interlocutor or the discourse topic. In this study, we therefore investigate the paralinguistic manifestations that recur in informal conversations in order to demonstrate their expressive role in spoken language. To this end, we collected 450 occurrences of paralinguistic elements while transcribing speech samples of Regional Paraense Portuguese produced in real conversational situations. Assuming that these non-verbal realizations are characterized by prosodic variation, we submitted them to phonetic analysis using the PRAAT software. This analysis revealed the contribution of two properties, fundamental frequency (F0) and emission duration, to the expressive manifestation of paralinguistic elements in spoken discourse. We also identified syllabification as a property common to the sound realizations under study. After the analysis, we described the use and functioning of these elements in conversation, as well as their contribution to expressive speech. The results show that paralinguistic elements, besides contributing to the fluency of spoken discourse, serve to signal comprehension, interest, and/or attention, to manage interpersonal relationships, and to express emotions, attitudes, and affect.
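The acoustic analysis above rests on two measurable properties, fundamental frequency (F0) and emission duration. As a rough illustration of how such measurements can be obtained programmatically, the sketch below uses parselmouth, a Python interface to Praat (the study used PRAAT directly); the audio file name is a placeholder.

```python
# Minimal sketch: extracting the two acoustic properties the study
# analyses (F0 and emission duration) from one paralinguistic
# vocalization, via parselmouth. The file name is hypothetical.
import numpy as np
import parselmouth

snd = parselmouth.Sound("vocalization.wav")  # placeholder sample

# Emission duration in seconds.
duration = snd.get_total_duration()

# F0 contour; unvoiced frames come back as 0 Hz and are discarded.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]

print(f"duration: {duration:.2f} s")
print(f"mean F0: {f0.mean():.1f} Hz, range: {f0.min():.1f}-{f0.max():.1f} Hz")
```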
Abstract:
Prosody, or speech melody, subserves linguistic (e.g., question intonation) and emotional functions in speech communication. Findings from lesion studies and imaging experiments suggest that, depending on function or acoustic stimulus structure, prosodic speech components are differentially processed in the right and left hemispheres. This direct current (DC) potential study investigated the linguistic processing of digitally manipulated pitch contours of sentences that carried an emotional or neutral intonation. Discrimination of linguistic prosody was better for neutral stimuli than for happily or fearfully spoken sentences. Brain activation was increased during the processing of happy sentences compared to neutral utterances. Neither neutral nor emotional stimuli evoked lateralized processing in the left or right hemisphere, indicating bilateral mechanisms of linguistic processing for pitch direction. Acoustic stimulus analysis suggested that prosodic components related to emotional intonation, such as pitch variability, interfered with the linguistic processing of pitch direction.
Abstract:
This paper examines the provision of interpretation services to immigrants with limited English proficiency in Federally Qualified Health Centers (FQHCs), through an examination of barriers and best practices. The United States is a nation of immigrants; currently, more than 38 million people, or 12.5 percent of the total population, are foreign-born. A substantial portion of this population does not have health insurance or speak English fluently: barriers that reduce the likelihood that they will access traditional health care organizations. This service void is filled by FQHCs, which are non-profit, community-directed providers that remove common barriers to care by serving communities that otherwise confront financial, geographic, language, and cultural barriers. By examining the importance and the implementation of medical interpretation services in FQHCs, suggestions for the future are presented.
Abstract:
Multiple guidelines recommend debriefing of actual resuscitations to improve clinical performance. We implemented a novel standardized debriefing program using a Debriefing In Situ Conversation after Emergent Resuscitations Now (DISCERN) tool. Following the development of the evidence-based DISCERN tool, we conducted an observational study of all resuscitations (intubation, CPR, and/or defibrillation) at a pediatric emergency department (ED) over one year. Resuscitation interventions, patient survival, and physician team leader characteristics were analyzed as predictors for debriefing. Each debriefing's participants, duration, and content were recorded. The thematic content of debriefings was categorized, using a framework approach, into Team Emergency Assessment Measure (TEAM) elements. There were 241 resuscitations and 63 (26%) debriefings. A higher proportion of debriefings occurred after CPR (p<0.001) or ED death (p<0.001). Debriefing participants always included an attending and a nurse; the median number of staff roles present was six. The median interval from the end of resuscitation to the start of debriefing was 33 minutes (IQR 15-67), and the median debriefing duration was 10 minutes (IQR 5-12). Common TEAM themes included cooperation/coordination (30%), communication (22%), and situational awareness (15%). Stated reasons for not debriefing included: unnecessary (78%), time constraints (19%), or other reasons (3%). Debriefings with the DISCERN tool usually followed higher-acuity resuscitations, involved most of the indicated personnel, and lasted less than 10 minutes. This qualitative tool could be adapted to other settings. Future studies are needed to evaluate potential impacts on education, quality improvement programming, and staff emotional well-being.
Abstract:
In this paper, we describe a complete development platform that features different innovative acceleration strategies, not included in any other current platform, that simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps of the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the "proposals" that the platform dynamically provides at each step. In addition, the platform provides several other accelerations, such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain or show information to the user, automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative or directed forms), an assistant that offers the set of most probable actions required to complete the definition of the different tasks in the application, and another assistant for solving specific modality details such as confirmations of user answers or how to present the lists of results retrieved from the backend database to the user. Additionally, the platform allows the creation of speech grammars and prompts, database access functions, and the use of mixed-initiative and over-answering dialogs. In the paper we also describe each assistant in the platform in detail, emphasizing the different kinds of methodologies followed to facilitate the design process in each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers, which confirm the viability, usefulness, and functionality of the proposed accelerations. Thanks to the accelerations, design time is reduced by more than 56% and the number of keystrokes by 84%.
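To make the slot-request acceleration concrete, here is a toy sketch of the kind of decision involved: given the slots of a task and rough statistics from the backend database, propose either a single mixed-initiative form or one directed form per slot. The heuristic and thresholds are illustrative assumptions, not the platform's actual rules.

```python
# Toy illustration of proposing a form type per slot group. The rule
# (few slots with small value sets -> mixed initiative) is an
# assumption for illustration only.
from dataclasses import dataclass

@dataclass
class Slot:
    name: str
    n_possible_values: int  # e.g. taken from the backend database contents

def propose_form(slots: list[Slot]) -> str:
    small = all(s.n_possible_values <= 50 for s in slots)
    names = ", ".join(s.name for s in slots)
    if len(slots) <= 3 and small:
        return f"mixed-initiative form covering: {names}"
    return f"one directed form per slot: {names}"

print(propose_form([Slot("city", 30), Slot("date", 31)]))
print(propose_form([Slot("account_id", 100000)]))
```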
Abstract:
Detecting user affect automatically during real-time conversation is the main challenge towards our greater aim of infusing social intelligence into a natural-language, mixed-initiative High-Fidelity (Hi-Fi) audio control spoken dialog agent. In recent years, studies on affect detection from voice have moved on to using realistic, non-acted data, which is subtler. However, subtler emotions are harder to perceive, as demonstrated in tasks such as labelling and machine prediction. This paper attempts to address part of this challenge by considering the role of user satisfaction ratings, as well as conversational/dialog features, in discriminating contentment and frustration, two types of emotions known to be prevalent within spoken human-computer interaction. However, given the laboratory constraints, users might be positively biased when rating the system, indirectly making the reliability of the satisfaction data questionable. Machine learning experiments were conducted on two datasets, from users and from annotators, which were then compared in order to assess the reliability of these datasets. Our results indicated that standard classifiers were significantly more successful in discriminating the abovementioned emotions and their intensities (reflected by user satisfaction ratings) from annotator data than from user data. These results corroborated, first, that satisfaction data could be used directly as an alternative target variable to model affect, and that it could be predicted exclusively from dialog features; and second, that this held only when predicting the abovementioned emotions using the annotators' data, suggesting that user bias does exist in a laboratory-led evaluation.
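The core experiment, training the same standard classifier on dialog features against two different label sources and comparing performance, can be sketched as follows; the features, labels, and classifier choice are hypothetical stand-ins for the paper's data and setup.

```python
# Sketch: train one standard classifier twice on the same dialog
# features, once with user-rating targets and once with annotator
# targets, and compare cross-validated accuracy. All arrays here are
# random placeholders, not the paper's data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))           # dialog features (e.g. turns, errors)
y_user = rng.integers(0, 2, size=200)   # targets from user satisfaction ratings
y_annot = rng.integers(0, 2, size=200)  # targets from annotator labels

clf = SVC(kernel="rbf")
for name, y in [("user", y_user), ("annotator", y_annot)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name} labels: mean CV accuracy = {acc:.2f}")
```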
Abstract:
This paper presents an automatic strategy to decide how to pronounce a Capital Letter Sequence (CLS) in a Text-to-Speech (TTS) system. If the CLS is well known to the TTS, it can be expanded into several words. But when the CLS is unknown, the system has two alternatives: spelling it out (abbreviation) or pronouncing it as a new word (acronym). In Spanish, there is a close relationship between letters and phonemes. Because of this, when a CLS is similar to other Spanish words, there is a strong tendency to pronounce it as a standard word. This paper proposes an automatic method for detecting acronyms. Additionally, it analyses the discrimination capability of several features, and several strategies for combining them in order to obtain the best classifier. For the best classifier, the classification error is 8.45%. Regarding the feature analysis, the most useful features were the Letter Sequence Perplexity and the Average N-gram Order.
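The strongest feature, Letter Sequence Perplexity, can be illustrated with a small character-bigram model: word-like sequences receive low perplexity and are candidates for acronym pronunciation, while unpronounceable ones receive high perplexity and are spelled out. The toy lexicon and decision threshold below are assumptions for illustration only.

```python
# Sketch of a letter-sequence perplexity feature: score a capital
# letter sequence with a character-bigram model (add-one smoothing)
# trained on ordinary words. The tiny lexicon and the threshold are
# toy choices, not the paper's training data or tuning.
import math
from collections import Counter

lexicon = ["radar", "laser", "modem", "banco", "datos", "ovni"]

bigrams, unigrams = Counter(), Counter()
for w in lexicon:
    chars = f"^{w}$"  # ^ and $ mark word boundaries
    unigrams.update(chars[:-1])
    bigrams.update(zip(chars, chars[1:]))

V = len(set("".join(lexicon)) | {"^", "$"})  # smoothing vocabulary size

def perplexity(cls: str) -> float:
    chars = f"^{cls.lower()}$"
    logp = 0.0
    for a, b in zip(chars, chars[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + V)  # add-one smoothing
        logp += math.log(p)
    return math.exp(-logp / (len(chars) - 1))

THRESHOLD = 16.0  # tuned to this toy lexicon
for cls in ["OTAN", "XKCD"]:
    pp = perplexity(cls)
    verdict = "pronounce as word" if pp < THRESHOLD else "spell out"
    print(f"{cls}: perplexity {pp:.1f} -> {verdict}")
```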
Abstract:
This paper describes the language identification (LID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We show that techniques originally developed for LID on telephone speech (e.g., for the NIST language recognition evaluations) remain effective on the noisy RATS data, provided that careful consideration is applied when designing the training and development sets. In addition, we show significant improvements from the use of Wiener filtering, neural-network-based and language-dependent i-vector modeling, and fusion.
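Of the listed improvements, Wiener filtering is the most self-contained to illustrate. The sketch below applies scipy's local-statistics Wiener filter to a synthetic noisy signal as a stand-in for whatever implementation was used on the degraded RATS channels.

```python
# Minimal sketch of the noise-reduction step: Wiener filtering a
# degraded channel before feature extraction. scipy.signal.wiener is
# a simple local-statistics variant; the signal is synthetic.
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 220 * t)            # stand-in for speech
noisy = clean + 0.5 * rng.normal(size=t.size)  # degraded channel

denoised = wiener(noisy, mysize=29)  # window size chosen for illustration

def snr(ref, x):
    return 10 * np.log10(np.sum(ref**2) / np.sum((x - ref) ** 2))

print(f"SNR before: {snr(clean, noisy):.1f} dB, after: {snr(clean, denoised):.1f} dB")
```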
Abstract:
This paper describes a novel approach to phonotactic LID where, instead of using soft counts based on phoneme lattices, we use posteriograms to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adapted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set with better results than a system based on soft counts (Cavg on 30s: 3.15% vs 3.43%), and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s: 2.4% for the acoustic system alone vs 1.25% for the fusion). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the pruning process required when creating the lattices.
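One simple way to realize the idea of extracting n-gram counts from a posteriogram rather than from lattices is to accumulate expected bigram counts over consecutive frames, as in the numpy sketch below; the posteriogram here is random toy data, and the paper's exact counting scheme may differ.

```python
# Sketch of soft bigram counts from a posteriogram: with P[t, i] the
# posterior of phoneme i at frame t, the expected bigram count matrix
# is C[i, j] = sum_t P[t, i] * P[t+1, j], flattened into the
# high-dimensional count vector that the subspace model then reduces.
import numpy as np

rng = np.random.default_rng(0)
T, n_phones = 300, 40
P = rng.random((T, n_phones))
P /= P.sum(axis=1, keepdims=True)  # rows are per-frame posterior distributions

C = P[:-1].T @ P[1:]               # expected bigram counts, shape (40, 40)
counts = C.ravel()                 # 1600-dimensional count vector
print(counts.shape, counts.sum())  # total bigram mass sums to T-1 = 299
```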
Abstract:
Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media-sharing sites would allow the development of a new generation of speech-based systems able to cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
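The evaluation balances cluster purity against speaker coverage through an F-measure. Definitions vary across the diarization literature, so the sketch below shows one common frame-level formulation on toy labels rather than the paper's exact metric.

```python
# One common frame-level formulation: cluster purity, speaker
# coverage, and their harmonic mean, computed from a contingency
# table of toy reference/hypothesis labels.
import numpy as np

def f_measure(ref: np.ndarray, hyp: np.ndarray) -> float:
    clusters, speakers = np.unique(hyp), np.unique(ref)
    # Rows = hypothesis clusters, columns = reference speakers.
    M = np.array([[np.sum((hyp == c) & (ref == s)) for s in speakers]
                  for c in clusters])
    purity = M.max(axis=1).sum() / M.sum()    # each cluster's dominant speaker
    coverage = M.max(axis=0).sum() / M.sum()  # each speaker's dominant cluster
    return 2 * purity * coverage / (purity + coverage)

ref = np.array([0, 0, 0, 1, 1, 1, 2, 2])  # true speaker per frame
hyp = np.array([0, 0, 1, 1, 1, 1, 2, 2])  # cluster id per frame
print(f"F-measure: {f_measure(ref, hyp):.2f}")
```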
Abstract:
Several methods to improve multiple distant microphone (MDM) speaker diarization based on Time Delay of Arrival (TDOA) features are evaluated in this paper. All of them avoid the use of a single reference channel to calculate the TDOA values and, based on different criteria, select from all possible pairs of microphones a set of pairs that will be used to estimate the TDOAs. The evaluated methods have been named "Dynamic Margin" (DM), "Extreme Regions" (ER), "Most Common" (MC), "Cross Correlation" (XCorr), and "Principal Component Analysis" (PCA). It is shown that all methods improve on the baseline results for the development set, and four of them also improve the results for the evaluation set. Relative DER improvements of 3.49% and 10.77% are obtained for DM and ER, respectively, on the test set. The XCorr and PCA methods achieve relative DER improvements of 36.72% and 30.82% on the test set. Moreover, the computational cost of the XCorr method is 20% less than the baseline.
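All of the pair-selection methods operate on TDOA values that are typically estimated with GCC-PHAT, the standard estimator in MDM diarization; the sketch below shows that underlying estimator for a single microphone pair on synthetic signals (the selection criteria themselves are not reproduced).

```python
# GCC-PHAT TDOA estimation between one pair of distant microphones,
# on synthetic signals with a known delay.
import numpy as np

def gcc_phat_tdoa(x: np.ndarray, y: np.ndarray, fs: int) -> float:
    n = 2 * max(len(x), len(y))
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs      # lag in seconds

fs = 16000
rng = np.random.default_rng(0)
src = rng.normal(size=fs)      # 1 s of source-like noise
delay = 40                     # true delay: 40 samples = 2.5 ms
mic1, mic2 = src, np.roll(src, delay)
print(f"estimated TDOA: {gcc_phat_tdoa(mic2, mic1, fs) * 1000:.2f} ms")
```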