984 resultados para Decoding Speech Prosody


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech melody or prosody subserves linguistic, emotional, and pragmatic functions in speech communication. Prosodic perception is based on the decoding of acoustic cues with a predominant function of frequency-related information perceived as speaker's pitch. Evaluation of prosodic meaning is a cognitive function implemented in cortical and subcortical networks that generate continuously updated affective or linguistic speaker impressions. Various brain-imaging methods allow delineation of neural structures involved in prosody processing. In contrast to functional magnetic resonance imaging techniques, DC (direct current, slow) components of the EEG directly measure cortical activation without temporal delay. Activation patterns obtained with this method are highly task specific and intraindividually reproducible. Studies presented here investigated the topography of prosodic stimulus processing in dependence on acoustic stimulus structure and linguistic or affective task demands, respectively. Data obtained from measuring DC potentials demonstrated that the right hemisphere has a predominant role in processing emotions from the tone of voice, irrespective of emotional valence. However, right hemisphere involvement is modulated by diverse speech and language-related conditions that are associated with a left hemisphere participation in prosody processing. The degree of left hemisphere involvement depends on several factors such as (i) articulatory demands on the perceiver of prosody (possibly, also the poser), (ii) a relative left hemisphere specialization in processing temporal cues mediating prosodic meaning, and (iii) the propensity of prosody to act on the segment level in order to modulate word or sentence meaning. The specific role of top-down effects in terms of either linguistically or affectively oriented attention on lateralization of stimulus processing is not clear and requires further investigations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Parkinson's disease patients may have difficulty decoding prosodic emotion cues. These data suggest that the basal ganglia are involved, but may reflect dorsolateral prefrontal cortex dysfunction. An auditory emotional n-back task and cognitive n-back task were administered to 33 patients and 33 older adult controls, as were an auditory emotional Stroop task and cognitive Stroop task. No deficit was observed on the emotion decoding tasks; this did not alter with increased frontal lobe load. However, on the cognitive tasks, patients performed worse than older adult controls, suggesting that cognitive deficits may be more prominent. The impact of frontal lobe dysfunction on prosodic emotion cue decoding may only become apparent once frontal lobe pathology rises above a threshold.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

THE RIGORS OF ESTABLISHING INNATENESS and domain specificity pose challenges to adaptationist models of music evolution. In articulating a series of constraints, the authors of the target articles provide strategies for investigating the potential origins of music. We propose additional approaches for exploring theories based on exaptation. We discuss a view of music as a multimodal system of engaging with affect, enabled by capacities of symbolism and a theory of mind.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combines acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) techniques (Ehrich and Foith, 1976). R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions. To implement the intonation dimension, fuzzy control rules where developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration and intonation, using different techniques and their subsequent integration. Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a novel prosody model in the context of computer text-to-speech synthesis applications for tone languages. We have demonstrated its applicability using the Standard Yorùbá (SY) language. Our approach is motivated by the theory that abstract and realised forms of various prosody dimensions should be modelled within a modular and unified framework [Coleman, J.S., 1994. Polysyllabic words in the YorkTalk synthesis system. In: Keating, P.A. (Ed.), Phonological Structure and Forms: Papers in Laboratory Phonology III, Cambridge University Press, Cambridge, pp. 293–324]. We have implemented this framework using the Relational Tree (R-Tree) technique. R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. The underlying assumption of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. To implement the intonation dimension, fuzzy logic based rules were developed using speech data from native speakers of Yorùbá. The Fuzzy Decision Tree (FDT) and the Classification and Regression Tree (CART) techniques were tested in modelling the duration dimension. For practical reasons, we have selected the FDT for implementing the duration dimension of our prosody model. To establish the effectiveness of our prosody model, we have also developed a Stem-ML prosody model for SY. We have performed both quantitative and qualitative evaluations on our implemented prosody models. The results suggest that, although the R-Tree model does not predict the numerical speech prosody data as accurately as the Stem-ML model, it produces synthetic speech prosody with better intelligibility and naturalness. The R-Tree model is particularly suitable for speech prosody modelling for languages with limited language resources and expertise, e.g. African languages. Furthermore, the R-Tree model is easy to implement, interpret and analyse.

Relevância:

90.00% 90.00%

Publicador:

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Decoding emotional prosody is crucial for successful social interactions, and continuous monitoring of emotional intent via prosody requires working memory. It has been proposed by Ross and others that emotional prosody cognitions in the right hemisphere are organized in an analogous fashion to propositional language functions in the left hemisphere. This study aimed to test the applicability of this model in the context of prefrontal cortex working memory functions. BOLD response data were therefore collected during performance of two emotional working memory tasks by participants undergoing fMRI. In the prosody task, participants identified the emotion conveyed in pre-recorded sentences, and working memory load was manipulated in the style of an N-back task. In the matched lexico-semantic task, participants identified the emotion conveyed by sentence content. Block-design neuroimaging data were analyzed parametrically with SPM5. At first, working memory for emotional prosody appeared to be right-lateralized in the PFC, however, further analyses revealed that it shared much bilateral prefrontal functional neuroanatomy with working memory for lexico-semantic emotion. Supplementary separate analyses of males and females suggested that these language functions were less bilateral in females, but their inclusion did not alter the direction of laterality. It is concluded that Ross et al.'s model is not applicable to prefrontal cortex working memory functions, that evidence that working memory cannot be subdivided in prefrontal cortex according to material type is increased, and that incidental working memory demands may explain the frontal lobe involvement in emotional prosody comprehension as revealed by neuroimaging studies. (c) 2007 Elsevier Inc. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We frequently encounter conflicting emotion cues. This study examined how the neural response to emotional prosody differed in the presence of congruent and incongruent lexico-semantic cues. Two hypotheses were assessed: (i) decoding emotional prosody with conflicting lexico-semantic cues would activate brain regions associated with cognitive conflict (anterior cingulate and dorsolateral prefrontal cortex) or (ii) the increased attentional load of incongruent cues would modulate the activity of regions that decode emotional prosody (right lateral temporal cortex). While the participants indicated the emotion conveyed by prosody, functional magnetic resonance imaging data were acquired on a 3T scanner using blood oxygenation level-dependent contrast. Using SPM5, the response to congruent cues was contrasted with that to emotional prosody alone, as was the response to incongruent lexico-semantic cues (for the 'cognitive conflict' hypothesis). The right lateral temporal lobe region of interest analyses examined modulation of activity in this brain region between these two contrasts (for the 'prosody cortex' hypothesis). Dorsolateral prefrontal and anterior cingulate cortex activity was not observed, and neither was attentional modulation of activity in right lateral temporal cortex activity. However, decoding emotional prosody with incongruent lexico-semantic cues was strongly associated with left inferior frontal gyrus activity. This specialist form of conflict is therefore not processed by the brain using the same neural resources as non-affective cognitive conflict and neither can it be handled by associated sensory cortex alone. The recruitment of inferior frontal cortex may indicate increased semantic processing demands but other contributory functions of this region should be explored.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Annotation Pro - a description of techniques, methods implemented in the tool, as well as the list of all built in functionalities and features of the user interface, and usage tips.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The common focus of the studies brought together in this work is the prosodic segmentation of spontaneous speech. The theoretically most central aspect is the introduction and further development of the IJ-model of intonational chunking. The study consists of a general introduction and five detailed studies that approach prosodic chunking from different perspectives. The data consist of recordings of face-to-face interaction in several spoken varieties of Finnish and Finland Swedish; the methodology is usage-based and qualitative. The term “speech prosody” refers primarily to the melodic and rhythmic characteristics of speech. Both speaking and understanding speech require the ability to segment the flow of speech into suitably sized prosodic chunks. In order to be usage-based, a study of spontaneous speech consequently needs to be based on material that is segmented into prosodic chunks of various sizes. The segmentation is seen to form a hierarchy of chunking. The prosodic models that have so far been developed and employed in Finland have been based on sentences read aloud, which has made it difficult to apply these models in the analysis of spontaneous speech. The prosodic segmentation of spontaneous speech has not previously been studied in detail in Finland. This research focuses mainly on the following three questions: (1) What are the factors that need to be considered when developing a model of prosodic segmentation of speech, so that the model can be employed regardless of the language or dialect under analysis? (2) What are the characteristics of a prosodic chunk, and what are the similarities in the ways chunks of different languages and varieties manifest themselves that will make it possible to analyze different data according to the same criteria? (3) How does the IJ-model of intonational chunking introduced as a solution to question (1) function in practice in the study of different varieties of Finnish and Finland Swedish? The boundaries of the prosodic chunks were manually marked in the material according to context-specific acoustic and auditory criteria. On the basis of the data analyzed, the IJ-model was further elaborated and implemented, thus allowing comparisons between different language varieties. On the basis of the empirical comparisons, a prosodic typology is presented for the dialects of Swedish in Finland. The general contention is that the principles of the IJ-model can readily be used as a methodological tool for prosodic analysis irrespective of language varieties.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis presents an experimental study of the speech prosody of identical and non-identical twins. Speech fluency, pauses, speech rate, utterance length and speech frequency were examined phonetically, auditorily, semantically and statistically. The methods included both reading tasks (reading the alphabet, numerical lists, sentences with foreign loan words, holiday theme questions as well as 1.5 pages of text with long sentences and complex words) and spontaneous speech tasks (picture description and answering holiday theme questions). The subjects were Finnish-speaking 22-28-year-old female twins: 8 identical (monozygotic) and 10 non-identical (dizygotic) pairs. One pair was male-female. Comparisons were made between twin groups and between sisters. The data was regathered from four twin pairs, to make it possible to investigate some subjects intra-individually. In addition phoneticians, phonetic students and people without knowledge of phonetic science were tested in two listening experiments. The results showed that the dizygotic twins differed more from each other than monozygotic twins and that monozygotic twin sisters shared more similarities than dizygotic twin sisters. For example, between monozygotic twin sisters smaller differences were found between word count, utterance length and speech rate in spontaneous speech tasks. Dizygotic twin sisters made more different kinds of reading mistakes with the same target words than monozygotic twin sisters, while monozygotic twin sisters made more of the same reading mistakes with the same target words than dizygotic twin sisters. The listening experiments showed that only professional phoneticians were able to recognize the twin sisters. Even though the twins had the possibility to freely choose their speech rate, pausing and speech frequency, they used their own speech patterns; these included the same average speech frequency, average speech rate, type of pausing routine or filled pauses, and other speech mannerisms throughout their speech.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A production experiment investigated the tonal shape of Finnish finite verbs in transitive sentences without narrow focus. Traditional descriptions of Finnish stating that non- focused finite verbs do not receive accents were only partly supported. Verbs were found to have a consistently smaller pitch range than words in other word classes, but their pitch contours were neither flat nor explainable by pure interpolation.