32 results for audiovisual speech perception

in Helda - Digital Repository of University of Helsinki


Relevance:

100.00%

Abstract:

Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception and, further, the classical motor speech regions seem also to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than do fluent readers. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to indirectly measure brain hemodynamic activation. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect showed modulation by sound presentation rate. 
A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers during a contrast test that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They are strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and point out the nature of attention as an interactive process (influenced by stimulus-driven effects). Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.

Relevance:

100.00%

Abstract:

Asperger Syndrome (AS) belongs to the autism spectrum disorders, in which both verbal and non-verbal communication difficulties are at the core of the impairment. Social communication requires a complex use of affective, linguistic-cognitive and perceptual processes. In the four studies included in the current thesis, some of the linguistic and perceptual factors that are important for face-to-face communication were studied using behavioural methods. In all four studies the results obtained from individuals with AS were compared with those of typically developed age-, gender- and IQ-matched controls. First, the language skills of school-aged children were characterized in detail with standardized tests that measured different aspects of receptive and expressive language (Study I). The children with AS were found to be worse than the controls in following complex verbal instructions. Next, the visual perception of facial expressions of emotion with varying degrees of visual detail was examined (Study II). Adults with AS were found to have impaired recognition of facial expressions on the basis of very low spatial frequencies, which are important for processing global information. Following that, multisensory perception was investigated by looking at audiovisual speech perception (Studies III and IV). Adults with AS were found to perceive audiovisual speech qualitatively differently from typically developed adults, although both groups were equally accurate in recognizing auditory and visual speech presented alone. Finally, the effect of attention on audiovisual speech perception was studied by registering eye gaze behaviour (Study III) and by studying the voluntary control of visual attention (Study IV). The groups did not differ in eye gaze behaviour or in the voluntary control of visual attention. The results of the study series demonstrate that many factors underpinning face-to-face social communication are atypical in AS.
In contrast with previous assumptions about intact language abilities, the current results show that children with AS have difficulties in understanding complex verbal instructions. Furthermore, the study makes clear that deviations in the perception of global features in faces expressing emotions as well as in the multisensory perception of speech are likely to harm face-to-face social communication.

Relevance:

80.00%

Abstract:

The overlapping sound pressure waves that enter our brain via the ears and auditory nerves must be organized into a coherent percept. Modelling the regularities of the auditory environment and detecting unexpected changes in these regularities, even in the absence of attention, is a necessary prerequisite for orientating towards significant information as well as for speech perception and communication, for instance. The processing of auditory information, in particular the detection of changes in the regularities of the auditory input, gives rise to neural activity in the brain that is seen as a mismatch negativity (MMN) response of the event-related potential (ERP) recorded by electroencephalography (EEG). As the recording of MMN requires neither a subject's behavioural response nor attention towards the sounds, it can be done even with subjects with problems in communicating or difficulties in performing a discrimination task, for example, from aphasic and comatose patients, newborns, and even fetuses. Thus with MMN one can follow the evolution of central auditory processing from the very early, often critical stages of development, and also in subjects who cannot be examined with the more traditional behavioural measures of auditory discrimination. Indeed, recent studies show that central auditory processing, as indicated by MMN, is affected in different clinical populations, such as schizophrenics, as well as during normal aging and abnormal childhood development. Moreover, the processing of auditory information can be selectively impaired for certain auditory attributes (e.g., sound duration, frequency) and can also depend on the context of the sound changes (e.g., speech or non-speech). Although its advantages over behavioral measures are undeniable, a major obstacle to the larger-scale routine use of the MMN method, especially in clinical settings, is the relatively long duration of its measurement.
Typically, approximately 15 minutes of recording time is needed for measuring the MMN for a single auditory attribute. Recording a complete central auditory processing profile consisting of several auditory attributes would thus require from one hour to several hours. In this research, I have contributed to the development of new fast multi-attribute MMN recording paradigms in which several types and magnitudes of sound changes are presented in both speech and non-speech contexts in order to obtain a comprehensive profile of auditory sensory memory and discrimination accuracy in a short measurement time (altogether approximately 15 min for 5 auditory attributes). The speed of the paradigms makes them highly attractive for clinical research, their reliability brings fidelity to longitudinal studies, and the language context is especially suitable for studies on language impairments such as dyslexia and aphasia. In addition I have presented an even more ecological paradigm, and more importantly, an interesting result in view of the theory of MMN where the MMN responses are recorded entirely without a repetitive standard tone. All in all, these paradigms contribute to the development of the theory of auditory perception, and increase the feasibility of MMN recordings in both basic and clinical research. Moreover, they have already proven useful in studying for instance dyslexia, Asperger syndrome and schizophrenia.

Relevance:

80.00%

Abstract:

In the field of second language (L2) acquisition, the term 'foreign accent' is often used to refer to speech characteristics that differ from the pronunciation of native speakers. Foreign accent may affect the intelligibility and perceived comprehensibility of speech, and it is also sometimes associated with negative attitudes. The degree of L2 learners' foreign accent and the speech characteristics that account for it have previously been studied through speech perception experiments and acoustic measurements. Perception experiments have shown that native listeners are easily able to identify foreign accent in speech. However, to date, no studies have been done on the assessment of foreign accent in the speech of non-native speakers of Finnish. The aim of this study is to examine how native speakers of Finnish rate the degree of foreign accentedness in the speech of Russian L2 learners of Finnish. Furthermore, phonetic analysis is used to study the characteristics of speech that affect the perceived strength of foreign accent. Altogether 96 native speakers of Finnish listened to excerpts of read-aloud and spontaneous Finnish speech from ten Russian and six Finnish female speakers. The Russian speakers were intermediate and advanced learners of Finnish and had all immigrated to Finland as adults. Among the listeners was a group of teachers of Finnish as an L2, and it was presumed that these teachers had been exposed to foreign accent in Finnish and were used to hearing it. The temporal aspects and segmental properties of speech were phonetically analysed in the speech of the Russian speakers in order to measure their effect on the perceived degree of accent. Although wide differences were observed in the use of the rating scale among the listeners, they were still quite unanimous on which speakers had the strongest foreign accent and which had the mildest.
The listeners' background factors had little effect on their ratings, and the ratings of the teachers of Finnish as an L2 did not differ from those of the other listeners. However, a clear difference was noted in the ratings of the two types of stimuli used in the perception experiment: the read-aloud speech was rated as more strongly accented than the spontaneous speech. It is important to note that the assessment of foreign accent is affected by many factors and their complex interactions in the experimental setting. Further, the study found that both the temporal aspects of speech, often associated with fluency, and the number of single deviant phonetic segments contributed to the perceived degree of accentedness in the speech of the native Russian speakers.

Relevance:

30.00%

Abstract:

This paper describes a new flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method makes it possible to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.

Relevance:

20.00%

Abstract:

Speech rhythm is an essential part of speech processing. It is the outcome of the workings of a combination of linguistic and non-linguistic parameters, many of which also have other functions in speech. This study focusses on the acoustic and auditive realization of two linguistic parameters of rhythm: (1) sentence stress, and (2) speech rate and pausing. The aim was to find out how well Finnish comprehensive school pupils realize these two parameters in English and how native speakers of English react to Finnish pupils' English rhythm. The material was elicited by means of a story-telling task and questionnaires. Three female and three male pupils representing different levels of oral skills in English were selected as the experimental group. The control group consisted of two female and two male native speakers of English. The stories were analysed acoustically and auditorily with respect to interstress intervals, weak forms, fundamental frequency, pausing, and speech as well as articulation rate. In addition, 52 native speakers of English were asked to rate the intelligibility of the Finnish pupils' English with respect to speech rhythm and to give their attitudes on what the pupils sounded like. Results showed that Finnish pupils can produce isochronous interstress intervals in English, but that too large a proportion of these intervals contain pauses. A closer analysis of the pauses revealed that Finnish pupils pause too frequently and in inappropriate places when they speak English. Frequent pausing was also found to cause slow speech rates. The findings of the fundamental frequency (F0) measurements indicate that Finnish pupils tend to make a slightly narrower F0 difference between stressed and unstressed syllables than the native speakers of English. Furthermore, Finnish pupils appear to know how to reduce the duration and quality of unstressed sounds, but they fail to do it frequently enough.
Native listeners gave lower intelligibility and attitude scores to pupils with more anomalous speech rhythm. Finnish pupils' rhythm anomalies seemed to derive from various learning- or learner-related factors rather than from the differences between English and Finnish. This study demonstrates that pausing may be a more important component of English speech rhythm than sentence stress as far as Finnish adolescents are concerned and that interlanguage development is affected by various factors and characterised by jumps or periods of stasis. Other theoretical, methodological and pedagogical implications of the results are also discussed.

Relevance:

20.00%

Abstract:

This study is an inquiry into three related topics in Aristotle’s psychology: the perception of seeing, the perception of past perception, and the perception of sleeping. Over the past decades, Aristotle’s account of the perception of perception has been studied in numerous articles and chapters of books. However, there is no monograph that attempts to give a comprehensive analysis of this account and to assess its relation and significance to Aristotle’s psychological theory in general as well as to other theories pertaining to the topics (e.g. theories of consciousness), be they ancient, medieval, modern, or contemporary. This study intends to fill this gap and to further the research into Aristotle’s philosophy and into the philosophy of mind. The present study is based on an accurate analysis of the sources, on their Platonic background, and on later interpretations within the commentary tradition up to the present. From a methodological point of view, this study represents systematically orientated research into the history of philosophy, in which special attention is paid to the philosophical problems inherent in the sources, to the distinctions drawn, and to the arguments put forward as well as to their philosophical assessment. In addition to contributing many new findings concerning the topics under discussion, this study shows that Aristotle’s account of the perception of perception substantially differs from many later theories of consciousness. This study also suggests that Aristotle be regarded as a consistent direct realist, not only in respect of sense perception, but also in respect of memory.

Relevance:

20.00%

Abstract:

This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work is at the same time linguistic typology and Uralic studies. The findings based on a large corpus of empirical Erzya data, which was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss the previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns which vary from totally analytic to the morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class-membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. The following three predication strategies in Erzya nonverbal predication can be identified: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that even though predicativity must not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other also in predicate position. The relative frequency and degree of obligation for using the predicative suffix construction decreases when moving left to right on the scale verb adjective/locative noun ( identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya, but if it is replaced it is most likely to be with a zero-copula construction in a nominal predication. 
This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.

Relevance:

20.00%

Abstract:

Comprehension of a complex acoustic signal - speech - is vital for human communication, with numerous brain processes required to convert the acoustics into an intelligible message. In four studies in the present thesis, cortical correlates for different stages of speech processing in a mature linguistic system of adults were investigated. In two further studies, developmental aspects of cortical specialisation and its plasticity in adults were examined. In the present studies, electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings of the mismatch negativity (MMN) response elicited by changes in repetitive unattended auditory events and the phonological mismatch negativity (PMN) response elicited by unexpected speech sounds in attended speech inputs served as the main indicators of cortical processes. Changes in speech sounds elicited the MMNm, the magnetic equivalent of the electric MMN, that differed in generator loci and strength from those elicited by comparable changes in non-speech sounds, suggesting intra- and interhemispheric specialisation in the processing of speech and non-speech sounds at an early automatic processing level. This neuronal specialisation for the mother tongue was also reflected in the more efficient formation of stimulus representations in auditory sensory memory for typical native-language speech sounds compared with those formed for unfamiliar, non-prototype speech sounds and simple tones. Further, adding a speech or non-speech sound context to syllable changes was found to modulate the MMNm strength differently in the left and right hemispheres. Following the acoustic-phonetic processing of speech input, phonological effort related to the selection of possible lexical (word) candidates was linked with distinct left-hemisphere neuronal populations. In summary, the results suggest functional specialisation in the neuronal substrates underlying different levels of speech processing. 
Subsequently, plasticity of the brain's mature linguistic system was investigated in adults, in whom representations for an aurally-mediated communication system, Morse code, were found to develop within the same hemisphere where representations for the native-language speech sounds were already located. Finally, recording and localization of the MMNm response to changes in speech sounds was successfully accomplished in newborn infants, encouraging future MEG investigations on, for example, the state of neuronal specialisation at birth.

Relevance:

20.00%

Abstract:

Autism and Asperger syndrome (AS) are neurodevelopmental disorders characterised by deficient social and communication skills, as well as restricted, repetitive patterns of behaviour. The language development in individuals with autism is significantly delayed and deficient, whereas in individuals with AS, the structural aspects of language develop quite normally. Both groups, however, have semantic-pragmatic language deficits. The present thesis investigated auditory processing in individuals with autism and AS. In particular, the discrimination of and orienting to speech and non-speech sounds was studied, as well as the abstraction of invariant sound features from speech-sound input. Altogether five studies were conducted with auditory event-related brain potentials (ERP); two studies also included a behavioural sound-identification task. In three studies, the subjects were children with autism, in one study children with AS, and in one study adults with AS. In children with autism, even the early stages of sound encoding were deficient. In addition, these children had altered sound-discrimination processes characterised by enhanced spectral but deficient temporal discrimination. The enhanced pitch discrimination may partly explain the auditory hypersensitivity common in autism, and it may compromise the filtering of relevant auditory information from irrelevant information. Indeed, it was found that when sound discrimination required abstracting invariant features from varying input, children with autism maintained their superiority in pitch processing, but lost it in vowel processing. Finally, involuntary orienting to sound changes was deficient in children with autism in particular with respect to speech sounds. This finding is in agreement with previous studies on autism suggesting deficits in orienting to socially relevant stimuli. In contrast to children with autism, the early stages of sound encoding were fairly unimpaired in children with AS. 
However, sound discrimination and orienting were rather similarly altered in these children as in those with autism, suggesting correspondences in the auditory phenotype in these two disorders which belong to the same continuum. Unlike children with AS, adults with AS showed enhanced processing of duration changes, suggesting developmental changes in auditory processing in this disorder.

Relevance:

20.00%

Abstract:

Humans are a social species with the internal capability to process social information from other humans. To understand others' behavior and to react accordingly, it is necessary to infer their internal states, emotions and aims, which are conveyed by subtle nonverbal bodily cues such as postures, gestures, and facial expressions. This thesis investigates the brain functions underlying the processing of such social information. Studies I and II of this thesis explore the neural basis of perceiving pain from another person's facial expressions by means of functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). In Study I, observing another's facial expression of pain activated the affective pain system (previously associated with self-experienced pain) in accordance with the intensity of the observed expression. The strength of the response in the anterior insula was also linked to the observer's empathic abilities. The cortical processing of facial pain expressions advanced from the visual to temporal-lobe areas at similar latencies (around 300–500 ms) to those previously shown for emotional expressions such as fear or disgust. Study III shows that perceiving a yawning face is associated with middle and posterior STS activity, and the contagiousness of a yawn correlates negatively with amygdalar activity. Study IV explored the brain correlates of interpreting social interaction between two members of the same species, in this case human and canine. Observing interaction engaged brain activity in a very similar manner for both species. Moreover, the body- and object-sensitive brain areas of dog experts differentiated interaction from noninteraction in both humans and dogs, whereas in the control subjects, similar differentiation occurred only for humans. Finally, Study V shows the engagement of the brain area associated with biological motion when exposed to the sounds produced by a single human being walking.
However, a more complex pattern of activation, with the walking sounds of several persons, suggests that as the social situation becomes more complex so does the brain response. Taken together, these studies demonstrate the roles of distinct cortical and subcortical brain regions in the perception and sharing of others' internal states via facial and bodily gestures, and the connection of brain responses to behavioral attributes.

Relevance:

20.00%

Abstract:

The synchronization of neuronal activity, especially in the beta- (14–30 Hz) / gamma- (30–80 Hz) frequency bands, is thought to provide a means for the integration of anatomically distributed processing and for the formation of transient neuronal assemblies. Thus non-stimulus locked (i.e. induced) gamma-band oscillations are believed to underlie feature binding and the formation of neuronal object representations. On the other hand, the functional roles of neuronal oscillations in slower theta- (4–8 Hz) and alpha- (8–14 Hz) frequency bands remain controversial. In addition, early stimulus-locked activity has been largely ignored, as it is believed to reflect merely the physical properties of sensory stimuli. With human neuromagnetic recordings, both the functional roles of gamma- and alpha-band oscillations and the significance of early stimulus-locked activity in neuronal processing were examined in this thesis. Study I of this thesis shows that even the stimulus-locked (evoked) gamma oscillations were sensitive to high-level stimulus features for speech and non-speech sounds, suggesting that they may underlie the formation of early neuronal object representations for stimuli with a behavioural relevance. Study II shows that neuronal processing for consciously perceived and unperceived stimuli differed as early as 30 ms after stimulus onset. This study also showed that the alpha-band oscillations selectively correlated with conscious perception. Study III, in turn, shows that prestimulus alpha-band oscillations influence the subsequent detection and processing of sensory stimuli. Further, in Study IV, we asked whether phase synchronization between distinct frequency bands is present in cortical circuits. This study revealed prominent task-sensitive phase synchrony between alpha and beta/gamma oscillations. Finally, the implications of Studies II, III, and IV for the broader scientific context are analysed in the last study of this thesis (V).
In this thesis, I suggest that neuronal processing may be extremely fast and that the evoked response is important for cognitive processes. I also propose that alpha oscillations define the global neuronal workspace of perception, action, and consciousness and, further, that cross-frequency synchronization is required for the integration of neuronal object representations into the global neuronal workspace.

Relevance:

20.00%

Abstract:

Selective attention refers to the process in which certain information is actively selected for conscious processing, while other information is ignored. The aim of the present studies was to investigate the human brain mechanisms of auditory and audiovisual selective attention with functional magnetic resonance imaging (fMRI), electroencephalography (EEG) and magnetoencephalography (MEG). The main focus was on attention-related processing in the auditory cortex. It was found that selective attention to sounds strongly enhances auditory cortex activity associated with processing the sounds. In addition, the amplitude of this attention-related modulation was shown to increase with the presentation rate of attended sounds. Attention to the pitch of sounds and to their location appeared to enhance activity in overlapping auditory-cortex regions. However, attention to location produced stronger activity than attention to pitch in the temporo-parietal junction and frontal cortical regions. In addition, a study on bimodal attentional selection found stronger audiovisual than auditory or visual attention-related modulations in the auditory cortex. These results were discussed in light of Näätänen's attentional-trace theory and other research concerning the brain mechanisms of selective attention.

Relevance:

20.00%

Abstract:

The human visual system has adapted to function in different lighting environments and responds to contrast instead of the amount of light as such. On the one hand, this ensures constancy of perception: for example, white paper looks white both in bright sunlight and in dim moonlight, because contrast is invariant to changes in overall light level. On the other hand, the brightness of the surfaces has to be reconstructed from the contrast signal because no signal from surfaces as such is conveyed to the visual cortex. In the visual cortex, the visual image is decomposed into local features by spatial filters that are selective for spatial frequency, orientation, and phase. Currently it is not known, however, how these features are subsequently integrated to form objects and object surfaces. In this thesis the integration mechanisms of achromatic surfaces were studied by psychophysically measuring the spatial frequency and orientation tuning of brightness perception. In addition, the effect of textures on the spread of brightness and the effect of phase of the inducing stimulus on brightness were measured. The novel findings of the thesis are that (1) a narrow spatial frequency band, independent of stimulus size and complexity, mediates brightness information; (2) figure-ground brightness illusions are narrowly tuned for orientation; (3) texture borders, without any luminance difference, are able to block the spread of brightness; and (4) edges and even- and odd-symmetric Gabors have a similar antagonistic effect on brightness. The narrow spatial frequency tuning suggests that only a subpopulation of neurons in V1 is involved in brightness perception. The independence of stimulus size and complexity indicates that the narrow tuning reflects hard-wired processing in the visual system. Further, it seems that figure-ground segregation and mechanisms integrating contrast polarities are closely related to the low-level mechanisms of brightness perception.
In conclusion, the results of the thesis suggest that a subpopulation of neurons in visual cortex selectively integrates information from different contrast polarities to reconstruct surface brightness.