9 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features
in Helda - Digital Repository of University of Helsinki
Abstract:
Asperger Syndrome (AS) belongs to the autism spectrum disorders, in which difficulties in both verbal and non-verbal communication are at the core of the impairment. Social communication requires a complex interplay of affective, linguistic-cognitive, and perceptual processes. In the four studies included in the current thesis, some of the linguistic and perceptual factors important for face-to-face communication were studied using behavioural methods. In all four studies, the results obtained from individuals with AS were compared with those of typically developed, age-, gender-, and IQ-matched controls. First, the language skills of school-aged children were characterized in detail with standardized tests that measured different aspects of receptive and expressive language (Study I). The children with AS were found to be worse than the controls at following complex verbal instructions. Next, the visual perception of facial expressions of emotion with varying degrees of visual detail was examined (Study II). Adults with AS were found to have impaired recognition of facial expressions on the basis of very low spatial frequencies, which are important for processing global information. Following that, multisensory perception was investigated by looking at audiovisual speech perception (Studies III and IV). Adults with AS were found to perceive audiovisual speech qualitatively differently from typically developed adults, although both groups were equally accurate in recognizing auditory and visual speech presented alone. Finally, the effect of attention on audiovisual speech perception was studied by registering eye-gaze behaviour (Study III) and by studying the voluntary control of visual attention (Study IV). The groups did not differ in eye-gaze behaviour or in the voluntary control of visual attention. The results of the study series demonstrate that many factors underpinning face-to-face social communication are atypical in AS. In contrast with previous assumptions about intact language abilities, the current results show that children with AS have difficulties in understanding complex verbal instructions. Furthermore, the study makes clear that deviations in the perception of global features of faces expressing emotions, as well as in the multisensory perception of speech, are likely to harm face-to-face social communication.
Abstract:
The present thesis discusses relevant issues in education: 1) learning disabilities, including the role of comorbidity in LDs, and 2) the use of research-based interventions. The thesis consists of a series of four studies (three articles), which deepen knowledge in the field of special education. The intervention studies (N=242) examined whether training with a non-verbal auditory-visual matching computer program had a remedial effect on different learning disabilities, such as developmental dyslexia, Attention Deficit Disorder (ADD), and Specific Language Impairment (SLI). The studies were conducted in both Finland and Sweden; the intervention's non-verbal character made this international perspective possible. The results of the intervention studies confirmed that the auditory-visual matching computer program, called Audilex, had positive intervention effects. In Study I, children with developmental dyslexia also showed improvements in reading skills, specifically in reading nonsense words and in reading speed. These improvements in tasks thought to rely on phonological processing suggest that such reading difficulties in dyslexia may stem in part from more basic perceptual difficulties, including those involved in managing the visual and auditory components of the decoding task. In Study II the intervention had a positive effect on children with dyslexia; older students with dyslexia and, surprisingly, students with ADD also benefited from the intervention. In conclusion, the role of comorbidity was apparent. An intervention effect was also evident in students' school behavior. Study III showed that children with SLI experience difficulties in auditory-visual matching very similar to those of children with dyslexia. Children with language-based learning disabilities, such as dyslexia and SLI, benefited from the auditory-visual matching intervention. Comorbidity was also evident among these children; in addition to formal diagnoses, comorbidity was explored with an assessment inventory developed for this thesis. Interestingly, an overview of the data of this thesis shows positive intervention effects in all studies regardless of learning disability, language, gender, or age. These findings are described by the concept of inter-modal transpose. Self-evidently, these issues need further study. In learning disabilities, the future aim will also be to identify individuals by risk rather than by deficit; this aim can be achieved by using research-based interventions, intensified support in general education, and inclusive special education. Keywords: learning disabilities, developmental dyslexia, attention deficit disorder, specific language impairment, language-based learning disabilities, comorbidity, auditory-visual matching, research-based interventions, inter-modal transpose
Abstract:
Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception, and the classical motor speech regions also seem to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties, possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than fluent readers do. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to measure brain hemodynamic activation indirectly. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect was modulated by sound presentation rate. A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers in a contrast that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They provide strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and highlight the nature of attention as an interactive process influenced by stimulus-driven effects. Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.
Abstract:
Although immensely complex, speech is also a very efficient means of communication between humans. Understanding how we acquire the skills necessary for perceiving and producing speech remains an intriguing goal for research. However, while learning is likely to begin as soon as we start hearing speech, the tools for studying language acquisition strategies in the earliest stages of development remain scarce. One prospective strategy is statistical learning. To investigate its role in language development, we designed a new research method. The method was tested in adults using magnetoencephalography (MEG) as a measure of cortical activity; neonatal brain activity was measured with electroencephalography (EEG). Additionally, we developed a method for assessing the integration of seen and heard syllables in the developing brain, as well as a method for assessing the role of visual speech in learning phoneme categories. The MEG study showed that adults learn statistical properties of speech during passive listening to syllables. The amplitude of the N400m component of the event-related magnetic fields (ERFs) reflected the location of syllables within pseudowords. The amplitude was also enhanced for syllables in a statistically unexpected position. The results suggest a role for the N400m component in statistical learning studies in adults. Using the same research design with sleeping newborn infants, the auditory event-related potentials (ERPs) measured with EEG reflected the location of syllables within pseudowords. The results were successfully replicated in another group of infants, showing that even newborn infants have a powerful mechanism for automatically extracting statistical characteristics from speech. We also found that 5-month-old infants integrate some auditory and visual syllables into a fused percept, whereas other syllable combinations are not fully integrated. Auditory syllables were paired with visual syllables possessing a different phonetic identity, and the ERPs for these artificial syllable combinations were compared with the ERPs for normal syllables. For congruent auditory-visual syllable combinations, the ERPs did not differ from those for normal syllables; for incongruent combinations, we observed a mismatch response in the ERPs. The results show an early ability to perceive speech cross-modally. Finally, we exposed two groups of 6-month-old infants to artificially created auditory syllables located between two stereotypical English syllables in formant space. For both groups, the auditory syllables followed the same unimodal statistical distribution, suggestive of a single phoneme category. The visual syllables combined with the auditory syllables, however, differed between the two groups: one group received visual stimuli suggestive of two separate phoneme categories, the other visual stimuli suggestive of only one phoneme category. After a short exposure, we observed different learning outcomes for the two groups of infants. The results thus show that visual speech can influence the learning of phoneme categories. Altogether, the results demonstrate that complex language learning skills exist from birth, and they suggest a role for the visual component of speech in the learning of phoneme categories.
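To make the statistical-learning paradigm concrete, here is a minimal sketch in Python, assuming a Saffran-style continuous syllable stream with an invented syllable inventory (the thesis's actual stimuli are not reproduced here): within a pseudoword the syllable-to-syllable transitional probability is high, while across pseudoword boundaries it is low, and this difference alone can cue word boundaries.

```python
# Hypothetical illustration of the statistical-learning logic: transitional
# probabilities are high within pseudowords and low across their boundaries.
import random
from collections import Counter

random.seed(0)
pseudowords = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]

# A continuous stream of 300 randomly ordered pseudowords (900 syllables).
stream = [syl for word in random.choices(pseudowords, k=300) for syl in word]

pair_counts = Counter(zip(stream, stream[1:]))
syllable_counts = Counter(stream[:-1])

def transitional_probability(a: str, b: str) -> float:
    """Estimate P(next syllable is b | current syllable is a) from the stream."""
    return pair_counts[(a, b)] / syllable_counts[a]

print(transitional_probability("tu", "pi"))  # within a pseudoword: 1.0
print(transitional_probability("ro", "go"))  # across a boundary: ~0.33
```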
Abstract:
The human visual system has adapted to function in different lighting environments and responds to contrast rather than to the amount of light as such. On the one hand, this ensures constancy of perception: white paper, for example, looks white both in bright sunlight and in dim moonlight, because contrast is invariant to changes in overall light level. On the other hand, the brightness of surfaces has to be reconstructed from the contrast signal, because no signal from the surfaces as such is conveyed to the visual cortex. In the visual cortex, the visual image is decomposed into local features by spatial filters that are selective for spatial frequency, orientation, and phase. It is currently not known, however, how these features are subsequently integrated to form objects and object surfaces. In this thesis, the integration mechanisms of achromatic surfaces were studied by psychophysically measuring the spatial frequency and orientation tuning of brightness perception. In addition, the effect of textures on the spread of brightness and the effect of the phase of the inducing stimulus on brightness were measured. The novel findings of the thesis are that (1) a narrow spatial frequency band, independent of stimulus size and complexity, mediates brightness information, (2) figure-ground brightness illusions are narrowly tuned for orientation, (3) texture borders, without any luminance difference, are able to block the spread of brightness, and (4) edges and even- and odd-symmetric Gabors have a similar antagonistic effect on brightness. The narrow spatial frequency tuning suggests that only a subpopulation of neurons in V1 is involved in brightness perception. The independence of stimulus size and complexity indicates that the narrow tuning reflects hard-wired processing in the visual system. Further, it seems that figure-ground segregation and the mechanisms integrating contrast polarities are closely related to the low-level mechanisms of brightness perception. In conclusion, the results of the thesis suggest that a subpopulation of neurons in the visual cortex selectively integrates information from different contrast polarities to reconstruct surface brightness.
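To illustrate the filter model mentioned above, here is a minimal sketch in Python/NumPy with hypothetical parameter values: a Gabor patch, the standard model of such a spatial filter, jointly selective for spatial frequency, orientation, and phase. Phase 0 gives an even-symmetric filter and phase pi/2 an odd-symmetric one, the two symmetries contrasted in finding (4).

```python
# Hypothetical Gabor filter: a sinusoidal carrier under a Gaussian envelope,
# selective for spatial frequency, orientation, and phase.
import numpy as np

def gabor(size: int, frequency: float, orientation: float,
          phase: float, sigma: float) -> np.ndarray:
    """2-D Gabor patch; frequency in cycles/pixel, orientation in radians."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier grating runs along `orientation`.
    x_rot = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * frequency * x_rot + phase)
    return envelope * carrier

# Even-symmetric (phase 0) vs. odd-symmetric (phase pi/2) filters.
even = gabor(size=31, frequency=0.1, orientation=0.0, phase=0.0, sigma=6.0)
odd = gabor(size=31, frequency=0.1, orientation=0.0, phase=np.pi / 2, sigma=6.0)

# The filter's response to an equally sized image patch would be the
# elementwise product summed over the patch: float(np.sum(even * patch)).
```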
Abstract:
Oral cancer ranks among the 10 most common cancers worldwide. Since it is commonly diagnosed at a locally advanced stage, curing the cancer demands extensive tissue resection, and the resulting defect is generally reconstructed with a free flap transfer. Repair of the upper aerodigestive tract while maintaining its multiform functions is challenging. The aim of the study was to assess comprehensive treatment outcomes for patients who had undergone microvascular free flap transfer because of large oral cavity or pharyngeal cancer. Ninety-four patients were analyzed for postoperative survival and complications. Forty-four patients were followed up and analyzed for functional outcome, which was determined in terms of quality of life, speech, swallowing, and intraoral sensation. Quality of life was assessed using the University of Washington Head and Neck Questionnaire. Speech was analyzed for aerodynamic parameters and nasal acoustic energy, as well as perceptually for articulatory proficiency, voice quality, and intelligibility. Videofluorography was performed to determine swallowing ability. Intraoral sensation was measured by moving two-point discrimination. The 3-year overall survival was over 40%, and the 1-year disease-free survival was 43%. Postoperative complications arose in over half of the patients, but the flap success rate was high. Perioperative mortality varied between 2% and 11%. Unemployment and heavy drinking were the strongest predictors of survival. Sociodemographic factors were found to be associated with quality of life. The global quality of life score deteriorated and did not return to the preoperative level; significant reductions were detectable in the domains measuring chewing and speech, as well as in appearance and shoulder function. The basic elements necessary for normal speech were maintained. Speech intelligibility was reduced and was related to misarticulations of the /r/ and /s/ phonemes; deviant /r/ and /s/ persisted in most patients. Hoarseness and hypernasality occurred infrequently. One year postoperatively, 98% of the patients had achieved oral nutrition, and half of them were on a regular masticated diet. Overt and silent aspiration was encountered throughout the follow-up: at the 12-month swallow test, 44% of the patients aspirated, 70% of them silently, and 15% of these patients presented with pulmonary changes suggestive of aspiration. Intraoral sensation weakened but was unrelated to oral functions. The results provide new data on oral reconstructions and highlight the importance of the functional outcome of treatment for the oral cancer patient. The mouth and the pharynx form a unit of utmost functional complexity. Surgery should continue to make progress in this area, and methods that lead to good function should be developed. Operative outcome should always be evaluated in terms of function.
Abstract:
Semantic processing can be studied with semantic priming: target words preceded by semantically related prime words are recognized faster and more accurately than targets preceded by unrelated prime words. Semantic priming also affects the magnitude of the N400 event-related potential: the response to a target word is smaller when it is preceded by a related rather than an unrelated prime word. This is called the N400 effect. It is not yet clear, however, how attention modulates semantic priming and the N400 effect. This study investigated how the direction of attention affects the semantic processing of speech. The N400 effect was studied in experimental conditions in which the subjects' attention was directed 1) away from the speech stimuli, 2) to phonological features of the speech stimuli, and 3) to semantic features of the speech stimuli. The first aim of the study was to investigate whether the N400 effect for spoken words depends on attention to the auditory information. The second aim was to study the differences in the N400 effect when attention is directed to the semantic or other features of the speech stimuli. The results showed an N400 effect even when attention was directed away from the speech stimuli. The N400 effect was, however, stronger in conditions in which the speech stimuli were attended. The magnitude of the behavioral semantic priming and the N400 effect did not differ between the conditions in which attention was directed to the semantic or to the phonological features of the words. The findings indicate that the semantic processing of spoken words is not dependent on attention to auditory information. Furthermore, the results suggest that the semantic processing of attended spoken words is unaffected by whether or not semantic processing is relevant to the task.
Abstract:
The neural basis of visual perception can be understood only when the sequence of cortical activity underlying successful recognition is known. The early steps in this processing chain, from the retina to the primary visual cortex, are highly local, and the perception of more complex shapes requires integration of the local information. In Study I of this thesis, the progression from local to global visual analysis was assessed by recording cortical magnetoencephalographic (MEG) responses to arrays of elements that either did or did not form global contours. The results demonstrated two spatially and temporally distinct stages of processing: the first, emerging 70 ms after stimulus onset around the calcarine sulcus, was sensitive to local features only, whereas the second, starting at 130 ms across the occipital and posterior parietal cortices, reflected the global configuration. To explore the links between cortical activity and visual recognition, Studies II and III presented subjects with recognition tasks of varying levels of difficulty. The occipito-temporal responses from 150 ms onwards were closely linked to recognition performance, in contrast to the 100-ms mid-occipital responses. The averaged responses increased gradually as a function of recognition performance, and further analysis (Study III) showed the single-response strengths to be graded as well. Study IV addressed the attention dependence of the different processing stages: occipito-temporal responses peaking around 150 ms depended on the content of the visual field (faces vs. houses), whereas the later and more sustained activity was strongly modulated by the observer's attention. Hemodynamic responses paralleled the pattern of the more sustained electrophysiological responses. Study V assessed the temporal processing capacity of the human object recognition system: once the luminance, contrast, and size of the object were sufficient, processing speed was not limited by such low-level factors. Taken together, these studies demonstrate several distinct stages in the cortical activation sequence underlying object recognition, reflecting the level of feature integration, the difficulty of recognition, and the direction of attention.
Abstract:
The paradigm of computational vision hypothesizes that any visual function -- such as recognizing your grandparent -- can be replicated by computational processing of the visual input. What are these computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, in which the suitable computations are learned from natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and from the constraints and objectives specified for the learning process. This thesis consists of an introduction and seven peer-reviewed publications; the purpose of the introduction is to present the area of study to a reader who is not familiar with computational vision research. In the introduction, we briefly overview the primary challenges of visual processing and recall some current opinions on visual processing in the early visual systems of animals. Next, we describe the methodology used in our research and discuss the presented results, including some additional remarks, speculations, and conclusions that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast are processed separately in natural systems because of their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence. Further, we provide the first reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent-orientation) processing from nonlinear projection pursuit with simple objective functions related to sparseness and response-energy optimization. Then, we show that attempting to extract independent components of the nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts; such processing might be applicable for priming, i.e., the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selected from the environment without optimizing the mechanisms themselves. In summary, this thesis explores the learning of visual processing on several levels. The learning can be understood as an interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms, and it presents some predictions and ideas regarding biological visual processing.
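To give a concrete flavour of the statistical approach described above, here is a minimal sketch in Python with scikit-learn, under the assumption that `images` holds grayscale natural images (none of the thesis's data or code is reproduced here): learning filters from image patches by maximizing independence, a standard route to simple-cell-like receptive fields.

```python
# Hypothetical sketch: learning simple-cell-like filters from natural image
# patches by maximizing independence (FastICA). `images` is assumed to be a
# sequence of 2-D grayscale natural images; none are bundled here.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

def sample_patches(images, n_patches: int, size: int = 16) -> np.ndarray:
    """Draw random square patches and flatten each into a row vector."""
    rows = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - size + 1)
        c = rng.integers(img.shape[1] - size + 1)
        rows.append(img[r:r + size, c:c + size].ravel())
    X = np.asarray(rows, dtype=float)
    return X - X.mean(axis=0)  # center the data before ICA

# With real natural images available, the fit would look like this:
# X = sample_patches(images, n_patches=20_000)
# ica = FastICA(n_components=64, whiten="unit-variance", random_state=0)
# ica.fit(X)
# Reshaped to 16x16, the rows of ica.components_ typically resemble
# localized, oriented, bandpass filters, i.e. simple-cell-like fields.
```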