Abstract:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches utilise only the audio signal, making them susceptible to acoustic noise. An obvious way to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), analysing each stage of the cascade and quantifying the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database, and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
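As a rough illustration of the kind of cascade the abstract describes (2D-DCT of a mouth region, feature mean normalisation, then LDA), the sketch below strings the three stages together. The array shapes, the number of retained coefficients, and the per-frame speech/non-speech labels are all hypothetical; this is not the pipeline evaluated in the paper.

```python
# Minimal sketch of a DCT -> mean-normalisation -> LDA visual feature cascade.
# Hypothetical shapes and parameters; not the exact pipeline evaluated in the paper.
import numpy as np
from scipy.fftpack import dct
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def dct2(patch, keep=30):
    """2D-DCT of a mouth ROI; keep the lowest-order coefficients (zig-zag ordering skipped for brevity)."""
    coeffs = dct(dct(patch, axis=0, norm='ortho'), axis=1, norm='ortho')
    return coeffs.flatten()[:keep]

def extract_features(frames):
    """frames: (n_frames, h, w) grayscale mouth ROIs -> mean-normalised static DCT features."""
    feats = np.stack([dct2(f) for f in frames])
    return feats - feats.mean(axis=0)            # feature mean normalisation (per utterance)

# LDA stage: project the normalised features onto class-discriminative directions,
# here using a binary speech / non-speech label per frame.
frames = np.random.rand(200, 32, 32)             # stand-in for tracked mouth ROIs
labels = np.random.randint(0, 2, 200)            # stand-in VAD labels
lda = LinearDiscriminantAnalysis(n_components=1)
vad_features = lda.fit_transform(extract_features(frames), labels)
```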
Abstract:
The performance of visual speech recognition (VSR) systems is significantly influenced by the accuracy of the visual front-end. Current state-of-the-art VSR systems use off-the-shelf face detectors such as Viola-Jones (VJ), which have limited reliability under changes in illumination and head pose. For a VSR system to perform well under these conditions, an accurate visual front-end is required. This is an important problem to solve in many practical implementations of audio-visual speech recognition systems, for example in automotive environments for an efficient human-vehicle computer interface. In this paper, we re-examine the current state of the art in VSR by comparing off-the-shelf face detectors with the recently developed Fourier Lucas-Kanade (FLK) image alignment technique. A variety of image alignment and visual speech recognition experiments are performed on a clean dataset as well as on a challenging automotive audio-visual speech dataset. Our results indicate that the FLK image alignment technique can significantly outperform off-the-shelf face detectors, but requires frequent fine-tuning.
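For reference, the off-the-shelf baseline the abstract refers to is available in OpenCV; the sketch below shows a crude Viola-Jones-based mouth-region extractor of the kind such a front-end relies on. The lower-face heuristic and the detector parameters are illustrative assumptions; the FLK alignment technique itself is not part of OpenCV and is not shown.

```python
# Minimal Viola-Jones face/mouth-ROI sketch with OpenCV (the off-the-shelf baseline).
# The FLK alignment discussed in the paper is a separate technique and is not shown here.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_roi(gray_frame):
    """Detect the largest face and return the lower-face region as a crude mouth ROI."""
    faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # detector fails, e.g. under pose/illumination change
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return gray_frame[y + 2 * h // 3 : y + h, x : x + w]
```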
Abstract:
Purpose – To consider a more visual approach to property law teaching practices. This is achieved by exploring the existence of 'visual learners' as a student body, evaluating the use of more visual teaching techniques in academic practice, recognising the historic dominance of text in legal education, and examining the potential for heightening visual teaching practices in the teaching of property law. Design/methodology/approach – The paper reviews and analyses some of the available literature on visual pedagogy and visual approaches to legal education, and also introduces a degree of academic practitioner analysis. Findings – This paper evidences that, rather than focusing on the categorisation of the 'visual learner', the modern academic practitioner should make customary use of more visual stimuli, thereby becoming a more 'visual teacher'. It demonstrates that these practices, if performed effectively, can have an impact on the information literacy of the whole student body, and it proffers a number of suggestions as to how this could be achieved within property law teaching practices. Practical implications – The paper will provide support for early-career academic practitioners, who are entering a teaching profession in a period of accelerated and continual change, by presenting an overview of pedagogic practices in the area. It will also provide a stimulus for those currently teaching on property law modules and support their transition to a more visual form of teaching practice. Originality/value – This paper provides a comprehensive overview of visual pedagogy in legal education, and specifically within property law, which has not been conducted elsewhere.
Abstract:
The present study evaluated the use of stimulus equivalence in teaching monetary skills to school-aged children with autism. An AB within-subject design with periodic probes was used. At pretest, three participants demonstrated relation DA, an auditory-visual relation (matching dictated coin values to printed coin prices). Using a three-choice match-to-sample procedure with a multi-component intervention package, these participants were taught two relations, BA (matching coins to printed prices) and CA (matching coin combinations to printed prices). Two participants passed tests of equivalence, and the third demonstrated emergent performance on a symmetric and a transitive relation. In addition, two participants showed generalization of the learned skills with a parent in a second, naturalistic setting. The present research replicates and extends previous studies by demonstrating that stimulus equivalence can be used to teach an adaptive skill to children with autism.
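The logic behind the equivalence tests can be made concrete with a small sketch: from the trained relations (BA, CA) plus the pre-existing DA relation, symmetry and transitivity predict the untrained relations expected to emerge. The stimulus labels come from the abstract; the code itself is purely illustrative of the relational closure, not of the teaching procedure.

```python
# Illustrative sketch of derived stimulus relations: given trained relations,
# closure under symmetry and transitivity yields the untrained ("emergent") relations tested.
trained = {("B", "A"), ("C", "A"), ("D", "A")}   # coins->prices, combinations->prices, dictated values->prices

def derived(relations):
    rels = set(relations)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(rels):
            if (y, x) not in rels:                               # symmetry: A-B implies B-A
                rels.add((y, x)); changed = True
        for (x, y) in list(rels):
            for (y2, z) in list(rels):
                if y == y2 and x != z and (x, z) not in rels:    # transitivity: A-B, B-C implies A-C
                    rels.add((x, z)); changed = True
    return rels - set(relations)

print(sorted(derived(trained)))   # emergent relations such as ('B', 'C'), ('C', 'B'), ('A', 'B'), ...
```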
Abstract:
Synesthesia entails a special kind of sensory perception, in which stimulation of one sensory modality leads to an internally generated perceptual experience in another, non-stimulated modality. The phenomenon can be viewed as an abnormal multisensory integration process, because the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed altered multimodal integration, suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task with animate and inanimate objects presented either visually or audio-visually, in congruent and incongruent combinations. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During stimulus presentation, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences, with best performance for audio-visually congruent stimulation, indicating the well-known multimodal facilitation effect. We found enhanced amplitude of the N1 component over occipital electrode sites in synesthetes compared to controls. The difference occurred irrespective of the experimental condition and therefore suggests a global influence on early sensory processing in synesthetes.
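The N1 measure mentioned above is obtained by averaging stimulus-locked EEG epochs and reading out the early negative deflection at posterior channels. The sketch below shows that averaging step on synthetic data; the sampling rate, channel index, epoch window, and N1 latency window are assumptions, and the study's actual analysis pipeline is not reproduced here.

```python
# Schematic ERP averaging and N1 measurement at one occipital channel.
# Synthetic data and assumed timing parameters; not the study's actual pipeline.
import numpy as np

fs = 500                                             # sampling rate (Hz), assumed
eeg = np.random.randn(32, 120 * fs)                  # 32 channels x 2 min of stand-in "EEG"
events = np.arange(2 * fs, 110 * fs, fs)             # stimulus onsets (sample indices)
tmin, tmax = -0.1, 0.5                               # epoch window around each onset (s)

def erp(channel):
    epochs = np.stack([eeg[channel, e + int(tmin * fs): e + int(tmax * fs)] for e in events])
    epochs -= epochs[:, : int(-tmin * fs)].mean(axis=1, keepdims=True)   # baseline correction
    return epochs.mean(axis=0)                        # average over trials -> ERP

oz = erp(channel=30)                                  # pretend channel 30 is an occipital site
times = np.arange(int(tmin * fs), int(tmax * fs)) / fs
n1_win = (times > 0.08) & (times < 0.15)              # rough N1 latency window
print("N1 amplitude:", oz[n1_win].min())              # N1 is a negative-going deflection
```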
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Pós-graduação em Psicologia do Desenvolvimento e Aprendizagem - FC
Abstract:
Integrating information from multiple sources is a crucial function of the brain. Examples of such integration include combining stimuli of different modalities (such as visual and auditory), combining multiple stimuli of the same modality (such as two auditory stimuli), and integrating stimuli arriving through the sensory organs (i.e. the ears) with stimuli delivered by brain-machine interfaces.
The overall aim of this body of work is to empirically examine stimulus integration in these three domains to inform our broader understanding of how and when the brain combines information from multiple sources.
First, I examine visually guided auditory learning, a problem with implications for the general question of how the brain determines what lessons to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses: (1) the brain guides sound-location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism; or (2) the brain uses a 'guess and check' heuristic in which visual feedback obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain's reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound, but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3-1.7 degrees, or 22-28% of the original 6-degree visual-auditory mismatch. In contrast, when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony or simultaneity.
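The percentage figures follow directly from expressing the post-exposure shift as a fraction of the imposed mismatch (1.3/6 ≈ 22%, 1.7/6 ≈ 28%). A minimal sketch of that quantification is below; the endpoint values are placeholders, not the study's data.

```python
# How the reported shift can be quantified: mean displacement of auditory-only saccade
# endpoints after exposure, as a fraction of the 6-degree visual-auditory offset.
# The numbers below are placeholders, not the study's data.
import numpy as np

offset_deg = 6.0                                    # imposed visual-auditory mismatch
pre  = np.random.normal(0.0, 2.0, 200)              # endpoint errors before exposure (deg)
post = np.random.normal(1.5, 2.0, 200)              # endpoint errors after exposure (deg)

shift = post.mean() - pre.mean()                    # shift toward the visual cue
print(f"shift = {shift:.1f} deg = {100 * shift / offset_deg:.0f}% of the offset")
# The abstract reports shifts of 1.3-1.7 deg, i.e. roughly 22-28% of the 6-deg offset.
```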
My next line of research examines how electrical stimulation of the inferior colliculus influences the perception of sounds in a nonhuman primate. The central nucleus of the inferior colliculus is the major ascending relay of auditory information: almost all auditory signals pass through it before reaching the forebrain. It is therefore an ideal structure for understanding the format of the inputs to the forebrain and, by extension, the processing of auditory scenes that occurs in the brainstem, and an attractive target for studying stimulus integration in the ascending auditory pathway.
Moreover, understanding the relationship between the auditory selectivity of neurons and their contribution to perception is critical to the design of effective auditory brain prosthetics. These prosthetics seek to mimic natural activity patterns to achieve desired perceptual outcomes. We measured the contribution of inferior colliculus (IC) sites to perception using combined recording and electrical stimulation. Monkeys performed a frequency-based discrimination task, reporting whether a probe sound was higher or lower in frequency than a reference sound. Stimulation pulses were paired with the probe sound on 50% of trials (0.5-80 µA, 100-300 Hz, n=172 IC locations in 3 rhesus monkeys). Electrical stimulation tended to bias the animals’ judgments in a fashion that was coarsely but significantly correlated with the best frequency of the stimulation site in comparison to the reference frequency employed in the task. Although there was considerable variability in the effects of stimulation (including impairments in performance and shifts in performance away from the direction predicted based on the site’s response properties), the results indicate that stimulation of the IC can evoke percepts correlated with the frequency tuning properties of the IC. Consistent with the implications of recent human studies, the main avenue for improvement for the auditory midbrain implant suggested by our findings is to increase the number and spatial extent of electrodes, to increase the size of the region that can be electrically activated and provide a greater range of evoked percepts.
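One way the stimulation-induced bias described above could be quantified is by fitting psychometric functions to the "probe higher than reference" judgments with and without stimulation and reading out the shift in the point of subjective equality. The sketch below does this on simulated proportions; the logistic form, parameter values, and units are assumptions, not the dissertation's stated analysis.

```python
# Sketch of one way to quantify the stimulation-induced bias: fit logistic psychometric
# curves with and without IC microstimulation and compare the points of subjective equality.
# Simulated data only; not the dissertation's actual analysis.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

probe_octaves = np.linspace(-1, 1, 9)               # probe frequency re: reference (octaves)
p_ctrl = logistic(probe_octaves, 0.0, 5.0)          # simulated no-stimulation proportions "higher"
p_stim = logistic(probe_octaves, -0.2, 5.0)         # simulated stimulation-trial proportions

(pse_ctrl, _), _ = curve_fit(logistic, probe_octaves, p_ctrl, p0=[0.0, 5.0])
(pse_stim, _), _ = curve_fit(logistic, probe_octaves, p_stim, p0=[0.0, 5.0])
bias_shift = pse_stim - pse_ctrl                    # negative: stimulation made probes sound "higher"
print(f"PSE shift with stimulation: {bias_shift:.2f} octaves")
# Across sites, such shifts could then be related to each site's best frequency
# relative to the reference, as the abstract describes.
```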
My next line of research employs a frequency-tagging approach to examine the extent to which multiple sound sources are combined (or segregated) in the nonhuman primate inferior colliculus. In the single-sound case, most inferior colliculus neurons respond and entrain to sounds in a very broad region of space, and many are entirely spatially insensitive, so it is unknown how the neurons will respond to a situation with more than one sound. I use multiple AM stimuli of different frequencies, which the inferior colliculus represents using a spike timing code. This allows me to measure spike timing in the inferior colliculus to determine which sound source is responsible for neural activity in an auditory scene containing multiple sounds. Using this approach, I find that the same neurons that are tuned to broad regions of space in the single sound condition become dramatically more selective in the dual sound condition, preferentially entraining spikes to stimuli from a smaller region of space. I will examine the possibility that there may be a conceptual linkage between this finding and the finding of receptive field shifts in the visual system.
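The frequency-tagging readout rests on a standard phase-locking measure: how strongly spike times entrain to each sound's amplitude-modulation frequency. The sketch below computes vector strength for two candidate tag frequencies on simulated spikes; the modulation rates and spike train are made up for illustration.

```python
# Sketch of the frequency-tagging idea: measure how strongly a neuron's spike times
# phase-lock to each sound's amplitude-modulation (tag) frequency. With two simultaneous
# AM sounds, the dominant tag indicates which source drives the neuron.
# Spike times below are simulated, not recorded data.
import numpy as np

def vector_strength(spike_times, mod_freq):
    """Vector strength of spikes relative to one AM frequency; 0 = no locking, 1 = perfect."""
    phases = 2 * np.pi * mod_freq * spike_times
    return np.abs(np.mean(np.exp(1j * phases)))

rng = np.random.default_rng(0)
# Simulated spike train loosely locked to a 40 Hz AM sound, not to a 56 Hz one.
spikes = np.sort(rng.choice(np.arange(0, 1, 1 / 40), size=80) + rng.normal(0, 0.002, 80))

for f in (40.0, 56.0):                               # the two sounds' "tag" frequencies
    print(f"{f:.0f} Hz AM: vector strength = {vector_strength(spikes, f):.2f}")
```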
In chapter 5, I will comment on these findings more generally, compare them to existing theoretical models, and discuss what these results tell us about processing in the central nervous system in a multi-stimulus situation. My results suggest that the brain is flexible in its processing and can adapt its integration schema to fit the available cues and the demands of the task.
Abstract:
Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aids speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions, where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with participants' ability to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light-scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a signal-to-noise ratio (SNR) of -16 dB. This SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences, and was blind to the visual condition of the participant to control for bias. Participants' speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker had said against the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
Abstract:
Limited research is available on how well visual cues integrate with auditory cues to improve speech intelligibility in persons with visual impairments, such as cataracts. We investigated whether simulated cataracts interfered with participants' ability to use visual cues to help disambiguate a spoken message in the presence of spoken background noise. We tested 21 young adults with normal visual acuity and hearing sensitivity. Speech intelligibility was tested under three conditions: auditory only with no visual input, auditory-visual with normal viewing, and auditory-visual with simulated cataracts. Central Institute for the Deaf (CID) Everyday Speech Sentences were spoken by a live talker, mimicking a pre-recorded audio track, in the presence of pre-recorded four-person background babble at a signal-to-noise ratio (SNR) of -13 dB. The talker was masked to the experimental conditions to control for experimenter bias. Relative to the normal vision condition, speech intelligibility was significantly poorer in the simulated cataract condition [t(20) = 4.17, p < .01, Cohen's d = 1.0]. These results suggest that cataracts can interfere with speech perception, which may occur through a reduction in visual cues, less effective integration, or a combination of the two. These novel findings contribute to our understanding of the association between two common sensory problems in adults: reduced contrast sensitivity associated with cataracts and reduced face-to-face communication in noise.
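For readers unfamiliar with the reported statistics, the comparison is a paired (within-subject) t-test with an effect size computed on the condition differences. The sketch below shows that calculation on placeholder intelligibility scores; the numbers are not the study's data.

```python
# Sketch of the reported comparison: paired t-test and Cohen's d for speech intelligibility
# in the normal-vision vs. simulated-cataract conditions. Placeholder scores, not the study's data.
import numpy as np
from scipy import stats

normal_vision = np.array([72, 68, 75, 80, 66, 71, 77, 74, 69, 73,
                          70, 76, 65, 78, 72, 67, 74, 71, 79, 68, 73], float)
sim_cataract  = normal_vision - np.random.default_rng(1).normal(8, 5, normal_vision.size)

t, p = stats.ttest_rel(normal_vision, sim_cataract)          # paired (within-subject) t-test
diff = normal_vision - sim_cataract
d = diff.mean() / diff.std(ddof=1)                           # Cohen's d for paired scores
print(f"t({normal_vision.size - 1}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```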
Abstract:
This research explores the foci, methods and processes of mental training by pianists who are active as performers and teachers. The research is based on the concept of mental training as a solely mental mode of practising: a musician's mental training takes place without an instrument or the physical act of playing. The research seeks answers to the following questions: 1) What are the foci of a pianist's mental training? 2) How does a pianist carry out mental training? 3) What does mental training in music entail as a process? The research approach is qualitative, and the materials were gathered from thematic interviews. The aim of practising is always an improved result, both in the act of playing and in the performance. Mental training by a pianist is a collaboration between technical, auditory, visual, kinaesthetic and affective factors; interpretation, memory and overcoming stage fright are also needed, and technical, cognitive and performance skills are involved. According to the results of this research, mental training is a goal-oriented activity which can have an impact on all of these factors. Without a musical inner ear and its functionality, true musicianship cannot exist. One particular result of this research is the conceptualisation of opening up the inner ear. Auditory exercises and internally playing mental images are essential elements of the mental practice of a musician. Visual images, such as a picture of music notation or of a performance event, are the point of focus for musicians who find visual images the easiest to realise. When developing technical skills through mental training, it is important to focus on the technically most difficult sections. It is also necessary to focus on the holistic experiencing of the performance situation. Building on positive energies and strengths, so-called psyching up, may be the most important element in mental training. Based on the results of this research, a synthesis is outlined of the musical event as an activity process, built on representations and schemes. Mental training aims at the most ideal possible act of playing and the creation of a musical event; these are achieved by focussing on various mental images produced by the different senses, together with concrete practising. Mental training in sports and in music share common factors. Both modes of practising, mental as well as physical, involve three important elements: planning, realisation and evaluation of the practice. In music, however, the goal is an artistic end result, which does not often apply to an athletic event.
Keywords: mental training in music, auditory imagining, visualisation, kinaesthetic-mental experience, mastery of the psyche
Abstract:
Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception and, further, the classical motor speech regions seem also to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than do fluent readers. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to indirectly measure brain hemodynamic activation. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect showed modulation by sound presentation rate. A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers during a contrast test that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They are strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and point out the nature of attention as an interactive process (influenced by stimulus-driven effects). Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.
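The contrast tests mentioned above rest on the standard general-linear-model treatment of BOLD time courses: regress each voxel on condition regressors and test a weighted combination of the fitted betas. The sketch below shows a massively simplified single-voxel version on synthetic data; hemodynamic convolution, motion regressors, group statistics and multiple-comparison correction are all omitted, and nothing here reproduces the thesis's actual analysis.

```python
# Massively simplified single-voxel BOLD GLM contrast on synthetic data.
# Hemodynamic convolution, nuisance regressors and corrections are omitted.
import numpy as np

n_scans = 200
rng = np.random.default_rng(0)
speech = (np.arange(n_scans) % 40 < 20).astype(float)     # boxcar: speech blocks on/off
design = np.column_stack([speech, np.ones(n_scans)])      # [condition, intercept]
voxel  = 0.8 * speech + rng.normal(0, 1, n_scans)          # synthetic voxel time course

beta, *_ = np.linalg.lstsq(design, voxel, rcond=None)      # GLM fit
resid = voxel - design @ beta
dof = n_scans - design.shape[1]
sigma2 = resid @ resid / dof
c = np.array([1.0, 0.0])                                   # contrast: speech > baseline
t_stat = (c @ beta) / np.sqrt(sigma2 * (c @ np.linalg.inv(design.T @ design) @ c))
print(f"contrast t({dof}) = {t_stat:.2f}")
```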
Abstract:
Although immensely complex, speech is also a very efficient means of communication between humans. Understanding how we acquire the skills necessary for perceiving and producing speech remains an intriguing goal for research. However, while learning is likely to begin as soon as we start hearing speech, the tools for studying the language acquisition strategies in the earliest stages of development remain scarce. One prospective strategy is statistical learning. In order to investigate its role in language development, we designed a new research method. The method was tested in adults using magnetoencephalography (MEG) as a measure of cortical activity. Neonatal brain activity was measured with electroencephalography (EEG). Additionally, we developed a method for assessing the integration of seen and heard syllables in the developing brain as well as a method for assessing the role of visual speech when learning phoneme categories. The MEG study showed that adults learn statistical properties of speech during passive listening of syllables. The amplitude of the N400m component of the event-related magnetic fields (ERFs) reflected the location of syllables within pseudowords. The amplitude was also enhanced for syllables in a statistically unexpected position. The results suggest a role for the N400m component in statistical learning studies in adults. Using the same research design with sleeping newborn infants, the auditory event-related potentials (ERPs) measured with EEG reflected the location of syllables within pseudowords. The results were successfully replicated in another group of infants. The results show that even newborn infants have a powerful mechanism for automatic extraction of statistical characteristics from speech. We also found that 5-month-old infants integrate some auditory and visual syllables into a fused percept, whereas other syllable combinations are not fully integrated. Auditory syllables were paired with visual syllables possessing a different phonetic identity, and the ERPs for these artificial syllable combinations were compared with the ERPs for normal syllables. For congruent auditory-visual syllable combinations, the ERPs did not differ from those for normal syllables. However, for incongruent auditory-visual syllable combinations, we observed a mismatch response in the ERPs. The results show an early ability to perceive speech cross-modally. Finally, we exposed two groups of 6-month-old infants to artificially created auditory syllables located between two stereotypical English syllables in the formant space. The auditory syllables followed, equally for both groups, a unimodal statistical distribution, suggestive of a single phoneme category. The visual syllables combined with the auditory syllables, however, were different for the two groups, one group receiving visual stimuli suggestive of two separate phoneme categories, the other receiving visual stimuli suggestive of only one phoneme category. After a short exposure, we observed different learning outcomes for the two groups of infants. The results thus show that visual speech can influence learning of phoneme categories. Altogether, the results demonstrate that complex language learning skills exist from birth. They also suggest a role for the visual component of speech in the learning of phoneme categories.
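Statistical learning of this kind is commonly operationalised through transitional probabilities between adjacent syllables, which are high within pseudowords and low across their boundaries. The sketch below computes those probabilities for a made-up familiarisation stream; the pseudowords are invented for illustration and are not the study's stimuli.

```python
# Sketch of the statistical structure such studies manipulate: transitional probabilities
# are high within pseudowords and low across their boundaries. Made-up syllables only.
from collections import Counter
import random

pseudowords = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]
stream = [s for w in random.choices(pseudowords, k=200) for s in w]   # continuous familiarisation stream

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def tp(a, b):
    """Transitional probability P(b | a) estimated from the stream."""
    return pair_counts[(a, b)] / first_counts[a]

print("within-word TP  P(pi|tu):", round(tp("tu", "pi"), 2))   # ~1.0
print("across-word TP  P(go|ro):", round(tp("ro", "go"), 2))   # ~0.33
```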
Abstract:
The present work aims to contribute to the advancement of research in the area of multimodality, more specifically as applied to the context of foreign language teaching. It analyses a sample of multimodal texts from a textbook produced and used in Brazil as a tool for teaching English as a foreign language to adult beginner students in a private language course. Given the concern, stated in the teaching material itself, with meeting the needs and expectations of these students, this investigation aims to: verify how the verbal and the visual interact in the selected textbook; verify how this interaction contributes to achieving the pedagogical objectives proposed by the material; and, finally, contribute in some way to the multimodal literacy of foreign language students. These objectives determine the hybrid nature of this research, which, in addition to its analytical-descriptive dimension, also has a pedagogical dimension aimed at presenting proposals for multimodal work with some of the activities selected for analysis. The selection of multimodal texts for the corpus of this research was based on the observed recurrence of images featuring certain characters throughout the book. This recurrence raised questions that could only be answered by analysing these characters as represented in situations of (inter)action, which led to the selection of the narrative representations that include them. The characters in question are drawings created for the pedagogical purposes of the material and are represented in very limited social situations: most of these representations appear to form a narrative sequence whose interaction takes place at a party; the other representations, which do not use this party as their context, include interactions at the office, in a restaurant, in a park and on the telephone. An analysis of the visual representation of these social actors revealed that, despite the inclusion of a Black woman among the characters, and the supposedly multicultural outlook conveyed by this inclusion, the participants represent a homogeneous group, belonging to the same social segment, who interact only among themselves in limited social situations and are therefore not representative of the ethnic, social and cultural diversity of Brazil, or of the countries in which English is spoken. After the analysis of the representation of the social actors, and with a view to achieving the objectives of this work, the patterns of representation and interaction in the selected multimodal texts are analysed according to categories from van Leeuwen's (1996) multimodality framework. These analyses show that the verbal and the visual do not always bear a direct relation to each other and that, when they do, this relation is not exploited by the material, making the visual a merely decorative element that, in most cases, contributes nothing to the development of the units. For this reason, and because this is a study centred on the pedagogical context, activities exploring some of the analysed multimodal texts are proposed at the end of the analyses, with a view to the multimodal education of the foreign language student.
Abstract:
This paper gives an introduction to "Interculture TV", an educational videocast project initiated by the Department of "Intercultural Studies and Business Communications" at the Friedrich Schiller University, Jena. The project provides open access to audio-visual teaching and learning materials produced by intercultural student work groups and offers opportunities for cooperation. Starting from a definition of the term "educast", the article analyses the videocast episodes on Interculture TV and discusses their potential for intercultural instruction and learning.