960 results for audiovisual speech perception
Abstract:
A survey was conducted to investigate secondary school support teachers' perceptions of speech-language pathology services to students experiencing language difficulties. Information was sought regarding support teachers' understanding of language disorder, their experience with students who have language difficulties, and their involvement with speech-language pathologists with regard to these students. Support teachers' views on supporting adolescents experiencing language difficulties were also sought, as well as information regarding their satisfaction with speech-language pathology services to adolescents. Findings indicated variations in support teachers' perceptions, including mixed views regarding how speech-language pathologists should offer assistance to students. The findings also indicated a need for support teachers and speech-language pathologists to offer each other professional training.
Abstract:
Here we use two filtered speech tasks to investigate children's processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.
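The modulation-filtering manipulation described in this abstract can be approximated with standard signal-processing tools. The snippet below is only a rough sketch under assumed parameters (file name, filter order, band edges), not the authors' stimulus-preparation pipeline: it extracts a Hilbert amplitude envelope, restricts it to a chosen modulation band, and re-imposes it on the temporal fine structure.

```python
# Rough sketch of modulation-band filtering; not the authors' exact procedure.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, sosfiltfilt

def modulation_filter(signal, fs, band):
    """Keep only the amplitude modulations of `signal` inside `band` (Hz)."""
    analytic = hilbert(signal)
    envelope = np.abs(analytic)                                  # slow amplitude envelope
    carrier = np.real(analytic) / np.maximum(envelope, 1e-12)    # temporal fine structure
    if band[0] <= 0:                                             # low-pass case, e.g. <4 Hz
        sos = butter(4, band[1], btype="lowpass", fs=fs, output="sos")
    else:                                                        # band-pass case, e.g. 22-40 Hz
        sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered_env = np.maximum(sosfiltfilt(sos, envelope), 0.0)   # filtered, non-negative envelope
    return filtered_env * carrier                                # re-impose envelope on carrier

fs, x = wavfile.read("nursery_rhyme.wav")                        # hypothetical mono recording
x = x.astype(np.float64) / np.max(np.abs(x))
y_low = modulation_filter(x, fs, band=(0.0, 4.0))                # <4 Hz modulations only
y_band = modulation_filter(x, fs, band=(22.0, 40.0))             # 22-40 Hz modulations only
```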
Abstract:
For most people, speech production is relatively effortless and error-free. Yet it has long been recognized that we need some type of control over what we are currently saying and what we plan to say. Precisely how we monitor our internal and external speech has been a topic of research interest for several decades. The predominant approach in psycholinguistics has assumed monitoring of both is accomplished via systems responsible for comprehending others' speech. This special topic aimed to broaden the field, firstly by examining proposals that speech production might also engage more general systems, such as those involved in action monitoring. A second aim was to examine proposals for a production-specific, internal monitor. Both aims require that we also specify the nature of the representations subject to monitoring.
New Method for Delexicalization and Its Application to Prosodic Tagging for Text-to-Speech Synthesis
Abstract:
This paper describes a new flexible delexicalization method based on a glottal excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method provides a possibility to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
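As a very rough illustration of the glottal inverse-filtering idea this method builds on (not the delexicalization system itself), the sketch below fits an all-pole vocal-tract model to a short frame with linear prediction and inverse-filters the frame to approximate the glottal excitation. The file name, sampling rate, frame length and LPC order are assumed values.

```python
# Simplified LPC inverse filtering; illustrative only, not the paper's analysis scheme.
import numpy as np
import librosa
from scipy.signal import lfilter

def glottal_residual(frame, order=18):
    """Inverse-filter a windowed speech frame with its own all-pole (LPC) model."""
    a = librosa.lpc(frame, order=order)   # LPC coefficients, a[0] == 1
    return lfilter(a, [1.0], frame)       # residual approximates the glottal excitation

y, sr = librosa.load("vowel.wav", sr=16000)   # hypothetical recording
frame = y[:400] * np.hanning(400)             # one 25 ms frame at 16 kHz
residual = glottal_residual(frame)
```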
Abstract:
This correspondence describes a method for automated segmentation of speech. The method proposed in this paper uses a specially designed filter-bank, called the Bach filter-bank, which makes use of 'music'-related perception criteria. The speech signal is treated as a continuously time-varying signal, as opposed to a short-time stationary model. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks. The preliminary results show up to 80% matches within 20 ms of the manually segmented data, without any information about the content of the text and without any language dependence. The Bach filters are seen to marginally outperform the other filters.
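The Bach filter-bank itself is not specified in this abstract, so the snippet below illustrates only the general recipe of filter-bank-based, text-independent boundary detection, substituting a mel filter bank (one of the comparison scales above) and a simple spectral-flux peak picker. The hop size, peak-picking thresholds and file name are illustrative assumptions.

```python
# Generic filter-bank segmentation sketch; the Bach filter-bank is not reproduced here.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)               # hypothetical input
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40,
                                   hop_length=160)            # 10 ms hop
logS = librosa.power_to_db(S)
flux = np.sum(np.maximum(np.diff(logS, axis=1), 0.0), axis=0) # positive spectral change per frame
peaks = librosa.util.peak_pick(flux, pre_max=3, post_max=3,   # assumed peak-picking settings
                               pre_avg=5, post_avg=5, delta=5.0, wait=2)
boundaries_sec = (peaks + 1) * 160 / sr                       # candidate boundaries in seconds
```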
Abstract:
The paper describes a modular, unit selection based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as for studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. The synthesis database consists of 1027 phonetically rich prerecorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications such as mobile phones and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that obtained with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, and propose exploiting these misperceptions to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with the phones they are confused with.
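The unit-selection search underlying such a framework, choosing one database unit per target position so that the sum of target and concatenation costs is minimal, can be sketched as a small dynamic-programming routine. The cost functions and candidate lists below are stand-ins, so this shows the search idea rather than the system described above.

```python
# Minimal unit-selection lattice search; cost functions are placeholders.
import numpy as np

def select_units(candidates, target_cost, join_cost):
    """candidates[i] is the list of database units available for target position i."""
    n = len(candidates)
    best = [np.array([target_cost(u, 0) for u in candidates[0]])]  # cheapest cost ending in each unit
    back = []                                                      # back-pointers for the traceback
    for i in range(1, n):
        prev, cur = candidates[i - 1], candidates[i]
        cost = np.empty(len(cur))
        arg = np.empty(len(cur), dtype=int)
        for j, u in enumerate(cur):
            totals = [best[-1][k] + join_cost(p, u) for k, p in enumerate(prev)]
            arg[j] = int(np.argmin(totals))
            cost[j] = totals[arg[j]] + target_cost(u, i)
        best.append(cost)
        back.append(arg)
    path = [int(np.argmin(best[-1]))]          # index of the cheapest final unit
    for arg in reversed(back):                 # walk the back-pointers to the start
        path.append(int(arg[path[-1]]))
    return list(reversed(path))                # chosen candidate index per target position
```

A real system would also prune candidate lists and cache join costs; the sketch keeps only the core dynamic programme.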
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel would differ enormously. The differences are mainly due to the differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
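The adaptation proposed here happens inside the auditory image model, which is not reproduced below. As a much simpler illustration of the underlying normalization idea, the sketch rescales the frequency axis of a magnitude spectrum by a warp factor, the kind of compensation a vocal-tract-length difference calls for; the warp factor and input spectrum are placeholder assumptions.

```python
# Toy vocal-tract-length compensation by linear frequency warping; illustrative only.
import numpy as np

def warp_spectrum(mag_spectrum, alpha):
    """Rescale the frequency axis of a magnitude spectrum by `alpha`.
    alpha > 1 moves spectral features such as formants downward in frequency,
    the direction needed to map a short (child) vocal tract onto a longer (adult) one."""
    n = len(mag_spectrum)
    src_bins = np.arange(n) * alpha                  # frequency each output bin reads from
    return np.interp(src_bins, np.arange(n), mag_spectrum,
                     left=mag_spectrum[0], right=0.0)

child_spectrum = np.abs(np.fft.rfft(np.random.randn(512)))  # placeholder magnitude spectrum
normalised = warp_spectrum(child_spectrum, alpha=1.25)      # assumed warp factor
```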
Abstract:
A decade ago, perceiving emotion was generally equated with taking a sample (a still photograph or a few seconds of speech) that unquestionably signified an archetypal emotional state, and attaching the appropriate label. Computational research has shifted that paradigm in multiple ways. Concern with realism is key. Emotion generally colours ongoing action and interaction: describing that colouring is a different problem from categorizing brief episodes of relatively pure emotion. Multiple challenges flow from that. Describing emotional colouring is a challenge in itself. One approach is to use everyday categories describing states that are partly emotional and partly cognitive. Another approach is to use dimensions. Both approaches need ways to deal with gradual changes over time and mixed emotions. Attaching target descriptions to a sample poses problems of both procedure and validation. Cues are likely to be distributed both in time and across modalities, and key decisions may depend heavily on context. The usefulness of acted data is limited because it tends not to reproduce these features. By engaging with these challenging issues, research is not only achieving impressive results, but also offering a much deeper understanding of the problem.
Abstract:
Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami, 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the speed of the rate of change of frequency information (formant transition duration) versus the speed of the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes, 2004), and suggest new avenues for the remediation of developmental dyslexia. © 2010 Blackwell Publishing Ltd.
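The rise-time cue contrasted here is easy to demonstrate, although the snippet below is only a toy and not the study's stimulus synthesis: the same carrier is given onset amplitude ramps of different durations, a short ramp approximating the abrupt /ba/-like onset and a long ramp the gradual /wa/-like onset. The carrier, ramp durations and sampling rate are assumed values.

```python
# Toy rise-time manipulation; not the synthetic ba/wa continuum used in the study.
import numpy as np

fs = 16000
t = np.arange(int(0.3 * fs)) / fs
carrier = np.sin(2 * np.pi * 220 * t)                  # placeholder periodic carrier

def with_rise_time(x, rise_ms, fs):
    """Impose a linear amplitude ramp of `rise_ms` milliseconds on the onset of `x`."""
    n_rise = int(fs * rise_ms / 1000.0)
    env = np.ones_like(x)
    env[:n_rise] = np.linspace(0.0, 1.0, n_rise)
    return x * env

ba_like = with_rise_time(carrier, rise_ms=15, fs=fs)   # fast rise time
wa_like = with_rise_time(carrier, rise_ms=90, fs=fs)   # slow rise time
```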
Abstract:
Introduction: Rhythm organises musical events into patterns and forms, and rhythm perception in music is usually studied by using metrical tasks. Metrical structure also plays an organisational function in the phonology of language, via speech prosody, and there is evidence for rhythmic perceptual difficulties in developmental dyslexia. Here we investigate the hypothesis that the accurate perception of musical metrical structure is related to basic auditory perception of rise time, and also to phonological and literacy development in children. Methods: A battery of behavioural tasks was devised to explore relations between musical metrical perception, auditory perception of amplitude envelope structure, phonological awareness (PA) and reading in a sample of 64 typically-developing children and children with developmental dyslexia. Results: We show that individual differences in the perception of amplitude envelope rise time are linked to musical metrical sensitivity, and that musical metrical sensitivity predicts PA and reading development, accounting for over 60% of variance in reading along with age and I.Q. Even the simplest metrical task, based on a duple metrical structure, was performed significantly more poorly by the children with dyslexia. Conclusions: The accurate perception of metrical structure may be critical for phonological development and consequently for the development of literacy. Difficulties in metrical processing are associated with basic auditory rise time processing difficulties, suggesting a primary sensory impairment in developmental dyslexia in tracking the lower-frequency modulations in the speech envelope. © 2010 Elsevier.
Abstract:
There is a substantial body of evidence, going back over decades, which indicates that the employment sphere is difficult for those who suffer a speech disability. To a large extent, I argue, this is due to the setting of merit in terms of orality and aesthetics. It also relates to the low perception of the competence of the speech disabled. I argue that, to be effective against discrimination, the notion of merit and its assessment require focus. 'Merit' as a concept in discrimination law has had its critics, yet it remains important to investigate it as a social construct in order to help understand discrimination and how to counter it. For example, in this article I look at an instance where the resetting of what was viewed as 'meritorious' in judicial recruitment successfully improved the diversity in lower judicial posts.
Further, given the relative failure of the employment tribunal system to improve the general position of those who are disabled, I look to alternative methods of countering disability discrimination. The suggestion provided is that an enforced, ombudsman-type approach capable of dealing with what may be the core issue around employment discrimination ('merit') would provide a better mechanism for handling the general situation of disability discrimination than the tribunal system.
Abstract:
Dissertation presented to the Escola Superior de Comunicação Social in partial fulfilment of the requirements for the master's degree in Audiovisual and Multimedia.
Abstract:
The physiological basis of human cerebral asymmetry for language remains mysterious. We have used simultaneous physiological and anatomical measurements to investigate the issue. Concentrating on neural oscillatory activity in speech-specific frequency bands and exploring interactions between gestural (motor) and auditory-evoked activity, we find, in the absence of language-related processing, that left auditory, somatosensory, articulatory motor, and inferior parietal cortices show specific, lateralized, speech-related physiological properties. With the addition of ecologically valid audiovisual stimulation, activity in auditory cortex synchronizes with left-dominant input from the motor cortex at frequencies corresponding to syllabic, but not phonemic, speech rhythms. Our results support theories of language lateralization that posit a major role for intrinsic, hardwired perceptuomotor processing in syllabic parsing and are compatible both with the evolutionary view that speech arose from a combination of syllable-sized vocalizations and meaningful hand gestures and with developmental observations suggesting phonemic analysis is a developmentally acquired process.