8 resultados para Speech-processing technologies
em Helda - Digital Repository of University of Helsinki
Resumo:
Comprehension of a complex acoustic signal - speech - is vital for human communication, with numerous brain processes required to convert the acoustics into an intelligible message. In four studies in the present thesis, cortical correlates for different stages of speech processing in a mature linguistic system of adults were investigated. In two further studies, developmental aspects of cortical specialisation and its plasticity in adults were examined. In the present studies, electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings of the mismatch negativity (MMN) response elicited by changes in repetitive unattended auditory events and the phonological mismatch negativity (PMN) response elicited by unexpected speech sounds in attended speech inputs served as the main indicators of cortical processes. Changes in speech sounds elicited the MMNm, the magnetic equivalent of the electric MMN, that differed in generator loci and strength from those elicited by comparable changes in non-speech sounds, suggesting intra- and interhemispheric specialisation in the processing of speech and non-speech sounds at an early automatic processing level. This neuronal specialisation for the mother tongue was also reflected in the more efficient formation of stimulus representations in auditory sensory memory for typical native-language speech sounds compared with those formed for unfamiliar, non-prototype speech sounds and simple tones. Further, adding a speech or non-speech sound context to syllable changes was found to modulate the MMNm strength differently in the left and right hemispheres. Following the acoustic-phonetic processing of speech input, phonological effort related to the selection of possible lexical (word) candidates was linked with distinct left-hemisphere neuronal populations. In summary, the results suggest functional specialisation in the neuronal substrates underlying different levels of speech processing. Subsequently, plasticity of the brain's mature linguistic system was investigated in adults, in whom representations for an aurally-mediated communication system, Morse code, were found to develop within the same hemisphere where representations for the native-language speech sounds were already located. Finally, recording and localization of the MMNm response to changes in speech sounds was successfully accomplished in newborn infants, encouraging future MEG investigations on, for example, the state of neuronal specialisation at birth.
Resumo:
Speech rhythm is an essential part of speech processing. It is the outcome of the workings of a combination of linguistic and non-linguistic parameters, many of which also have other functions in speech. This study focusses on the acoustic and auditive realization of two linguistic parameters of rhythm: (1) sentence stress, and (2) speech rate and pausing. The aim was to find out how well Finnish comprehensive school pupils realize these two parameters in English and how native speakers of English react to Finnish pupils English rhythm. The material was elicited by means of a story-telling task and questionnaires. Three female and three male pupils representing different levels of oral skills in English were selected as the experimental group. The control group consisted of two female and two male native speakers of English. The stories were analysed acoustically and auditorily with respect to interstress intervals, weak forms, fundamental frequency, pausing, and speech as well as articulation rate. In addition, 52 native speakers of English were asked to rate the intelligibility of the Finnish pupils English with respect to speech rhythm and give their attitudes on what the pupils sounded like. Results showed that Finnish pupils can produce isochronous interstress intervals in English, but that too large a proportion of these intervals contain pauses. A closer analysis of the pauses revealed that Finnish pupils pause too frequently and in inappropriate places when they speak English. Frequent pausing was also found to cause slow speech rates. The findings of the fundamental frequency (F0) measurements indicate that Finnish pupils tend to make a slightly narrower F0 difference between stressed and unstressed syllables than the native speakers of English. Furthermore, Finnish pupils appear to know how to reduce the duration and quality of unstressed sounds, but they fail to do it frequently enough. Native listeners gave lower intelligibility and attitude scores to pupils with more anomalous speech rhythm. Finnish pupils rhythm anomalies seemed to derive from various learning- or learner-related factors rather than from the differences between English and Finnish. This study demonstrates that pausing may be a more important component of English speech rhythm than sentence stress as far as Finnish adolescents are concerned and that interlanguage development is affected by various factors and characterised by jumps or periods of stasis. Other theoretical, methodological and pedagogical implications of the results are also discussed.
Resumo:
Autism and Asperger syndrome (AS) are neurodevelopmental disorders characterised by deficient social and communication skills, as well as restricted, repetitive patterns of behaviour. The language development in individuals with autism is significantly delayed and deficient, whereas in individuals with AS, the structural aspects of language develop quite normally. Both groups, however, have semantic-pragmatic language deficits. The present thesis investigated auditory processing in individuals with autism and AS. In particular, the discrimination of and orienting to speech and non-speech sounds was studied, as well as the abstraction of invariant sound features from speech-sound input. Altogether five studies were conducted with auditory event-related brain potentials (ERP); two studies also included a behavioural sound-identification task. In three studies, the subjects were children with autism, in one study children with AS, and in one study adults with AS. In children with autism, even the early stages of sound encoding were deficient. In addition, these children had altered sound-discrimination processes characterised by enhanced spectral but deficient temporal discrimination. The enhanced pitch discrimination may partly explain the auditory hypersensitivity common in autism, and it may compromise the filtering of relevant auditory information from irrelevant information. Indeed, it was found that when sound discrimination required abstracting invariant features from varying input, children with autism maintained their superiority in pitch processing, but lost it in vowel processing. Finally, involuntary orienting to sound changes was deficient in children with autism in particular with respect to speech sounds. This finding is in agreement with previous studies on autism suggesting deficits in orienting to socially relevant stimuli. In contrast to children with autism, the early stages of sound encoding were fairly unimpaired in children with AS. However, sound discrimination and orienting were rather similarly altered in these children as in those with autism, suggesting correspondences in the auditory phenotype in these two disorders which belong to the same continuum. Unlike children with AS, adults with AS showed enhanced processing of duration changes, suggesting developmental changes in auditory processing in this disorder.
Resumo:
The study examines various uses of computer technology in acquisition of information for visually impaired people. For this study 29 visually impaired persons took part in a survey about their experiences concerning acquisition of infomation and use of computers, especially with a screen magnification program, a speech synthesizer and a braille display. According to the responses, the evolution of computer technology offers an important possibility for visually impaired people to cope with everyday activities and interacting with the environment. Nevertheless, the functionality of assistive technology needs further development to become more usable and versatile. Since the challenges of independent observation of environment were emphasized in the survey, the study led into developing a portable text vision system called Tekstinäkö. Contrary to typical stand-alone applications, Tekstinäkö system was constructed by combining devices and programs that are readily available on consumer market. As the system operates, pictures are taken by a digital camera and instantly transmitted to a text recognition program in a laptop computer that talks out loud the text using a speech synthesizer. Visually impaired test users described that even unsure interpretations of the texts in the environment given by Tekstinäkö system are at least a welcome addition to complete perception of the environment. It became clear that even with a modest development work it is possible to bring new, useful and valuable methods to everyday life of disabled people. Unconventional production process of the system appeared to be efficient as well. Achieved results and the proposed working model offer one suggestion for giving enough attention to easily overlooked needs of the people with special abilities. ACM Computing Classification System (1998): K.4.2 Social Issues: Assistive technologies for persons with disabilities I.4.9 Image processing and computer vision: Applications Keywords: Visually impaired, computer-assisted, information, acquisition, assistive technology, computer, screen magnification program, speech synthesizer, braille display, survey, testing, text recognition, camera, text, perception, picture, environment, trasportation, guidance, independence, vision, disabled, blind, speech, synthesizer, braille, software engineering, programming, program, system, freeware, shareware, open source, Tekstinäkö, text vision, TopOCR, Autohotkey, computer engineering, computer science
Resumo:
Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception and, further, the classical motor speech regions seem also to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than do fluent readers. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to indirectly measure brain hemodynamic activation. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect showed modulation by sound presentation rate. A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers during a contrast test that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They are strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and point out the nature of attention as an interactive process (influenced by stimulus-driven effects). Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.
Resumo:
We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
Resumo:
Asperger Syndrome (AS) belongs to autism spectrum disorders where both verbal and non-verbal communication difficulties are at the core of the impairment. Social communication requires a complex use of affective, linguistic-cognitive and perceptual processes. In the four studies included in the current thesis, some of the linguistic and perceptual factors that are important for face-to-face communication were studied using behavioural methods. In all four studies the results obtained from individuals with AS were compared with typically developed age, gender and IQ matched controls. First, the language skills of school-aged children were characterized in detail with standardized tests that measured different aspects of receptive and expressive language (Study I). The children with AS were found to be worse than the controls in following complex verbal instructions. Next, the visual perception of facial expressions of emotion with varying degrees of visual detail was examined (Study II). Adults with AS were found to have impaired recognition of facial expressions on the basis of very low spatial frequencies which are important for processing global information. Following that, multisensory perception was investigated by looking at audiovisual speech perception (Studies III and IV). Adults with AS were found to perceive audiovisual speech qualitatively differently from typically developed adults, although both groups were equally accurate in recognizing auditory and visual speech presented alone. Finally, the effect of attention on audiovisual speech perception was studied by registering eye gaze behaviour (Study III) and by studying the voluntary control of visual attention (Study IV). The groups did not differ in eye gaze behaviour or in the voluntary control of visual attention. The results of the study series demonstrate that many factors underpinning face-to-face social communication are atypical in AS. In contrast with previous assumptions about intact language abilities, the current results show that children with AS have difficulties in understanding complex verbal instructions. Furthermore, the study makes clear that deviations in the perception of global features in faces expressing emotions as well as in the multisensory perception of speech are likely to harm face-to-face social communication.