12 results for Speech Acoustics
in Helda - Digital Repository of University of Helsinki
Abstract:
Comprehension of a complex acoustic signal - speech - is vital for human communication, with numerous brain processes required to convert the acoustics into an intelligible message. In four studies in the present thesis, cortical correlates for different stages of speech processing in a mature linguistic system of adults were investigated. In two further studies, developmental aspects of cortical specialisation and its plasticity in adults were examined. In the present studies, electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings of the mismatch negativity (MMN) response elicited by changes in repetitive unattended auditory events and the phonological mismatch negativity (PMN) response elicited by unexpected speech sounds in attended speech inputs served as the main indicators of cortical processes. Changes in speech sounds elicited the MMNm, the magnetic equivalent of the electric MMN, that differed in generator loci and strength from those elicited by comparable changes in non-speech sounds, suggesting intra- and interhemispheric specialisation in the processing of speech and non-speech sounds at an early automatic processing level. This neuronal specialisation for the mother tongue was also reflected in the more efficient formation of stimulus representations in auditory sensory memory for typical native-language speech sounds compared with those formed for unfamiliar, non-prototype speech sounds and simple tones. Further, adding a speech or non-speech sound context to syllable changes was found to modulate the MMNm strength differently in the left and right hemispheres. Following the acoustic-phonetic processing of speech input, phonological effort related to the selection of possible lexical (word) candidates was linked with distinct left-hemisphere neuronal populations. In summary, the results suggest functional specialisation in the neuronal substrates underlying different levels of speech processing. 
Subsequently, plasticity of the brain's mature linguistic system was investigated in adults, in whom representations for an aurally-mediated communication system, Morse code, were found to develop within the same hemisphere where representations for the native-language speech sounds were already located. Finally, recording and localization of the MMNm response to changes in speech sounds was successfully accomplished in newborn infants, encouraging future MEG investigations on, for example, the state of neuronal specialisation at birth.
Abstract:
Speech rhythm is an essential part of speech processing. It is the outcome of the workings of a combination of linguistic and non-linguistic parameters, many of which also have other functions in speech. This study focusses on the acoustic and auditive realization of two linguistic parameters of rhythm: (1) sentence stress, and (2) speech rate and pausing. The aim was to find out how well Finnish comprehensive school pupils realize these two parameters in English and how native speakers of English react to Finnish pupils' English rhythm. The material was elicited by means of a story-telling task and questionnaires. Three female and three male pupils representing different levels of oral skills in English were selected as the experimental group. The control group consisted of two female and two male native speakers of English. The stories were analysed acoustically and auditorily with respect to interstress intervals, weak forms, fundamental frequency, pausing, and speech as well as articulation rate. In addition, 52 native speakers of English were asked to rate the intelligibility of the Finnish pupils' English with respect to speech rhythm and give their attitudes on what the pupils sounded like. Results showed that Finnish pupils can produce isochronous interstress intervals in English, but that too large a proportion of these intervals contain pauses. A closer analysis of the pauses revealed that Finnish pupils pause too frequently and in inappropriate places when they speak English. Frequent pausing was also found to cause slow speech rates. The findings of the fundamental frequency (F0) measurements indicate that Finnish pupils tend to make a slightly narrower F0 difference between stressed and unstressed syllables than the native speakers of English. Furthermore, Finnish pupils appear to know how to reduce the duration and quality of unstressed sounds, but they fail to do it frequently enough.
Native listeners gave lower intelligibility and attitude scores to pupils with more anomalous speech rhythm. Finnish pupils' rhythm anomalies seemed to derive from various learning- or learner-related factors rather than from the differences between English and Finnish. This study demonstrates that pausing may be a more important component of English speech rhythm than sentence stress as far as Finnish adolescents are concerned and that interlanguage development is affected by various factors and characterised by jumps or periods of stasis. Other theoretical, methodological and pedagogical implications of the results are also discussed.
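The link between frequent pausing and slow speech rate reported above can be illustrated with a minimal sketch. It assumes the common definitions of the two measures (speech rate counts syllables over total speaking time including pauses, articulation rate counts them over phonation time only); the abstract does not state which definitions the study used, and the function names and numbers below are hypothetical.

```python
def speech_rate(n_syllables, total_duration_s):
    """Syllables per second over the whole utterance, pauses included."""
    if total_duration_s <= 0:
        raise ValueError("total duration must be positive")
    return n_syllables / total_duration_s


def articulation_rate(n_syllables, total_duration_s, pause_durations_s):
    """Syllables per second over phonation time only (pauses excluded)."""
    phonation_time = total_duration_s - sum(pause_durations_s)
    if phonation_time <= 0:
        raise ValueError("pauses cannot fill the whole utterance")
    return n_syllables / phonation_time


# Hypothetical speakers: same syllable count and same phonation time (27 s),
# but the second speaker pauses far more often.
few_pauses = {"n": 120, "total": 30.0, "pauses": [0.5] * 6}
many_pauses = {"n": 120, "total": 40.0, "pauses": [0.65] * 20}

for label, s in (("few pauses", few_pauses), ("many pauses", many_pauses)):
    sr = speech_rate(s["n"], s["total"])
    ar = articulation_rate(s["n"], s["total"], s["pauses"])
    print(f"{label}: speech rate {sr:.2f} syll/s, articulation rate {ar:.2f} syll/s")
```

In this toy case both speakers articulate equally fast, yet the speaker who pauses more has the lower speech rate, which is the pattern the abstract describes.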
Abstract:
This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work is at the same time linguistic typology and Uralic studies. The findings based on a large corpus of empirical Erzya data, which was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss the previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns which vary from totally analytic to the morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class-membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. The following three predication strategies in Erzya nonverbal predication can be identified: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that even though predicativity must not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other also in predicate position. The relative frequency and degree of obligation for using the predicative suffix construction decreases when moving left to right on the scale verb > adjective/locative > noun (> identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya, but if it is replaced it is most likely to be with a zero-copula construction in a nominal predication.
This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.
Abstract:
Autism and Asperger syndrome (AS) are neurodevelopmental disorders characterised by deficient social and communication skills, as well as restricted, repetitive patterns of behaviour. The language development in individuals with autism is significantly delayed and deficient, whereas in individuals with AS, the structural aspects of language develop quite normally. Both groups, however, have semantic-pragmatic language deficits. The present thesis investigated auditory processing in individuals with autism and AS. In particular, the discrimination of and orienting to speech and non-speech sounds was studied, as well as the abstraction of invariant sound features from speech-sound input. Altogether five studies were conducted with auditory event-related brain potentials (ERP); two studies also included a behavioural sound-identification task. In three studies, the subjects were children with autism, in one study children with AS, and in one study adults with AS. In children with autism, even the early stages of sound encoding were deficient. In addition, these children had altered sound-discrimination processes characterised by enhanced spectral but deficient temporal discrimination. The enhanced pitch discrimination may partly explain the auditory hypersensitivity common in autism, and it may compromise the filtering of relevant auditory information from irrelevant information. Indeed, it was found that when sound discrimination required abstracting invariant features from varying input, children with autism maintained their superiority in pitch processing, but lost it in vowel processing. Finally, involuntary orienting to sound changes was deficient in children with autism in particular with respect to speech sounds. This finding is in agreement with previous studies on autism suggesting deficits in orienting to socially relevant stimuli. In contrast to children with autism, the early stages of sound encoding were fairly unimpaired in children with AS. 
However, sound discrimination and orienting were altered in these children in much the same way as in those with autism, suggesting correspondences in the auditory phenotype of these two disorders, which belong to the same continuum. Unlike children with AS, adults with AS showed enhanced processing of duration changes, suggesting developmental changes in auditory processing in this disorder.
Abstract:
Data on the influence of unilateral vocal fold paralysis on breathing, especially other than information obtained by spirometry, are relatively scarce. Even less is known about the effect of its treatment by vocal fold medialization. Consequently, there was a need to study the issue by combining multiple instruments capable of assessing airflow dynamics and voice. This need was emphasized by a recently developed medialization technique, autologous fascia injection; its effects on breathing have not previously been investigated. A cohort of ten patients with unilateral vocal fold paralysis was studied before and after autologous fascia injection by using flow-volume spirometry, body plethysmography and acoustic analysis of breathing and voice. Preoperative results were compared with those of ten healthy controls. A second cohort of 11 subjects with unilateral vocal fold paralysis was studied pre- and postoperatively by using flow-volume spirometry, impulse oscillometry, acoustic analysis of voice, voice handicap index and subjective assessment of dyspnoea. Preoperative peak inspiratory flow and specific airway conductance were significantly lower and airway resistance was significantly higher in the patients than in the healthy controls (78% vs. 107%, 73% vs. 116% and 182% vs. 125% of predicted; p = 0.004, p = 0.004 and p = 0.026, respectively). Patients had a higher root mean square of spectral power of tracheal sounds than controls, and three of them had wheezes as opposed to no wheezing in healthy subjects. Autologous fascia injection significantly improved acoustic parameters of the voice in both cohorts and voice handicap index in the latter cohort, indicating that this procedure successfully improved voice in unilateral vocal fold paralysis. 
Peak inspiratory flow decreased significantly as a consequence of this procedure (from 4.54 ± 1.68 l to 4.21 ± 1.26 l, p = 0.03, in pooled data of both cohorts), but no change occurred in the other variables of flow-volume spirometry, body plethysmography and impulse oscillometry. Eight of the ten patients studied by acoustic analysis of breathing had wheezes after vocal fold medialization compared with only three patients before the procedure, and the numbers of wheezes per recorded inspirium and expirium increased significantly (from 0.02 to 0.42 and from 0.03 to 0.36; p = 0.028 and p = 0.043, respectively). In conclusion, unilateral vocal fold paralysis was observed to disturb forced breathing and also to cause some signs of disturbed tidal breathing. Findings of flow-volume spirometry were consistent with variable extra-thoracic obstruction. Vocal fold medialization by autologous fascia injection improved the quality of the voice in patients with unilateral vocal fold paralysis, but also decreased peak inspiratory flow and induced wheezing during tidal breathing. However, these airflow changes did not appear to cause significant symptoms in patients.
Abstract:
Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception and, further, the classical motor speech regions seem also to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than do fluent readers. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to indirectly measure brain hemodynamic activation. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect showed modulation by sound presentation rate. 
A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers during a contrast test that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They are strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and point out the nature of attention as an interactive process (influenced by stimulus-driven effects). Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.
New Method for Delexicalization and its Application to Prosodic Tagging for Text-to-Speech Synthesis
Abstract:
This paper describes a new flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method makes it possible to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
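As a rough illustration of all-pole modelling and inverse filtering in general, the sketch below estimates an all-pole filter with plain linear prediction (autocorrelation method, Levinson-Durbin recursion) and then applies the inverse filter A(z) to recover the excitation. This is an assumption-laden stand-in, not the paper's glottal inverse filtering procedure, which the abstract does not specify; all function names and the test signal are hypothetical.

```python
import numpy as np


def lpc_autocorrelation(frame, order):
    """Estimate A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order (Levinson-Durbin)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def all_pole_filter(excitation, a):
    """Synthesis filter 1/A(z): y[n] = x[n] - sum_j a[j] * y[n - j]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y[n] = acc
    return y


def inverse_filter(signal, a):
    """Analysis filter A(z): removes the all-pole envelope, leaving the residual."""
    return np.convolve(signal, a)[:len(signal)]


# Hypothetical demo: white-noise excitation through a known two-pole filter.
rng = np.random.default_rng(0)
true_a = np.array([1.0, -1.3, 0.8])
excitation = rng.standard_normal(5000)
signal = all_pole_filter(excitation, true_a)
est_a, _ = lpc_autocorrelation(signal, order=2)
residual = inverse_filter(signal, est_a)
print("estimated coefficients:", np.round(est_a, 3))
```

In a delexicalization setting one would presumably resynthesize from a manipulated excitation and envelope; the sketch only shows the analysis and synthesis round trip for a synthetic signal.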
Abstract:
The study analyses the ambivalent relationship republicanism, as a form of self-government free from domination, had with the ideal of participatory oratory and non-dominated speech on the one hand, and with the danger of unhindered demagogy and its possibly fatal consequences to that form of government on the other. Although previous scholarship has delved deeply into republicanism as well as into rhetoric and public speech, the interplay between those aspects has only gathered scattered interest, and there has been no systematic study considering the variety of republican approaches to rhetoric and public speech in 17th-century England. The rare attempts to do so have been studies in English literature, and they have not analysed the political philosophy of republicanism, as the focus has been on republicanism as a literary culture. This study connects the fields of political theory, political history as well as literature in order to make a multidisciplinary contribution to intellectual history. The study shows that, within the tradition of classical republicanism, individual authors could make different choices when addressing the problematic topics of public speech and rhetoric, and the variety of their conclusions often set the authors against each other, resulting in the development of their theories through internal debates within the republican tradition. The authors under study were chosen to reflect this variety and the connections between them: the similarities between James Harrington and John Streater, and between John Milton and John Hall of Durham are shown, as well as the controversies between Harrington and Milton, and Streater and Hall, respectively. In addition, by analysing the writings of Marchamont Nedham the study will show that the choices were not limited to more, or less, democratic brands of republicanism.
Most significantly, the study provides a thorough analysis of the political philosophies behind the various brands of republicanism, in addition to describing them. By means of this analysis, the study shows that previous attempts to assess the role of free speech and public debate through the lenses of modern, rights-based liberal political theory have resulted in an inappropriate framework for understanding early modern English republicanism. By approaching the topics through concepts used by the republicans (legitimate authority, leadership by oratory, and republican freedom) and through the frames of reference available and familiar to them (roles of education and institutions), the study presents a thorough and systematic analysis of the role and function of rhetoric and public speech in English republicanism. The findings of this analysis have significant consequences for our current understanding of the history and development of republican political theory, and, more generally, of the connections between democratic theory and free speech.
Abstract:
We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
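The combination of a weighted lexicon, a guesser and a bigram model can be sketched in plain Python, without any finite-state library. The sketch assumes weights are negative log probabilities that add along a path, with the cheapest path winning (the convention of the tropical semiring commonly used for weighted transducers). The tag set, the probabilities and the `tag` function are hypothetical, and the bigram model here covers tag sequences only, not lemmas as in the paper.

```python
import math


def cost(p):
    """Negative log probability: costs add along a path, lowest total wins."""
    return -math.log(p)


# Hypothetical weighted lexicon: word -> {tag: cost of word given tag}
LEXICON = {
    "the": {"DET": cost(1.0)},
    "dog": {"NOUN": cost(0.9), "VERB": cost(0.1)},
    "barks": {"VERB": cost(0.7), "NOUN": cost(0.3)},
}

# Hypothetical guesser: fallback tag costs for words missing from the lexicon
GUESSER = {"NOUN": cost(0.6), "VERB": cost(0.3), "DET": cost(0.1)}

# Hypothetical tag bigram model: (previous tag, tag) -> transition cost
BIGRAM = {
    ("<s>", "DET"): cost(0.6), ("<s>", "NOUN"): cost(0.3), ("<s>", "VERB"): cost(0.1),
    ("DET", "NOUN"): cost(0.9), ("DET", "VERB"): cost(0.05), ("DET", "DET"): cost(0.05),
    ("NOUN", "VERB"): cost(0.6), ("NOUN", "NOUN"): cost(0.3), ("NOUN", "DET"): cost(0.1),
    ("VERB", "DET"): cost(0.5), ("VERB", "NOUN"): cost(0.4), ("VERB", "VERB"): cost(0.1),
}


def tag(words):
    """Viterbi search for the cheapest tag sequence."""
    # best[t] = (total cost, tag path) of the cheapest path ending in tag t
    best = {"<s>": (0.0, [])}
    for word in words:
        candidates = LEXICON.get(word.lower(), GUESSER)
        new_best = {}
        for t, emit in candidates.items():
            options = [
                (c + BIGRAM[(prev, t)] + emit, path + [t])
                for prev, (c, path) in best.items()
            ]
            new_best[t] = min(options, key=lambda o: o[0])
        best = new_best
    return min(best.values(), key=lambda o: o[0])[1]


print(tag(["the", "dog", "barks"]))
print(tag(["the", "wug"]))  # "wug" is not in the lexicon, so the guesser applies
```

In a transducer implementation the lexicon, guesser and bigram model would be separate weighted machines whose composition or intersection performs the same minimisation; the dictionary-based search here is only a stand-in for that machinery.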