4 resultados para Phoneme
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus the approaches apply to all other languages as well. This is possibly due to the fact that most of the studies included in this dissertation are about universal effects taking place on utterance boundaries. Also the methods invented and used here are suitable for any other study of another language. This study is based on two corpora of news reading speech and sentences read aloud. The other corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The use of two corpora is twofold: it involves a comparison of the corpora and a broader view on the matters of interest. The dissertation begins with an overview to the phonemes and the quantity system in the Finnish language. Especially, we are covering the intrinsic durations of phonemes and phoneme categories, as well as the difference of duration between short and long phonemes. The phoneme categories are presented to facilitate the problem of variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In initial positions of utterances we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first ones at the beginning of an utterance. However, on average, the effect seems to shorten the whole first word on the word level. We establish the effect of final lengthening in Finnish. The effect in Finnish has been an open question for a long time, whilst Finnish has been the last missing piece for it to be a universal phenomenon. Final lengthening is studied from various angles and it is also shown that it is not a mere effect of prominence or an effect of speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On a phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This prevents the impact of various problematic variations within the corpora. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation shows an implementation and prowess of speech synthesis on a mobile platform. We find that the rule-based method of speech synthesis is a real-time software solution, but the signal generation process slows down the system beyond real time. Future aspects of speech synthesis on limited platforms are discussed. The dissertation considers ethical issues on the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to any other speech technology approaches.
Resumo:
Kirjallisuusarvostelu
Resumo:
The topic of the present doctoral dissertation is the analysis of the phonological and tonal structures of a previously largely undescribed language, namely Samue. It is a Gur language belonging to the Niger-Congo language phulym, which is spoken in Burkina Faso. The data were collected during the fieldwork period in a Sama village; the data include 1800 lexical items, thousands of elicited sentences and 30 oral texts. The data were first transcribed phonetically and then the phonological and tonal analyses were conducted. The results show that the phonological system of Samue with the phoneme inventory and phonological processes has the same characteristics as other related Gur languages, although some particularities were found, such as the voicing and lenition of stop consonants in medial positions. Tonal analysis revealed three level tones, which have both lexical and grammatical functions. A particularity of the tonal system is the regressive Mid tone spreading in the verb phrase. The theoretical framework used in the study is Optimality theory. Optimality theory is rarely used in the analysis of an entire language system, and thus an objective was to see whether the theory was applicable to this type of work. Within the tonal analysis especially, some language specific constraints had to be created, although the basic Optimality Theory principle is the universal nature of the constraints. These constraints define the well-formedness of the language structures and they are differently ranked in different languages. This study gives new insights about typological phenomena in Gur languages. It is also a fundamental starting point for the Samue language in relation to the establishment of an orthography. From the theoretical point of view, the study proves that Optimality theory is largely applicable in the analysis of an entire sound system.
Resumo:
Middle ear infections (acute otitis media, AOM) are among the most common infectious diseases in childhood, their incidence being greatest at the age of 6–12 months. Approximately 10–30% of children undergo repetitive periods of AOM, referred to as recurrent acute otitis media (RAOM). Middle ear fluid during an AOM episode causes, on average, 20–30 dB of hearing loss lasting from a few days to as much as a couple of months. It is well known that even a mild permanent hearing loss has an effect on language development but so far there is no consensus regarding the consequences of RAOM on childhood language acquisition. The results of studies on middle ear infections and language development have been partly discrepant and the exact effects of RAOM on the developing central auditory nervous system are as yet unknown. This thesis aims to examine central auditory processing and speech production among 2-year-old children with RAOM. Event-related potentials (ERPs) extracted from electroencephalography can be used to objectively investigate the functioning of the central auditory nervous system. For the first time this thesis has utilized auditory ERPs to study sound encoding and preattentive auditory discrimination of speech stimuli, and neural mechanisms of involuntary auditory attention in children with RAOM. Furthermore, the level of phonological development was studied by investigating the number and the quality of consonants produced by these children. Acquisition of consonant phonemes, which are harder to hear than vowels, is a good indicator of the ability to form accurate memory representations of ambient language and has not been studied previously in Finnish-speaking children with RAOM. The results showed that the cortical sound encoding was intact but the preattentive auditory discrimination of multiple speech sound features was atypical in those children with RAOM. Furthermore, their neural mechanisms of auditory attention differed from those of their peers, thus indicating that children with RAOM are atypically sensitive to novel but meaningless sounds. The children with RAOM also produced fewer consonants than their controls. Noticeably, they had a delay in the acquisition of word-medial consonants and the Finnish phoneme /s/, which is acoustically challenging to perceive compared to the other Finnish phonemes. The findings indicate the immaturity of central auditory processing in the children with RAOM, and this might also emerge in speech production. This thesis also showed that the effects of RAOM on central auditory processing are long-lasting because the children had healthy ears at the time of the study. An effective neural network for speech sound processing is a basic requisite of language acquisition, and RAOM in early childhood should be considered as a risk factor for language development.