66 results for speech databases
Abstract:
Summary: Web-based reference databases and subject directories in agricultural and food sciences - a Finnish information seeker's perspective
Abstract:
This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus, the approaches apply to all other languages as well. This is possibly because most of the studies included in this dissertation concern universal effects taking place at utterance boundaries. The methods developed and used here are also suitable for studies of other languages. This study is based on two corpora of news-reading speech and sentences read aloud. One corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The purpose of using two corpora is twofold: it allows a comparison of the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the difference in duration between short and long phonemes. The phoneme categories are presented to help deal with the variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In utterance-initial positions we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first phonemes at the beginning of an utterance. However, on the word level, the effect on average seems to shorten the whole first word.
We establish the effect of final lengthening in Finnish. The effect in Finnish has long been an open question, and Finnish has been the last missing piece for final lengthening to be considered a universal phenomenon. Final lengthening is studied from various angles, and it is also shown that it is not a mere effect of prominence or an artefact of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This guards against the impact of various problematic variations within the corpora. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation presents an implementation of speech synthesis on a mobile platform and evaluates its performance. We find that the rule-based method of speech synthesis runs in real time as a software solution, but the signal generation process slows the system down to slower than real time. Future aspects of speech synthesis on limited platforms are discussed. The dissertation also considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to any other speech technology approaches.
Abstract:
The aim of this study is to explore how a new concept appears in scientific discussion and research, how it diffuses to other fields and out of the scientific communities, and how networks are formed around the concept. Text and terminology take the interest of a reader in the digital environment. Texts create networks where the terminology used is dependent on the ideas, views and paradigms of the field. This study is based mainly on bibliographic data. Materials for bibliometric studies have been collected from different databases. The databases are also evaluated and their quality and coverage are discussed. The thesauri of those databases that have been selected for a more in-depth study have also been evaluated. The material selected has been used to study how long and in which ways an innovative publication, which can be seen as a milestone in a specific field, influences the research. The concept that has been chosen as a topic for this research is Social Capital, because it has been a popular concept in different scientific fields as well as in everyday speech and the media. It seemed to be a 'fashion concept' that appeared in different situations at the Millennium. The growth and diffusion of social capital publications has been studied. The terms connected with social capital in different fields and at different stages of the development have also been analyzed. The methods that have been used in this study are growth and diffusion analysis, content analysis, citation analysis, co-word analysis and co-citation analysis. One method that can be used to understand and to interpret the results of these bibliometric studies is to interview some key persons who are known to have a gatekeeper position in the diffusion of the concept. Thematic interviews with some Finnish researchers and specialists who have influenced the diffusion of social capital into Finnish scientific and social discussions provide background information.
The Milestone Publications on social capital have been chosen and studied. They give answers to the question "What is Social Capital?" By comparing citations to Milestone Publications with the growth of all social capital publications in a database, we can draw conclusions about the point at which social capital became generally approved 'tacit knowledge'. The contribution of the present study lies foremost in understanding the development of network structures around a new concept that has diffused in scientific communities and also outside them. The network here means networks of researchers, networks of publications and networks of concepts that describe the research field. The emphasis has been on the digital environment and on the so-called information society that we are now living in, but in this transitional stage, printed publications are still important and widely used in the social sciences and humanities. Network formation is affected by social relations and informal contacts that push new ideas. This study also gives new information about using different research methods, such as bibliometric methods supported by interviews and content analyses. It is evident that the interpretation of bibliometric maps presupposes qualitative information and an understanding of the phenomena under study.
Abstract:
The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual’s abilities to handle text or speech input. For the majority of us it presents no problems, but there are some individuals who would benefit from other means of conveying information, e.g. signed information flow. During the last decades, new results from various disciplines have all pointed towards a common background and processing for sign and speech, and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research, and that is why I wanted to design test batteries for signers analogous to widely used speech perception tests, to find out whether the results for signers would be the same as in speakers’ perception tests. One of the key findings within biology, and more precisely in its effects on speech and communication research, is the mirror neuron system. That finding has enabled us to form new theories about the evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogous counterparts of communication, and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed.
Normal production is studied both in speech and sign, and the effects of changed articulation are studied with regard to speech. Both these studies are done by using carrier sentences. Furthermore, sign production is studied by giving the informants the possibility of spontaneous speech. The production data from the signing informants is also used as the basis for input in the sign synthesis stimuli used in the sign perception test battery. Speech and sign perception were studied using the informants’ answers to forced-choice questions in identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the change in modality itself. As the analogous test batteries yielded similar results over different informant groups, some common threads of results could be observed. Starting from very early on in acquiring speech and sign, the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality of results followed the same patterns across different language modalities and, on some occasions, across language groups. As both modalities yield similar answers to analogous study questions, this has led us to provide methods for basic input for sign language applications, i.e. signing avatars. This has also given us answers to questions on the precision of the animation and intelligibility for the users: what are the parameters that govern the intelligibility of synthesised speech or sign, and how precise must the animation or synthetic speech be in order for it to be intelligible?
The results also give additional support to the well-known fact that intelligibility is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which would indicate yet further support for the common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis.
Abstract:
Material from the Integrum resources training session, 28.9. - 29.9.2011