16 results for Text-to-speech
                                
Abstract:
The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual’s abilities to handle text or speech input. For the majority of us this presents no problems, but there are some individuals who would benefit from other means of conveying information, e.g. signed information flow. During the last decades, new results from various disciplines have all pointed towards a common background and processing for sign and speech, and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research, and that is why I wanted to design test batteries for signers analogous to widely used speech perception tests, to find out whether the results for signers would be the same as in speakers’ perception tests. One of the key findings within biology, and more precisely in its effects on speech and communication research, is the mirror neuron system. That finding has enabled us to form new theories about the evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogous counterparts of communication, and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed. 
Normal production is studied both in speech and sign, and the effects of changed articulation are studied with regard to speech. Both these studies are done using carrier sentences. Furthermore, sign production is studied giving the informants the possibility for spontaneous speech. The production data from the signing informants is also used as the basis for input in the sign synthesis stimuli used in the sign perception test battery. Speech and sign perception were studied using the informants’ answers to forced-choice questions in identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the changes in modality itself. As the analogous test batteries yielded similar results over different informant groups, some common threads of results could be observed. Starting from very early on in acquiring speech and sign, the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality of results followed the same patterns across different language modalities and, on some occasions, across language groups. As both modalities yield similar answers to analogous study questions, this has led us to provide methods for basic input for sign language applications, i.e. signing avatars. This has also given us answers to questions on the precision of the animation and intelligibility for the users: what are the parameters that govern the intelligibility of synthesised speech or sign, and how precise must the animation or synthetic speech be in order for it to be intelligible? 
The results also give additional support to the well-known fact that intelligibility is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which would indicate yet further support for the common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis.
                                
Abstract:
The study examines the signalling of text organisation in research articles (RAs) in French. The work concentrates on a particular type of organisation provided by text sequences, i.e. structures organising text into items of which at least some are signalled by markers of addition or order: First… 0… The third point… In addition… / Premièrement… 0… Le troisième point… De plus… By indicating the way the text is organised, these structures guide the reader in the reading process so that the reader does not need to interpret the text structure unaided. The aim of the work is to study factors affecting the marking of text sequences. Why is their structure sometimes signalled explicitly by markers such as secondly, whereas in other places such markers are not used? The corpus is manually XML-annotated and consists of 90 RAs (~800 000 words) in French from the fields of linguistics, education and history. The analysis highlights several factors affecting the marking of text sequences. First, exact markers (such as first) seem to be more frequent in sequences where all the items are explicitly signalled by a marker, whereas additive markers (such as moreover) are used in sequences with both explicitly signalled and unmarked items. The marking of explicitly signalled sequences thus seems to be precise and even repetitive, whereas the signalling of sequences with unmarked items is altogether more vague. Second, the marking of text sequences seems to depend on the length of the text. The longer the text segment, the more vague the marking. Additive markers and unmarked items are more frequent in longer sequences possibly covering several pages, whereas shorter sequences are often signalled explicitly by exact markers. The marker types also vary according to the sequence length. 
Anaphoric expressions, such as first, are fairly close to their referents and are used in short sequences; connectors, such as secondly, are frequently used in sequences of intermediate length; and the longest sequences are often signalled by constructions composed of an ordinal and a noun acting as the subject of the sentence: The first item is… Finally, the marking of text organisation also depends on the discipline the RA belongs to. In linguistics, the marking is fairly frequent and precise; exact markers such as second are the most used, and structures with unmarked items are less common. Similarly, the marking is fairly frequent in education. In this field, however, it is also less precise than in linguistics, with frequent unmarked items and additive markers. History, on the other hand, is characterised by less frequent marking. In addition, when used, the marking in this field is also less precise and less explicit.
                                
Abstract:
The general goal of the present work was to study whether spatial perceptual asymmetry initially observed in linguistic dichotic listening studies is related to the linguistic nature of the stimuli and/or is modality-specific, as well as to investigate whether the spatial perceptual/attentional asymmetry changes as a function of age and sensory deficit via praxis. Several dichotic listening studies with linguistic stimuli have shown that the inherent perceptual right ear advantage (REA), which presumably results from the left lateralized linguistic functions (bottom-up processes), can be modified with executive functions (top-down control). Executive functions mature slowly during childhood, are well developed in adulthood, and decline as a function of ageing. In Study I, the purpose was to investigate with a cross-sectional experiment from a lifespan perspective the age-related changes in top-down control of REA for linguistic stimuli in dichotic listening with a forced-attention paradigm (DL). In Study II, the aim was to determine whether the REA is linguistic-stimulus-specific or not, and whether the lifespan changes in perceptual asymmetry observed in dichotic listening would also exist in auditory spatial attention tasks that put load on attentional control. In Study III, using visual spatial attention tasks mimicking the auditory tasks applied in Study II, it was investigated whether or not the stimulus-non-specific rightward spatial bias found in the auditory modality is a multimodal phenomenon. Finally, as it has been suggested that the absence of visual input in blind participants leads to improved auditory spatial perceptual and cognitive skills, the aim in Study IV was to determine whether blindness modifies the ear advantage in DL. Altogether 180-190 right-handed participants between 5 and 79 years of age were studied in Studies I to III, and in Study IV the performance of 14 blind individuals was compared with that of 129 normally sighted individuals. 
The results showed that only a rightward spatial bias was observed in tasks with intensive attentional load, independent of the type of stimuli (linguistic vs. non-linguistic) or the modality (auditory vs. visual). This multimodal rightward spatial bias probably results from a complex interaction of asymmetrical perceptual, attentional, and/or motor mechanisms. Most importantly, the strength of the rightward spatial bias changed as a function of age and augmented praxis due to sensory deficit. The efficiency of the performance in spatial attention tasks and the ability to overcome the rightward spatial bias increased during childhood, was at its best in young adulthood, and decreased as a function of ageing. Between the ages of 5 and 11 years, movement and impulse control probably develop first, followed by the gradual development of abilities to inhibit distractions and disengage attention. The errors, especially in bilateral stimulus conditions, suggest that a mild phenomenon resembling extinction can be observed throughout the lifespan, but in particular the ability to distribute attention to multiple targets simultaneously decreases in the course of ageing. Blindness enhances the processing of auditory bilateral linguistic stimuli, the ability to overcome a stimulus-driven laterality effect related to speech sound perception, and the ability to direct attention to an appropriate spatial location. It was concluded that the ability to voluntarily suppress and inhibit the multimodal rightward spatial bias changes as a function of age and praxis due to sensory deficit and probably reflects the developmental level of executive functions.
                                
Abstract:
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. More specifically, the work described here pursues the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken uses full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply to other domains as well. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time. 
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed an evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unifying diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus also contributing to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships. 
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
 
                    