Abstract:
This paper investigates a method of automatic pronunciation scoring for use in computer-assisted language learning (CALL) systems. The method utilizes a likelihood-based 'Goodness of Pronunciation' (GOP) measure which is extended to include individual thresholds for each phone, based both on averaged native confidence scores and on rejection statistics provided by human judges. Further improvements are obtained by incorporating models of the subject's native language and by augmenting the recognition networks to include expected pronunciation errors. The various GOP measures are assessed using a specially recorded database of non-native speakers which has been annotated to mark phone-level pronunciation errors. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each measuring a different aspect of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements.
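The core of a likelihood-based GOP measure of this kind is a duration-normalised log-likelihood ratio between a forced alignment of the expected phone and the best unconstrained phone sequence, compared against a per-phone threshold. The sketch below is a minimal illustration of that idea, assuming log-likelihoods are already available from an acoustic model; the function names and the threshold values are hypothetical, not the paper's actual implementation.

```python
# Minimal sketch of likelihood-based GOP scoring with per-phone
# thresholds. The log-likelihoods would come from an acoustic model
# via forced alignment and an unconstrained phone loop; all names
# and numbers here are illustrative assumptions.

def gop_score(ll_forced: float, ll_phone_loop: float, n_frames: int) -> float:
    """GOP(p) = |log P(O|p) - max_q log P(O|q)| / NF: the
    duration-normalised log-likelihood ratio of the forced alignment
    against the best unconstrained phone sequence."""
    return abs(ll_forced - ll_phone_loop) / max(n_frames, 1)

def judge_phone(phone: str, ll_forced: float, ll_phone_loop: float,
                n_frames: int, thresholds: dict[str, float]) -> bool:
    """Accept the phone as correctly pronounced if its GOP score falls
    below that phone's individual threshold (which could be derived,
    as in the paper, from native confidence scores and human
    rejection statistics)."""
    return gop_score(ll_forced, ll_phone_loop, n_frames) < thresholds[phone]

# Hypothetical usage: thresholds tuned per phone, not one global cutoff.
thresholds = {"ae": 2.5, "th": 1.8, "r": 2.1}
print(judge_phone("th", ll_forced=-340.0, ll_phone_loop=-310.0,
                  n_frames=18, thresholds=thresholds))
```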
Abstract:
Human locomotion is known to be influenced by observation of another person's gait. For example, athletes often synchronize their step in long-distance races. However, how interaction with a virtual runner affects the gait of a real runner has not been studied. We investigated this by creating an illusion of running behind a virtual model (VM) using a treadmill and a large-screen virtual environment showing a video of a VM. We looked at step synchronization between the real and virtual runner and at the role of step frequency (SF) in the real runner's perception of VM speed. We found that subjects matched the VM's SF when asked to match the VM's speed with their own (Figure 1). This indicates that step synchronization may be a strategy of speed matching or speed perception. Subjects chose higher speeds when the VM's SF was higher (though VM speed was 12 km/h in all videos). This effect was more pronounced when the speed estimate was rated verbally while standing still (Figure 2). This may be due to correlated physical activity affecting the perception of VM speed [Jacobs et al. 2005], or to step synchronization altering the subjects' perception of self speed [Durgin et al. 2007]. Our findings indicate that third-person activity in a collaborative virtual locomotive environment can have a pronounced effect on an observer's gait and on their perceptual judgments of the activity of others: the SF of others (virtual or real) can potentially influence one's perception of self speed and lead to changes in speed and SF. A better understanding of the underlying mechanisms would support the design of more compelling virtual trainers and may be instructive for competitive athletics in the real world. © 2009 ACM.
Abstract:
Recent research into the acquisition of spoken language has stressed the importance of learning through embodied linguistic interaction with caregivers rather than through passive observation. However, the necessity of interaction makes experimental work on the simulation of infant speech acquisition difficult because of the technical complexity of building real-time embodied systems. In this paper we present KLAIR: a software toolkit for building simulations of spoken language acquisition through interactions with a virtual infant. The main part of KLAIR is a sensori-motor server that supplies a client machine-learning application with a virtual infant on screen that can see, hear and speak. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction. Copyright © 2009 ISCA.
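The abstract does not specify KLAIR's wire protocol, so the following is only a generic sketch of the client/server split it describes: a machine-learning client exchanging sensori-motor messages with a server that hides the real-time audio/video processing. The port, message format, and field names are invented for illustration and are not KLAIR's actual API.

```python
# Hypothetical illustration of a learning client talking to a
# sensori-motor server. Everything protocol-specific here (port,
# JSON message types, articulator names) is an assumption, not
# taken from the KLAIR toolkit.

import json
import socket

HOST, PORT = "localhost", 9000  # assumed server address

def run_client() -> None:
    with socket.create_connection((HOST, PORT)) as sock:
        f = sock.makefile("rw", encoding="utf-8")
        # Request the latest sensory frame (hypothetical message type).
        f.write(json.dumps({"type": "get_percepts"}) + "\n")
        f.flush()
        percepts = json.loads(f.readline())
        # A learning agent would map percepts to motor commands here;
        # a fixed articulatory gesture stands in for a learned policy.
        f.write(json.dumps({"type": "motor",
                            "articulators": {"jaw": 0.3, "lips": 0.6}}) + "\n")
        f.flush()

if __name__ == "__main__":
    run_client()
```

The point of the split is the one the abstract makes: the client stays a plain, easily modified learning program, while all real-time constraints live in the server.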
Abstract:
This paper investigates several approaches to bootstrapping a new spoken language understanding (SLU) component in a target language given a large dataset of semantically annotated utterances in some other source language. The aim is to reduce the cost of porting a spoken dialogue system from one language to another by minimising the amount of data required in the target language. Since word-level semantic annotations are costly, Semantic Tuple Classifiers (STCs) are used in conjunction with statistical machine translation models, both of which are trained on unaligned data to further reduce development time. The paper presents experiments in which a French SLU component in the tourist information domain is bootstrapped from English data. Results show that training STCs on automatically translated data produced the best performance for predicting the utterance's dialogue act type; however, individual slot/value pairs were best predicted by training STCs on the source language and using them to decode translated utterances. © 2010 ISCA.
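The two bootstrapping strategies the results contrast can be summarised as: (a) translate the annotated source-language data and train in the target language, versus (b) train in the source language and translate target-language test utterances back before decoding. A minimal sketch follows, with a generic bag-of-n-grams classifier standing in for STCs and a placeholder where the SMT model would sit; the data, the translate() stub, and all names are illustrative assumptions, not the paper's models.

```python
# Sketch of the two cross-lingual SLU bootstrapping strategies.
# A generic n-gram classifier substitutes for Semantic Tuple
# Classifiers; translate() is an identity placeholder for a real
# statistical MT system (e.g. English<->French).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def translate(utterances, direction):
    """Identity placeholder: a real SMT model would return the
    utterances translated in the given direction."""
    return utterances

def make_classifier():
    # Bag of unigrams/bigrams + linear classifier as a stand-in
    # for an STC predicting, e.g., dialogue act types.
    return make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))

def strategy_train_on_translations(en_utts, labels, fr_test_utts):
    # (a) Translate the annotated English data into French, then
    # train and decode entirely in the target language.
    clf = make_classifier()
    clf.fit(translate(en_utts, "en->fr"), labels)
    return clf.predict(fr_test_utts)

def strategy_decode_translations(en_utts, labels, fr_test_utts):
    # (b) Train on the English source data and translate the French
    # test utterances into English before decoding.
    clf = make_classifier()
    clf.fit(en_utts, labels)
    return clf.predict(translate(fr_test_utts, "fr->en"))
```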