Digital Library

The purpose of this supplemental project was to collect invaluable data from the large-scale construction sites of the Egnatia Odos motorway, data needed to validate a novel automated vision-tracking method developed under the parent grant. To this end, one US graduate student and three US undergraduate students traveled to Greece for four months and worked with two Greek graduate students of the local faculty collaborator. The team monitored project activities and scheduled data collection trips daily, set up a mobile video data collection lab on the back of a truck, and drove to various sites every day, collecting hundreds of hours of video from multiple cameras on a wide variety of activities, from soil excavation to bridge construction. The US students, all from underrepresented minority groups, had never visited a foreign country before, so the trip was a major life experience for them. They learned how to live in a non-English-speaking country and how to communicate with Greek students, workers and engineers. They led a project in a very unfamiliar environment, troubleshot the myriad problems that hampered their progress daily and, above all, learned how to collaborate effectively and efficiently with people from other cultures. They returned to the US more mature, with improved leadership and problem-solving skills and a broader perspective on their profession.

Vowel normalisation: Time-domain processing of the internal dynamics of speech

Relevance: 10.00%

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the "same" vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on adult speech. In this paper, we propose that the auditory system adapts its analysis of speech sounds dynamically and automatically to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
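
The two speaker-size quantities named in this abstract can be made concrete with a minimal sketch. It does not reproduce the auditory image model: as loudly stated assumptions, GPR is estimated with a plain autocorrelation pitch tracker, and the effect of VTL is modelled as a uniform scaling of formant frequencies (formants rise as the vocal tract gets shorter). The helper names, the 60 to 500 Hz search range and the scale factor are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def estimate_gpr(signal: Sequence[float], sample_rate: float,
                 min_hz: float = 60.0, max_hz: float = 500.0) -> float:
    """Hypothetical helper: estimate glottal pulse rate (Hz) as sample_rate
    divided by the lag of the largest autocorrelation value inside an assumed
    pitch range. A simple stand-in, not the auditory image model."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), len(signal) - 1)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def normalise_formants(formants_hz: Sequence[float], vtl_scale: float) -> List[float]:
    """Hypothetical helper: undo a VTL difference, assuming it acts as a uniform
    frequency scaling. vtl_scale > 1 means the speaker's formants are higher
    than the reference speaker's (a shorter vocal tract)."""
    return [f / vtl_scale for f in formants_hz]


def log_shift(formants_hz: Sequence[float], vtl_scale: float) -> List[float]:
    """On a log-frequency axis the same normalisation is a constant shift."""
    return [math.log(f) - math.log(vtl_scale) for f in formants_hz]


if __name__ == "__main__":
    sr = 8000
    pulses = [1.0 if i % 80 == 0 else 0.0 for i in range(800)]  # 100 Hz pulse train
    print(estimate_gpr(pulses, sr))                  # 100.0
    print(normalise_formants([600.0, 1200.0], 1.5))  # [400.0, 800.0]
```

Treating VTL as a scale factor is what makes the log-frequency view attractive: a change in speaker size becomes a shift that can be removed by subtraction, independently of the GPR estimate.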

An expressive text-driven 3D talking head

Relevance: 10.00%

Abstract:

Creating a realistic talking head, which, given arbitrary text as input, generates a realistic-looking face speaking the text, has been a long-standing research challenge. Talking heads that cannot express emotion have been made to look very realistic by using concatenative approaches [Wang et al. 2011]; however, allowing the head to express emotion creates a much more challenging problem, and model-based approaches have shown promise in this area. While 2D talking heads currently look more realistic than their 3D counterparts, they are limited both in the range of poses they can express and in the lighting conditions under which they can be rendered. Previous attempts to produce videorealistic 3D expressive talking heads [Cao et al. 2005] have produced encouraging results but have not yet achieved the level of realism of their 2D counterparts.

Continuous ASR for flexible incremental dialogue

Relevance: 10.00%

Abstract:

Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn-taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and to decide when it is an appropriate time for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser's 1-best path to guide turn taking. In this way, a flexible framework for incremental dialogue management becomes possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser's best path to identify user speech, with more robustness to noise, potentially lower latency, and a reduction in overall recognition error rate compared to the conventional approach. © 2013 IEEE.
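
The idea of letting the recogniser's best path replace the VAD can be illustrated with a minimal sketch. It assumes, hypothetically, that the always-on recogniser emits one 1-best label per frame and marks non-speech with tokens such as `<sil>` and `<noise>`. The user is taken to have started speaking when a word first appears on the best path, and the system may respond once a run of non-speech frames follows. The token names, the `turn_events` function and the three-frame threshold are illustrative assumptions, not the paper's settings.

```python
from typing import Iterable, List, Tuple

# Hypothetical non-speech labels emitted on the recogniser's 1-best path.
NON_SPEECH = {"<sil>", "<noise>"}


def turn_events(best_path: Iterable[str],
                end_of_turn_frames: int = 3) -> List[Tuple[int, str]]:
    """Derive turn-taking events from per-frame 1-best labels instead of a VAD.

    Returns (frame_index, event) pairs, where event is "user_started" or
    "system_may_respond".
    """
    events: List[Tuple[int, str]] = []
    user_speaking = False
    non_speech_run = 0
    for t, label in enumerate(best_path):
        if label not in NON_SPEECH:
            # A word on the best path means the user is speaking.
            if not user_speaking:
                events.append((t, "user_started"))
                user_speaking = True
            non_speech_run = 0
        elif user_speaking:
            # Count consecutive non-speech frames after user speech.
            non_speech_run += 1
            if non_speech_run >= end_of_turn_frames:
                events.append((t, "system_may_respond"))
                user_speaking = False
                non_speech_run = 0
    return events


if __name__ == "__main__":
    frames = ["<sil>", "<noise>", "hello", "hello", "<sil>", "world",
              "<sil>", "<noise>", "<sil>", "<sil>"]
    print(turn_events(frames))
    # [(2, 'user_started'), (8, 'system_may_respond')]
```

Because the decision is taken from the recogniser's own hypothesis rather than from signal energy, a noise burst labelled `<noise>` does not open a user turn in this sketch, which mirrors the robustness-to-noise argument in the abstract.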

Embodied artificial intelligence: Trends and challenges

Relevance: 10.00%

Abstract:

The field of Artificial Intelligence, which started roughly half a century ago, has had a turbulent history. In the 1980s there was a major paradigm shift towards embodiment. While embodied artificial intelligence is still highly diverse, changing, and far from "theoretically stable", a certain consensus about the important issues and methods has been achieved or is rapidly emerging. In this non-technical paper we briefly characterize the field, summarize its achievements, and identify important issues for future research. One of the fundamental unresolved problems has been, and still is, how thinking emerges from an embodied system. Provocatively speaking, the central issue could be captured by the question "How does walking relate to thinking?" © Springer-Verlag Berlin Heidelberg 2004.

9 results for "Speaking" in the Cambridge University Engineering Department Publications Database
