Biblioteca Digital

This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that word-boundary context markers provide a powerful way to enrich graphemic systems with implicit phonetic information, improving their modelling capability. In addition, a robust technique for full covariance Gaussian modelling in the Minimum Phone Error (MPE) training framework is introduced. This reduces full covariance training to a diagonal covariance training problem, thereby solving the associated robustness problems. The full system results show that the combined use of these and other techniques within a multi-branch combination framework reduces the Word Error Rate (WER) of the complete system by up to 5.9% relative. Copyright © 2011 ISCA.
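The abstract does not say how the word-boundary context markers are encoded, nor what "5.9% relative" is measured against. A minimal Python sketch under stated assumptions: `mark_word_boundaries` uses a hypothetical tagging scheme (`_B` word-initial, `_E` word-final, `_S` single-grapheme word) on transliterated input with one character per grapheme, and `relative_wer_reduction` shows how a relative WER reduction is computed. The WER values in the usage note are illustrative, not the paper's figures.

```python
def mark_word_boundaries(words):
    """Split each word into grapheme units and tag boundary positions.

    Hypothetical scheme (not taken from the paper): '_B' marks the
    word-initial grapheme, '_E' the word-final one, '_S' a one-grapheme
    word. Interior graphemes stay untagged. Assumes one character per
    grapheme, e.g. a transliteration of the Arabic script.
    """
    units = []
    for word in words:
        graphemes = list(word)
        if not graphemes:
            continue  # skip empty tokens
        if len(graphemes) == 1:
            units.append([graphemes[0] + "_S"])
            continue
        tagged = [graphemes[0] + "_B"] + graphemes[1:-1] + [graphemes[-1] + "_E"]
        units.append(tagged)
    return units


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0
```

For example, `mark_word_boundaries(["ktb", "w"])` gives `[["k_B", "t", "b_E"], ["w_S"]]`, so a context-dependent model can distinguish a word-final `b` from a word-internal one. A hypothetical drop from 20.0% to 18.82% WER is a 1.18-point absolute but 5.9% relative reduction.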

Design Dimensions of Intelligent Text Entry Tutors

Relevance: 20.00%

Automatic Selection of Recognition Errors by Respeaking the Intended Text

Relevance: 20.00%

The role of artificial intelligence as 'text' within design

Relevance: 20.00%

Abstract:

The paper describes a new approach to artificial intelligence (AI) and its role in design. This approach argues that AI can be seen as 'text', in other words as a medium for communicating design knowledge and information between designers. The paper applies these ideas to reinterpret an existing knowledge-based system (KBS) design tool, namely CADET, a product design evaluation tool. By adopting this approach, it discusses the authorial issues, among others, involved in developing AI and KBS design tools. Consequently, designers' rights and responsibilities can be better understood, because the knowledge medium, through its concern with authorship, returns control to users rather than attributing agent status to the system. © 1998 Elsevier Science Ltd. All rights reserved.

Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems

Relevance: 20.00%

An expressive text-driven 3D talking head

Relevance: 20.00%

Abstract:

Creating a realistic talking head, which takes arbitrary text as input and generates a realistic-looking face speaking that text, has been a long-standing research challenge. Talking heads that cannot express emotion have been made to look very realistic using concatenative approaches [Wang et al. 2011]; however, allowing the head to express emotion makes the problem much more challenging, and model-based approaches have shown promise in this area. While 2D talking heads currently look more realistic than their 3D counterparts, they are limited both in the range of poses they can express and in the lighting conditions under which they can be rendered. Previous attempts to produce videorealistic 3D expressive talking heads [Cao et al. 2005] have produced encouraging results but have not yet reached the realism of their 2D counterparts.

Expressive visual text-to-speech using active appearance models

Relevance: 20.00%

Abstract:

This paper presents a complete system for expressive visual text-to-speech (VTTS), which produces expressive output in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed that make it more applicable to VTTS. The model allows normalization with respect to both pose and blink state, which significantly reduces artifacts in the synthesized sequences. We demonstrate quantitative improvements in reconstruction error over a million frames, as well as in large-scale user studies comparing the output of different systems. © 2013 IEEE.
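An active appearance model represents a face shape as a mean shape plus a weighted sum of shape modes (appearance is modelled the same way), and reconstruction error measures how well a shape is captured by those modes. A minimal pure-Python sketch of that linear model, assuming orthonormal modes and flattened (x, y) coordinates; the pose and blink normalization extensions the abstract describes are not modelled, and the function names are hypothetical.

```python
def synthesize_shape(mean_shape, modes, params):
    """Return mean_shape + sum_i params[i] * modes[i] (flattened coordinates)."""
    if len(modes) != len(params):
        raise ValueError("need one parameter per mode")
    shape = list(mean_shape)
    for mode, p in zip(modes, params):
        if len(mode) != len(shape):
            raise ValueError("mode length must match the mean shape")
        for k, value in enumerate(mode):
            shape[k] += p * value
    return shape


def project_shape(shape, mean_shape, modes):
    """Model parameters p_i = <shape - mean, mode_i>; valid for orthonormal modes."""
    centred = [s - m for s, m in zip(shape, mean_shape)]
    return [sum(c * v for c, v in zip(centred, mode)) for mode in modes]


def reconstruction_error(shape, mean_shape, modes):
    """Sum of squared differences between a shape and its model reconstruction."""
    params = project_shape(shape, mean_shape, modes)
    rebuilt = synthesize_shape(mean_shape, modes, params)
    return sum((a - b) ** 2 for a, b in zip(shape, rebuilt))
```

With a mean shape `[0, 0, 1, 0]` and the two unit modes `[1, 0, 0, 0]` and `[0, 1, 0, 0]`, the shape `[2, 3, 1, 5]` projects to parameters `[2, 3]`; its last coordinate lies outside the span of the modes, so its reconstruction error is 25.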

Integrated Automatic Expression Prediction and Speech Synthesis from Text

Relevance: 20.00%

Integrated Expression Prediction and Speech Synthesis From Text

Relevance: 20.00%

An overview of the ILSP unit selection text-to-speech synthesis system

Relevance: 20.00%

Abstract:

This paper presents an overview of the Text-to-Speech synthesis system developed at the Institute for Language and Speech Processing (ILSP), focusing on the key issues in the design of the system components. The system currently fully supports three languages (Greek, English, Bulgarian) and is designed to be as language- and speaker-independent as possible. Experimental results are also presented, showing that the system produces high-quality synthetic speech in terms of naturalness and intelligibility. At the international Blizzard Challenge 2013 workshop, the system was recently ranked among the top three systems worldwide in achieved quality for English. © 2014 Springer International Publishing.
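The abstract does not describe the unit search itself. Unit selection synthesis is commonly framed as choosing one database unit per target position so that the sum of target costs and concatenation (join) costs is minimal, found with a Viterbi search. A minimal sketch of that generic formulation, not the ILSP system's implementation; `select_units` and the cost functions passed to it are hypothetical placeholders.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one candidate unit per target position, minimising total cost.

    candidates: one list of candidate units per target position.
    target_cost(i, u): cost of using unit u at position i.
    join_cost(u, v): cost of concatenating unit u followed by unit v.
    Returns (best_total_cost, chosen_units).
    """
    if not candidates or any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate")
    # Each entry is (cost of the best path ending in this unit, backpointer).
    history = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        prev_units = candidates[i - 1]
        column = []
        for v in candidates[i]:
            # Best predecessor: accumulated cost plus the join cost into v.
            cost, back = min(
                (history[-1][k][0] + join_cost(u, v), k)
                for k, u in enumerate(prev_units)
            )
            column.append((cost + target_cost(i, v), back))
        history.append(column)
    # Trace back from the cheapest unit at the final position.
    j = min(range(len(history[-1])), key=lambda k: history[-1][k][0])
    total = history[-1][j][0]
    chosen = []
    for i in range(len(candidates) - 1, -1, -1):
        chosen.append(candidates[i][j])
        j = history[i][j][1]
    chosen.reverse()
    return total, chosen
```

The Viterbi search keeps only the best path into each candidate, so its cost grows with the number of positions times the square of the candidates per position, rather than exponentially as exhaustive enumeration would.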

28 results for Text edition
