Biblioteca Digital

Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.

Veja mais

Automatic transcription of conversational telephone speech

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, language and pronunciation modeling are presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.

Veja mais

A backward-simulation based Rao-Blackwellized particle smoother for conditionally linear Gaussian models

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this article, we develop a new Rao-Blackwellized Monte Carlo smoothing algorithm for conditionally linear Gaussian models. The algorithm is based on the forward-filtering backward-simulation Monte Carlo smoother concept and performs the backward simulation directly in the marginal space of the non-Gaussian state component while treating the linear part analytically. Unlike the previously proposed backward-simulation based Rao-Blackwellized smoothing approaches, it does not require sampling of the Gaussian state component and is also able to overcome certain normalization problems of two-filter smoother based approaches. The performance of the algorithm is illustrated in a simulated application. © 2012 IFAC.

Veja mais

A high-resolution atlas and statistical model of the human heart from multislice CT

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Atlases and statistical models play important roles in the personalization and simulation of cardiac physiology. For the study of the heart, however, the construction of comprehensive atlases and spatio-temporal models is faced with a number of challenges, in particular the need to handle large and highly variable image datasets, the multi-region nature of the heart, and the presence of complex as well as small cardiovascular structures. In this paper, we present a detailed atlas and spatio-temporal statistical model of the human heart based on a large population of 3D+time multi-slice computed tomography sequences, and the framework for its construction. It uses spatial normalization based on nonrigid image registration to synthesize a population mean image and establish the spatial relationships between the mean and the subjects in the population. Temporal image registration is then applied to resolve each subject-specific cardiac motion and the resulting transformations are used to warp a surface mesh representation of the atlas to fit the images of the remaining cardiac phases in each subject. Subsequently, we demonstrate the construction of a spatio-temporal statistical model of shape such that the inter-subject and dynamic sources of variation are suitably separated. The framework is applied to a 3D+time data set of 138 subjects. The data is drawn from a variety of pathologies, which benefits its generalization to new subjects and physiological studies. The obtained level of detail and the extendability of the atlas present an advantage over most cardiac models published previously. © 1982-2012 IEEE.

Veja mais

Expressive visual text-to-speech using active appearance models

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems. © 2013 IEEE.

Veja mais

13 resultados para NORMALIZATION

em Cambridge University Engineering Department Publications Database

Filtro por publicador