Digital Library

336 results for Literature speech

Engineering change: A review of the literature and an examination of the key topics

Syllable language models for Mandarin speech recognition: exploiting character language models.

Abstract:

Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character-level language models (LMs) can be used as a first-level approximation to allowed syllable sequences. To test this idea, word- and character-level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of text from a wide collection of sources. Both hypothesis-based and model-based combination techniques were investigated to combine word- and character-level LMs. Significant character error rate reductions of up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history-dependent multi-level LM that performs a log-linear combination of character- and word-level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.
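The log-linear combination mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's adapted history-dependent multi-level LM: it only interpolates two sets of log-probabilities over a fixed list of candidate hypotheses and renormalises. The function name `log_linear_combine`, the `char_weight` parameter and the toy scores are assumptions made for illustration.

```python
import math


def log_linear_combine(word_logprobs, char_logprobs, char_weight=0.5):
    """Log-linearly combine two LM scores over the same candidates.

    Both arguments map a candidate hypothesis to its log-probability.
    char_weight is a hypothetical interpolation weight in [0, 1].
    Returns normalised log-probabilities over the candidate set.
    """
    if word_logprobs.keys() != char_logprobs.keys():
        raise ValueError("both models must score the same candidates")
    # Weighted sum in the log domain = weighted product of probabilities.
    scores = {
        cand: (1.0 - char_weight) * word_logprobs[cand]
        + char_weight * char_logprobs[cand]
        for cand in word_logprobs
    }
    # Renormalise with a numerically stable log-sum-exp.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {cand: s - log_z for cand, s in scores.items()}


# Toy scores (invented): the word LM prefers "a", the character LM prefers "b".
word_lm = {"a": math.log(0.6), "b": math.log(0.4)}
char_lm = {"a": math.log(0.2), "b": math.log(0.8)}
combined = log_linear_combine(word_lm, char_lm, char_weight=0.5)
```

With equal weights the stronger preference of the character model wins in this toy case; in practice the weight would be tuned on held-out data.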

Importance sampling to compute likelihoods of noise-corrupted speech

Knowledge translation in healthcare: incorporating theories of learning and knowledge from the management literature

COMPLEX CEPSTRUM AS PHASE INFORMATION IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS

Vowel normalisation: Time-domain processing of the internal dynamics of speech

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.

Autoregressive Models for Statistical Parametric Speech Synthesis

State estimation schemes for independent component coupled hidden Markov models

Abstract:

Conventional hidden Markov models (HMMs) generally consist of a Markov chain observed through a linear map corrupted by additive noise. This general class of model has enjoyed a huge and diverse range of applications, for example, speech processing, biomedical signal processing and, more recently, quantitative finance. However, a lesser-known extension of this general class of model is the so-called Factorial Hidden Markov Model (FHMM). FHMMs also have diverse applications, notably in machine learning, artificial intelligence and speech recognition [13, 17]. FHMMs extend the usual class of HMMs by supposing the partially observed state process is a finite collection of distinct Markov chains, either statistically independent or dependent. There is also considerable current activity in applying collections of partially observed Markov chains to complex action recognition problems; see, for example, [6]. In this article we consider the Maximum Likelihood (ML) parameter estimation problem for FHMMs. Much of the extant literature concerning this problem presents parameter estimation schemes based on full-data log-likelihood EM algorithms. This approach can be slow to converge and often imposes heavy demands on computer memory. The latter point is particularly relevant for the class of FHMMs, where state space dimensions are relatively large. The contribution of this article is to develop new recursive formulae for a filter-based EM algorithm that can be implemented online. Our new formulae are equivalent ML estimators; however, they are purely recursive and so significantly reduce numerical complexity and memory requirements. A computer simulation is included to demonstrate the performance of our results. © Taylor & Francis Group, LLC.
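To give a feel for why a purely recursive, filter-based scheme needs little memory, here is a minimal sketch of the standard normalised forward filter for a single discrete HMM. It is not the article's recursive EM formulae and not a factorial model; it also swaps the linear-map-plus-noise observation model for a discrete emission table to stay short. All names (`filter_step`, `run_filter`) and the toy numbers are assumptions for illustration.

```python
def filter_step(prior, transition, likelihood):
    """One normalised HMM filter update.

    prior[i]         = P(x_{t-1} = i | y_1..y_{t-1})
    transition[i][j] = P(x_t = j | x_{t-1} = i)
    likelihood[j]    = P(y_t | x_t = j)
    Returns P(x_t = j | y_1..y_t) as a list.
    """
    n = len(prior)
    # Predict: push the previous belief through the Markov chain.
    predicted = [sum(prior[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight by the observation likelihood, then normalise.
    unnormalised = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnormalised]


def run_filter(initial, transition, emission, observations):
    """Process observations one at a time, keeping only the current belief.

    initial is the state distribution before the first transition;
    emission[j][y] = P(y | state j) for discrete observation symbols y.
    """
    belief = list(initial)
    for y in observations:
        belief = filter_step(belief, transition, [row[y] for row in emission])
    return belief


# Toy two-state model (numbers invented for illustration).
transition = [[0.9, 0.1], [0.1, 0.9]]
emission = [[0.8, 0.2], [0.2, 0.8]]
belief = run_filter([0.5, 0.5], transition, emission, [0])
```

Only the current belief vector is stored between observations, which is the property that makes recursions of this kind suitable for online use.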

Structured SVMs for Automatic Speech Recognition

Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems

Complex cepstrum for statistical parametric speech synthesis

Structured Support Vector Machines for Noise Robust Continuous Speech Recognition

Speech factorization for HMM-TTS based on cluster adaptive training.

Expressive visual text-to-speech using active appearance models

Abstract:

This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state, which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies comparing the output of different systems. © 2013 IEEE.

A Holistic Categorization Framework for Literature on Engineering Change Management
