Digital Library

Pronunciation is an important part of speech acquisition, but little attention has been given to the mechanism or mechanisms by which it develops. Speech sound qualities, for example, have merely been assumed to develop by simple imitation. In most accounts, this imitation is further assumed to work by acoustic matching, with the infant comparing his output to that of his caregiver. There are theoretical and empirical problems with both of these assumptions, and we present a computational model, Elija, that does not learn to pronounce speech sounds this way. Elija starts by exploring the sound-making capabilities of his vocal apparatus. He then uses the natural responses he gets from a caregiver to learn equivalence relations between his vocal actions and his caregiver's speech. We show that Elija progresses from a babbling stage to learning the names of objects. This demonstrates the viability of a non-imitative mechanism in learning to pronounce.

A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data

Relevance: 20.00%

Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Relevance: 20.00%

Abstract:

In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Because it uses the LF model, the approach presented here is close to vocoders that use an exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared with generic signal models. To estimate the vocal tract filter (VTF) by spectral division, as in GSS, we show that a glottal source model can be used with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate that takes into account the amplitude spectra of both the deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated through listening tests in the contexts of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition. © 2012 Elsevier B.V. All rights reserved.
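
The spectral-division idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the deterministic source spectrum is a hypothetical stand-in with a crude 1/f² tilt (about -12 dB/octave) rather than the actual LF model, the Gaussian-noise part is assumed to contribute a flat amplitude floor, the two components are assumed uncorrelated so that their powers add, and the function names and parameter values (`glottal_amplitude`, `noise_level`, `f0_hz`) are illustrative assumptions. The spectral envelope is taken as given, since the abstract notes that any envelope estimation method can be used.

```python
import numpy as np

def glottal_amplitude(freqs_hz, f0_hz=100.0):
    """Hypothetical stand-in for the deterministic (LF) source amplitude spectrum:
    flat up to f0, then falling as 1/f^2 (about -12 dB/octave).
    This is NOT the LF model; it only mimics an overall spectral tilt."""
    f = np.maximum(freqs_hz, f0_hz)
    return (f0_hz / f) ** 2

def mixed_source_amplitude(freqs_hz, noise_level=1e-3, f0_hz=100.0):
    """Amplitude spectrum of the deterministic part plus a flat noise part.
    Assumes the two components are uncorrelated, so their powers add."""
    det = glottal_amplitude(freqs_hz, f0_hz)
    return np.sqrt(det ** 2 + noise_level ** 2)

def estimate_vtf(envelope, source_amplitude, eps=1e-12):
    """Vocal tract filter amplitude by spectral division: |V| = |S| / |G|."""
    return envelope / np.maximum(source_amplitude, eps)

if __name__ == "__main__":
    freqs = np.linspace(0.0, 8000.0, 513)
    # Toy "true" vocal tract: a single resonance at 500 Hz with Q = 5.
    fr, q = 500.0, 5.0
    true_vtf = 1.0 / np.sqrt((1 - (freqs / fr) ** 2) ** 2 + (freqs / (q * fr)) ** 2)
    source = mixed_source_amplitude(freqs)
    envelope = true_vtf * source  # observed envelope = filter x source
    vtf_hat = estimate_vtf(envelope, source)
    print(np.allclose(vtf_hat, true_vtf))  # True
```

Including the noise floor in the divisor keeps the source amplitude from approaching zero at high frequencies, so the division does not over-amplify the VTF estimate there; this is one plausible reading of why both deterministic and random components are taken into account.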

Automatic transcription of conversational telephone speech

Relevance: 10.00%

Abstract:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. The most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling are described in detail. These include conversation-side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, the best performance by a statistically significant margin. The derivation of faster systems with moderate performance degradation is also discussed, in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
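
Conversation-side based cepstral normalization, the first technique listed above, can be sketched as computing normalization statistics over all frames from one side of a call and applying them to that side only. The sketch below assumes mean and variance normalization and that every frame of a side is available before normalizing; the function name `side_based_cmvn` and the `floor` parameter are hypothetical, and this is not the CU-HTK code.

```python
import numpy as np

def side_based_cmvn(features, side_ids, floor=1e-8):
    """Normalize each cepstral coefficient to zero mean and unit variance,
    using statistics pooled over all frames of one conversation side.

    features: array of shape (num_frames, num_coeffs)
    side_ids: array of shape (num_frames,) giving the side label of each frame
    """
    features = np.asarray(features, dtype=float)
    side_ids = np.asarray(side_ids)
    out = np.empty_like(features)
    for side in np.unique(side_ids):
        mask = side_ids == side
        frames = features[mask]
        mean = frames.mean(axis=0)
        std = np.maximum(frames.std(axis=0), floor)  # avoid division by zero
        out[mask] = (frames - mean) / std
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two sides of one call with different simulated channel offsets and scales.
    side_a = rng.normal(5.0, 2.0, size=(200, 13))
    side_b = rng.normal(-3.0, 0.5, size=(300, 13))
    feats = np.vstack([side_a, side_b])
    sides = np.array(["A"] * 200 + ["B"] * 300)
    norm = side_based_cmvn(feats, sides)
    print(np.abs(norm[sides == "B"].mean(axis=0)).max() < 1e-9)  # True
```

Pooling statistics per side rather than per utterance gives more frames for a stable estimate while still removing channel effects that differ between the two ends of a telephone call.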

Biomimetic layer-by-layer assembly of artificial nacre

Relevance: 10.00%

Abstract:

Nacre is a technologically remarkable organic-inorganic composite biomaterial. It consists of an ordered multilayer structure of crystalline calcium carbonate platelets separated by porous organic layers. This microstructure exhibits both optical iridescence and mechanical toughness that transcend those of its constituent components. Replicating nacre is essential for understanding this complex biomineral, and paves the way for tough coatings fabricated from cheap, abundant materials. Fabricating a calcitic nacre imitation with biologically similar optical and mechanical properties will likely require following all the steps taken in biogenic nacre synthesis. Here we present a route to artificial nacre that mimics the natural layer-by-layer approach to fabricate a hierarchical crystalline multilayer material. Its structure-function relationship was confirmed by nacre-like mechanical properties and striking optical iridescence. Our biomimetic route relies on the interplay of polymer-mediated mineral growth and layer-by-layer deposition of porous organic films. This is the first successful attempt to replicate nacre using CaCO₃.

Vowel normalisation: Time-domain processing of the internal dynamics of speech

Relevance: 10.00%

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the ‘same’ vowel differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), both of which are related to body size. Automatic speech recognition machines are notoriously bad at understanding children when they have been trained on adult speech. In this paper, we propose that the auditory system adapts its analysis of speech sounds dynamically and automatically to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
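
Why a VTL difference can in principle be normalised away can be illustrated with the textbook uniform-tube approximation of the vocal tract (closed at the glottis, open at the lips), whose resonances are F_n = (2n - 1)c / (4L). Under this simplification, changing the length L scales every formant by the same factor, which is a constant shift on a log-frequency axis. This sketch is not the auditory image model: the speed of sound (35000 cm/s), the 17.5 cm reference length and the 12.5 cm "child" length are illustrative assumptions, and the VTL is assumed to be known rather than estimated from the signal as the paper proposes.

```python
import math

SPEED_OF_SOUND_CM_S = 35000.0  # approximate value for warm, humid air (assumption)

def tube_formants(vtl_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * vtl_cm)
            for n in range(1, n_formants + 1)]

def log_shift(formants_a, formants_b):
    """Per-formant difference in log frequency; constant if patterns differ only in scale."""
    return [math.log(a) - math.log(b) for a, b in zip(formants_a, formants_b)]

def normalise_to_reference(formants, vtl_cm, ref_vtl_cm=17.5):
    """Undo the 1/L scaling by mapping formants onto a reference tract length."""
    return [f * vtl_cm / ref_vtl_cm for f in formants]

if __name__ == "__main__":
    adult = tube_formants(17.5)
    child = tube_formants(12.5)  # hypothetical shorter tract
    print([round(f) for f in adult])  # [500, 1500, 2500]
    print([round(f) for f in child])  # [700, 2100, 3500]
    print([round(f) for f in normalise_to_reference(child, 12.5)])  # [500, 1500, 2500]
```

Real vocal tracts are not uniform tubes and GPR varies independently of VTL, so this only illustrates the scale relationship a normalising preprocessor would need to remove.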

Anti-counterfeiting and supply chain security

Relevance: 10.00%

Abstract:

Counterfeit trade has developed into a severe problem for many industries. While established security features such as holograms, microprinting or chemical markers do not seem to avert trade in illicit imitation products efficiently, RFID technology, with its potential to automate product authentication, may become a powerful tool for enhancing brand and product protection. This contribution gives an overview of the implications of product counterfeiting for affected companies, provides a starting point for a structured definition of requirements for RFID-based anti-counterfeiting systems, and outlines several principal solution approaches that are discussed in greater detail in the subsequent chapters. © 2008 Springer-Verlag Berlin Heidelberg.

12 results for "Vocal imitation" in the Cambridge University Engineering Department Publications Database