7 results for Vocal nodules
in the Cambridge University Engineering Department Publications Database
Abstract:
In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Because it uses the LF model, the approach presented in this work is close to vocoders with an exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) using spectral division, as in GSS, we show that a glottal source model can be combined with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We then derive a VTF estimate that takes into account the amplitude spectra of both the deterministic and the random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated, through listening tests, in the contexts of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition. © 2012 Elsevier B.V. All rights reserved.
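The spectral-division idea in this abstract can be sketched numerically: divide the speech amplitude spectrum by the amplitude spectrum of a mixed source consisting of a deterministic pulse component plus a flat Gaussian-noise floor. This is an illustrative approximation only; the function name, the root-sum-square combination of the two source components, and the `noise_sigma` parameter are assumptions, not the paper's exact derivation.

```python
import numpy as np

def vtf_by_spectral_division(speech_frame, source_frame, noise_sigma, eps=1e-8):
    """Estimate the vocal tract filter magnitude by spectral division,
    in the spirit of GSS: divide the speech amplitude spectrum by the
    amplitude spectrum of a mixed source (deterministic glottal pulse
    plus Gaussian noise). Illustrative sketch, not the paper's method."""
    S = np.abs(np.fft.rfft(speech_frame))          # speech amplitude spectrum
    G = np.abs(np.fft.rfft(source_frame))          # deterministic source spectrum
    n = len(speech_frame)
    # Flat noise floor for the Gaussian component; combining the two by a
    # root-sum-square of amplitudes is an assumption made for this sketch.
    noise_floor = noise_sigma * np.sqrt(n)
    mixed = np.sqrt(G ** 2 + noise_floor ** 2)
    return S / (mixed + eps)                       # VTF magnitude estimate
```

With `noise_sigma = 0` the estimate reduces to plain spectral division by the deterministic source alone, recovering the GSS-style special case.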
Abstract:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation-side-based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network-based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
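One of the front-end techniques listed above, conversation-side-based cepstral normalization, can be illustrated in a few lines: statistics are pooled over all frames from one side of the conversation and then removed from every frame. This is a minimal mean-and-variance sketch of the general idea; the function name is illustrative, and CU-HTK's actual front end combines this with further stages (VTLN, HLDA, etc.) not shown here.

```python
import numpy as np

def side_based_cmvn(cepstra):
    """Conversation-side cepstral mean and variance normalisation sketch.

    cepstra: (num_frames, num_coeffs) array of cepstral features from
    ONE conversation side. The per-side mean is subtracted and each
    coefficient is scaled to unit variance, reducing channel effects
    that are constant over that side of the call.
    """
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0) + 1e-8   # guard against zero variance
    return (cepstra - mean) / std
```

Normalising per conversation side, rather than per utterance, exploits the fact that the channel and speaker are fixed for the whole side, so the statistics are estimated from much more data.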
Abstract:
Pronunciation is an important part of speech acquisition, but little attention has been given to the mechanism or mechanisms by which it develops. Speech sound qualities, for example, have simply been assumed to develop by imitation. In most accounts this is then assumed to proceed by acoustic matching, with the infant comparing his output to that of his caregiver. There are theoretical and empirical problems with both of these assumptions, and we present a computational model, Elija, that does not learn to pronounce speech sounds this way. Elija starts by exploring the sound-making capabilities of his vocal apparatus. He then uses the natural responses he gets from a caregiver to learn equivalence relations between his vocal actions and his caregiver's speech. We show that Elija progresses from a babbling stage to learning the names of objects. This demonstrates the viability of a non-imitative mechanism in learning to pronounce.
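The non-imitative mechanism described here can be caricatured in code: the agent never compares its own acoustics to the caregiver's; it simply remembers which of its motor actions elicited which caregiver response, building an equivalence table between adult words and its own actions. All class and method names below are illustrative, and this toy is far simpler than the Elija model itself.

```python
class BabblingAgent:
    """Toy sketch of a non-imitative pronunciation learner in the spirit
    of Elija. There is no acoustic matching: the caregiver's natural
    response to a babble is what links an adult word to the infant's
    own motor action."""

    def __init__(self):
        # Equivalence relations: caregiver word -> agent motor action.
        self.equivalences = {}

    def babble(self, motor_action, caregiver_response):
        """Try a vocal action; if the caregiver responds (e.g. with an
        adult-form reformulation), record the equivalence."""
        if caregiver_response is not None:
            self.equivalences[caregiver_response] = motor_action

    def pronounce(self, heard_word):
        """To 'say' a word, replay the motor action that once elicited
        it from the caregiver; None if no equivalence has been learned."""
        return self.equivalences.get(heard_word)
```

The key point the sketch preserves is that learning is driven by the caregiver's responses to the agent's exploratory actions, not by the agent matching its output to an acoustic target.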
Abstract:
The use of a porous coating on prosthetic components to encourage bone ingrowth is an important way of improving uncemented implant fixation. Enhanced fixation may be achieved by the use of porous magneto-active layers on the surface of prosthetic implants, which would deform elastically on application of a magnetic field, generating internal stresses within the in-growing bone. This approach requires a ferromagnetic material able to support osteoblast attachment, proliferation, differentiation, and mineralization. In this study, the human osteoblast responses to ferromagnetic 444 stainless steel networks were considered alongside those to nonmagnetic 316L (medical grade) stainless steel networks. While both networks had similar porosities, 444 networks were made from coarser fibers, resulting in larger inter-fiber spaces. The networks were analyzed for cell morphology, distribution, proliferation, and differentiation, extracellular matrix production, and the formation of mineralized nodules. Cell culture was performed both in the presence of osteogenic supplements, to encourage cell differentiation, and in their absence. It was found that fiber size affected osteoblast morphology, cytoskeleton organization, and proliferation at the early stages of culture. The larger inter-fiber spaces in the 444 networks resulted in better spatial distribution of the extracellular matrix. The addition of osteogenic supplements enhanced cell differentiation and reduced cell proliferation, thereby suppressing the differences in proliferation observed in the absence of supplements. The results demonstrated that 444 networks elicited favorable responses from human osteoblasts, and thus show potential for use as magnetically active porous coatings for advanced bone implant applications. © 2012 Wiley Periodicals, Inc.
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on adult speech. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
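The effect of VTL on the spectrum can be illustrated with a simple frequency warp: a shorter vocal tract scales formant frequencies upward, so normalising for speaker size amounts to rescaling the spectral axis. The linear warp below is a crude stand-in for the rapid, syllable-by-syllable adaptation the paper proposes, not the auditory image model computation itself; the function name and `alpha` parameter are illustrative.

```python
import numpy as np

def vtl_warp(spectrum, alpha):
    """Linearly warp the frequency axis of a magnitude spectrum by a
    factor alpha. alpha > 1 compresses spectral features downward in
    frequency, mimicking normalisation of a shorter (e.g. child) vocal
    tract toward an adult reference. Sketch only."""
    bins = np.arange(len(spectrum))
    # Read the original spectrum at the warped positions bins * alpha;
    # positions beyond the last bin are filled with zero energy.
    return np.interp(bins * alpha, bins, spectrum, right=0.0)
```

A preprocessor of this kind would estimate `alpha` (and a GPR normalisation) per syllable before the features reach the recogniser, which is the adaptation the abstract argues for.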