Digital Library

17 results for FALSE VOCAL CORDS

in Cambridge University Engineering Department Publications Database

An investigation into vocal tract length normalisation

Relevance: 20.00%

A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data

Relevance: 20.00%

Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Relevance: 20.00%

Abstract:

In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Because it uses the LF model, the base approach of this work is close to a vocoder using exogenous input, like ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) using spectral division, as in GSS, we show that a glottal source model can be used with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate which takes into account the amplitude spectra of both the deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated, through listening tests, in the context of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition. © 2012 Elsevier B.V. All rights reserved.

Image-based localization and pose recovery using scale invariant features

Relevance: 10.00%

Abstract:

In this paper, we propose a vision-based mobile robot localization strategy. Local scale-invariant features are used as natural landmarks in unstructured and unmodified environments. The local characteristics of the features we use prove to be robust to occlusion and outliers. In addition, the invariance of the features to viewpoint change makes them suitable landmarks for mobile robot localization. Scale-invariant features detected in the first exploration are indexed into a location database. Indexing and voting allow efficient recognition for global localization. The localization result is verified by epipolar geometry between the representative view in the database and the view to be localized, which decreases the probability of false localization. The localization system can recover the pose of the camera mounted on the robot by essential matrix decomposition. Then the position of the robot can be computed easily. Both calibrated and uncalibrated cases are discussed, and relative position estimation based on a calibrated camera turns out to be the better choice. Experimental results show that our approach is effective and reliable in the case of illumination changes, similarity transformations and extraneous features. © 2004 IEEE.

Automatic transcription of conversational telephone speech

Relevance: 10.00%

Abstract:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation-side-based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network-based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
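
For readers unfamiliar with the metric quoted in this abstract, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. The sketch below illustrates that standard definition only; the function name and the example sentences are illustrative assumptions and are not taken from the CU-HTK system or the NIST scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative usage: one deleted word out of six reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {rate:.3f}")
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.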

Probing biomolecular interaction forces using an anharmonic acoustic technique for selective detection of bacterial spores.

Relevance: 10.00%

Abstract:

Receptor-based detection of pathogens often suffers from non-specific interactions, and as most detection techniques cannot distinguish between affinities of interactions, false positive responses remain a plaguing reality. Here, we report an anharmonic acoustic-based method of detection that addresses the inherent weakness of current ligand-dependent assays. Spores of Bacillus subtilis (Bacillus anthracis simulant) were immobilized on a thickness-shear mode AT-cut quartz crystal functionalized with anti-spore antibody, and the sensor was driven by a pure sinusoidal oscillation at increasing amplitude. Biomolecular interaction forces between the coupled spores and the accelerating surface caused a nonlinear modulation of the acoustic response of the crystal. In particular, the deviation in the third harmonic of the transduced electrical response versus oscillation amplitude of the sensor (signal) was found to be significant. Signals from the specifically bound spores were clearly distinguishable in shape from those of the physisorbed streptavidin-coated polystyrene microbeads. The analytical model presented here enables estimation of the biomolecular interaction forces from the measured response. Thus, probing biomolecular interaction forces using the described technique can quantitatively detect pathogens and distinguish specific from non-specific interactions, with potential applicability to rapid point-of-care detection. This also serves as a potential tool for rapid force-spectroscopy, affinity-based biomolecular screening and mapping of molecular interaction networks.

Modeling the development of pronunciation in infant speech acquisition.

Relevance: 10.00%

Abstract:

Pronunciation is an important part of speech acquisition, but little attention has been given to the mechanism or mechanisms by which it develops. Speech sound qualities, for example, have just been assumed to develop by simple imitation. In most accounts this is then assumed to be by acoustic matching, with the infant comparing his output to that of his caregiver. There are theoretical and empirical problems with both of these assumptions, and we present a computational model, Elija, that does not learn to pronounce speech sounds this way. Elija starts by exploring the sound-making capabilities of his vocal apparatus. Then he uses the natural responses he gets from a caregiver to learn equivalence relations between his vocal actions and his caregiver's speech. We show that Elija progresses from a babbling stage to learning the names of objects. This demonstrates the viability of a non-imitative mechanism in learning to pronounce.

Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Relevance: 10.00%

Abstract:

Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal which is used for the generation of the source-excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model. © 2011 IEEE.

Probing biomolecular interaction forces using an anharmonic acoustic technique for selective detection of bacterial spores

Relevance: 10.00%

Dual-mode thin film bulk acoustic wave resonators for parallel sensing of temperature and mass loading.

Relevance: 10.00%

Abstract:

Thin film bulk acoustic wave resonator (FBAR) devices simultaneously supporting multiple resonance modes have been designed for gravimetric sensing. The mechanism for dual-mode generation within a single device has been discussed, and theoretical calculations based on finite element analysis allowed the fabrication of FBARs whose resonance modes have opposite reactions to temperature changes: one mode exhibits a positive frequency shift for a rise in temperature whilst the other exhibits a negative shift. Both modes exhibit a negative frequency shift for a mass load, and hence by monitoring both modes simultaneously it is possible to distinguish whether a change in the resonance frequency is due to a mass load or temperature variation (or a combination of both), avoiding false positive/negative responses in gravimetric sensing without the need for additional reference devices or complex electronics.
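
The separation described in this abstract can be illustrated with a small worked example, assuming (this is an assumption, not a statement from the paper) that each mode's frequency shift is linear in the mass load and the temperature change. Two modes then give two equations in two unknowns, which have a unique solution whenever the modes respond differently. The sensitivity values below are hypothetical placeholders chosen only to match the signs described above.

```python
def separate_mass_and_temperature(df1, df2, mass_sens, temp_sens):
    """Solve the assumed linear model
        df1 = mass_sens[0] * dm + temp_sens[0] * dT
        df2 = mass_sens[1] * dm + temp_sens[1] * dT
    for the mass load dm and temperature change dT (Cramer's rule)."""
    m1, m2 = mass_sens
    t1, t2 = temp_sens
    det = m1 * t2 - m2 * t1
    if abs(det) < 1e-12:
        raise ValueError("modes respond proportionally; effects cannot be separated")
    dm = (df1 * t2 - df2 * t1) / det
    dT = (m1 * df2 - m2 * df1) / det
    return dm, dT


# Hypothetical sensitivities: both modes shift down with mass,
# but in opposite directions with temperature.
mass_sens = (-2.0, -1.0)   # frequency shift per unit mass, modes 1 and 2
temp_sens = (0.5, -0.25)   # frequency shift per kelvin, modes 1 and 2

dm, dT = separate_mass_and_temperature(-4.0, -4.0, mass_sens, temp_sens)
print(dm, dT)
```

The opposite temperature signs keep the determinant away from zero, which is what makes a single device sufficient in this simplified picture.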

Dual-mode thin film bulk acoustic wave resonators for parallel sensing of temperature and mass loading

Relevance: 10.00%

Enhancement of Construction Equipment Detection in Video Frames by Combining with Tracking

Relevance: 10.00%

Abstract:

Vision-based object detection has been introduced in construction for recognizing and locating construction entities in on-site camera views. It can provide spatial locations of a large number of entities, which is beneficial in large-scale, congested construction sites. However, even a few false detections prevent its practical application. To resolve this issue, this paper presents a novel hybrid method for locating construction equipment that fuses the functions of detection and tracking algorithms. This method detects construction equipment in the video view by taking advantage of entities' motion, shape, and color distribution. Background subtraction, Haar-like features, and eigen-images are used for motion, shape, and color information, respectively. A tracking algorithm steps into the process to make up for the false detections. False detections are identified by catching drastic changes in object size and appearance. The identified false detections are replaced with tracking results. Preliminary experiments show that the combination with tracking has the potential to enhance the detection performance.
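
The replacement step described in this abstract can be sketched as follows. This is a simplified illustration rather than the authors' method: it checks only the change in bounding-box area (the appearance check is omitted), and the box format, function name and `max_area_ratio` threshold are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def fuse_detection_and_tracking(
    detections: List[Box],
    tracked: List[Box],
    max_area_ratio: float = 1.5,  # hypothetical threshold for a "drastic" change
) -> List[Box]:
    """Per frame, keep the detector's box unless its area changes drastically
    relative to the previously accepted box; in that case use the tracker's box."""
    if len(detections) != len(tracked):
        raise ValueError("need one detection and one tracked box per frame")

    fused: List[Box] = []
    for det, trk in zip(detections, tracked):
        if fused:
            prev_area = fused[-1][2] * fused[-1][3]
            area = det[2] * det[3]
            # Symmetric ratio: flags both sudden growth and sudden shrinkage.
            ratio = max(area, prev_area) / max(min(area, prev_area), 1e-9)
            if ratio > max_area_ratio:
                fused.append(trk)  # treat as a false detection
                continue
        fused.append(det)
    return fused


# Illustrative usage: the second detection suddenly grows 16x in area.
detections = [(0, 0, 10, 10), (1, 0, 40, 40), (2, 0, 10, 10)]
tracked = [(0, 0, 10, 10), (1, 0, 10, 10), (2, 0, 10, 10)]
print(fuse_detection_and_tracking(detections, tracked))
```

Comparing against the last accepted box, rather than the last raw detection, keeps a single false detection from corrupting the reference for the following frame.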

Fractal dimension, wavelet shrinkage and anomaly detection for mine hunting

Relevance: 10.00%

Abstract:

An anomaly detection approach is considered for the problem of mine hunting in sonar imagery. The authors exploit previous work that used dual-tree wavelets and fractal dimension to adaptively suppress sand ripples and a matched filter as an initial detector. Here, lacunarity-inspired features are extracted from the remaining false positives, again using dual-tree wavelets. A one-class support vector machine is then used to learn a decision boundary, based only on these false positives. The approach exploits the large quantities of 'normal' natural background data available but avoids the difficult requirement of collecting examples of targets in order to train a classifier. © 2012 The Institution of Engineering and Technology.

Vowel normalisation: Time-domain processing of the internal dynamics of speech

Relevance: 10.00%

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel would differ enormously. These differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.

Slow peaking and low-gain designs for global stabilization of nonlinear systems

Relevance: 10.00%

Abstract:

This paper presents an analysis of the slow-peaking phenomenon, a pitfall of low-gain designs that imposes basic limitations on large regions of attraction in nonlinear control systems. The phenomenon is best understood on a chain of integrators perturbed by a vector field u p(x, u) that satisfies p(x, 0) = 0. Because small controls (or low-gain designs) are sufficient to stabilize the unperturbed chain of integrators, it may seem that smaller controls, which attenuate the perturbation u p(x, u) in a large compact set, can be employed to achieve larger regions of attraction. This intuition is false, however, and peaking may cause a loss of global controllability unless severe growth restrictions are imposed on p(x, u). These growth restrictions are expressed as a higher order condition with respect to a particular weighted dilation related to the peaking exponents of the nominal system. When this higher order condition is satisfied, an explicit control law is derived that achieves global asymptotic stability of x = 0. This stabilization result is extended to more general cascade nonlinear systems in which the perturbation p(x, v) v, with v = (ξ, u)ᵀ, contains the state ξ and the control u of a stabilizable subsystem ξ̇ = a(ξ, u). As an illustration, a control law is derived that achieves global stabilization of the frictionless ball-and-beam model.
