925 results for Compressed speech
Abstract:
This work addresses the problem of deriving F0 from distant-talking speech signals acquired by a microphone network. The proposed method exploits the redundancy across the channels by jointly processing the different signals. To this end, a multi-microphone periodicity function is derived from the magnitude spectra of all the channels. This function allows F0 to be estimated reliably, even under reverberant conditions, without any post-processing or smoothing technique. Experiments conducted on real data showed that the proposed frequency-domain algorithm is more suitable than other time-domain ones.
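One way to picture a periodicity function built jointly from several channels is sketched below. This is an illustrative assumption, not the authors' algorithm: it pools the peak-normalised magnitude spectra of all channels, transforms the pooled power spectrum back to the lag domain, and picks the strongest lag in a plausible F0 range. The function name, the Hann window, and the 60-400 Hz search range are all hypothetical choices.

```python
import numpy as np

def estimate_f0_multichannel(frames, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) from one frame per microphone.

    frames: array of shape (n_channels, n_samples), time-aligned frames.
    Hypothetical sketch: pooling rule and search range are assumptions.
    """
    frames = np.asarray(frames, dtype=float)
    n = frames.shape[1]
    n_fft = 2 * n  # zero-pad so the lag-domain function does not wrap around

    # Magnitude spectrum of each windowed channel, scaled to unit peak so
    # that no single microphone dominates the pooled spectrum.
    mags = np.abs(np.fft.rfft(frames * np.hanning(n), n=n_fft, axis=1))
    mags /= np.maximum(mags.max(axis=1, keepdims=True), 1e-12)

    # Pool power across channels; phase (and hence inter-channel delay)
    # is discarded, so the channels add up constructively.
    pooled_power = (mags ** 2).sum(axis=0)

    # Inverse transform of a power spectrum is an autocorrelation-like
    # function of lag, which serves here as the periodicity function.
    periodicity = np.fft.irfft(pooled_power, n=n_fft)

    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    best_lag = lag_min + int(np.argmax(periodicity[lag_min:lag_max + 1]))
    return fs / best_lag
```

Because only magnitudes are pooled, the different propagation delays to each microphone do not need to be compensated in this sketch; a real system would still need framing, voicing decisions, and a treatment of reverberation that this example does not attempt.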
Abstract:
The objective of this paper is to propose a signal processing scheme that employs subspace-based spectral analysis for formant estimation of speech signals. Specifically, the scheme is based on decimative spectral estimation that uses eigenanalysis and SVD (Singular Value Decomposition). The underlying model assumes a decomposition of the processed signal into complex damped sinusoids. In the case of formant tracking, the algorithm is applied to a small number of the autocorrelation coefficients of a speech frame. The proposed scheme is evaluated on both artificial and real speech utterances from the TIMIT database. For the first case, comparative results against standard methods are provided, which indicate that the proposed methodology successfully estimates formant trajectories.
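The damped-sinusoid model can be illustrated with a generic subspace estimator. The sketch below is an assumption for illustration only: it uses a plain Hankel-matrix SVD with a shift-invariance step, omits the decimation stage the abstract mentions, and works on the signal itself rather than on autocorrelation coefficients. The function name and parameters are hypothetical.

```python
import numpy as np

def estimate_damped_sinusoids(x, order, fs):
    """Estimate frequencies (Hz) and damping factors (1/s) of a signal
    modelled as a sum of `order` complex damped exponentials.

    Generic Hankel-SVD sketch; not the decimative scheme of the paper.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    rows = n // 2
    cols = n - rows + 1

    # Hankel data matrix: H[i, j] = x[i + j].
    hankel = np.array([x[i:i + cols] for i in range(rows)])

    # The signal subspace is spanned by the leading left singular vectors.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    us = u[:, :order]

    # Shift invariance: us[1:] = us[:-1] @ z, where the eigenvalues of z
    # are the signal poles exp((-d_k + 2j*pi*f_k) / fs).
    z = np.linalg.lstsq(us[:-1], us[1:], rcond=None)[0]
    poles = np.linalg.eigvals(z)

    freqs = np.angle(poles) * fs / (2 * np.pi)
    damping = -np.log(np.abs(poles)) * fs
    return freqs, damping
```

A real-valued damped cosine contributes a conjugate pair of poles, so two real resonances need `order=4`; for formant work one would keep the positive-frequency poles and read the bandwidth off the damping factor.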
Abstract:
This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that word-boundary context markers provide a powerful method for enhancing graphemic systems with implicit phonetic information, improving their modelling capability. In addition, a robust technique for full covariance Gaussian modelling in the Minimum Phone Error (MPE) training framework is introduced. This reduces the full covariance training to a diagonal covariance training problem, thereby solving related robustness problems. The full system results show that the combined use of these and other techniques within a multi-branch combination framework reduces the Word Error Rate (WER) of the complete system by up to 5.9% relative. Copyright © 2011 ISCA.
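A "relative" reduction is measured against the baseline error rate, not in absolute percentage points. The short sketch below shows the arithmetic; the baseline and improved WER values are hypothetical, chosen only to produce a 5.9% relative change, since the abstract reports no absolute figures.

```python
def relative_reduction(baseline_wer, new_wer):
    """Fractional error-rate reduction relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical numbers for illustration only (not from the paper):
# a drop from 20.00% to 18.82% WER is 1.18 points absolute,
# which is 5.9% relative to the 20.00% baseline.
example = relative_reduction(20.0, 18.82)
```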
Abstract:
Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character-level language models (LMs) can be used as a first-level approximation to allowed syllable sequences. To test this idea, word- and character-level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of text from a wide collection of sources. Both hypothesis- and model-based combination techniques were investigated to combine word- and character-level LMs. Significant character error rate reductions of up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history-dependent multi-level LM that performs a log-linear combination of character- and word-level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.
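A log-linear combination of two language models amounts to a weighted sum of their log probabilities. The sketch below applies this at the hypothesis level with fixed weights, which is a simplifying assumption: the paper's model is history-dependent and adapted, and neither is reproduced here. The function names, the weights, and the tuple layout are hypothetical.

```python
def log_linear_score(word_logprob, char_logprob, w_word=0.7, w_char=0.3):
    """Weighted sum of log probabilities from the two LMs.

    Equivalent to P_word**w_word * P_char**w_char, left unnormalised,
    which is sufficient for ranking competing hypotheses.
    """
    return w_word * word_logprob + w_char * char_logprob

def rescore(hypotheses, w_word=0.7, w_char=0.3):
    """Return the text of the best hypothesis.

    hypotheses: list of (text, word_lm_logprob, char_lm_logprob) tuples,
    for example an N-best list from a recogniser.
    """
    best = max(
        hypotheses,
        key=lambda h: log_linear_score(h[1], h[2], w_word, w_char),
    )
    return best[0]
```

With the character-level weight set to zero the ranking reduces to the word-level LM alone, so the weight directly controls how much the character-level constraints can overturn a word-level decision.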
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the "same" vowel differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), both of which are related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on adult speech. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
Abstract:
A key challenge in achieving good transient performance of highly boosted engines is the difficulty of accelerating the turbocharger from low air flow conditions ("turbo lag"). Multi-stage turbocharging, electric turbocharger assistance, electric compressors and hybrid powertrains help to mitigate this deficit, but these technologies add significant cost and integration effort. Air-assist systems have the potential to be more cost-effective. Injecting compressed air into the intake manifold has received considerable attention, but the performance improvement offered by this concept is severely constrained by the compressor surge limit. The literature describes many schemes for generating the compressed gas, often involving significant mechanical complexity and/or cost. In this paper we demonstrate a novel exhaust assist system in which a reservoir is charged during braking. Experiments were conducted using a 2.0 litre light-duty diesel engine equipped with exhaust gas recirculation (EGR) and a variable geometry turbine (VGT), coupled to an AC transient dynamometer that was controlled to mimic engine load during in-gear braking and acceleration. The experimental results confirm that the proposed system reduces the time to torque during the 3rd gear tip-in by around 60%. This improvement was possible because of the increased acceleration of the turbocharger immediately after the tip-in. Injecting the compressed gas into the exhaust manifold circumvents the problem of compressor surge and is the key enabler of the superior performance of the proposed concept. Copyright © 2013 SAE International.