Biblioteca Digital

935 resultados para Offensive speech

The Effect of Using Normalized Models in Statistical Speech Synthesis

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Speech development: Toddlers don't mind getting it wrong

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A recent study has found that toddlers do not compensate for an artificial alteration in a vowel they hear themselves producing. This raises questions about how young children learn speech sounds. © 2012 Elsevier Ltd.

Veja mais

A variational perspective on noise-robust speech recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Model compensation methods for noise-robust speech recognition have shown good performance. Predictive linear transformations can approximate these methods to balance computational complexity and compensation accuracy. This paper examines both of these approaches from a variational perspective. Using a matched-pair approximation at the component level yields a number of standard forms of model compensation and predictive linear transformations. However, a tighter bound can be obtained by using variational approximations at the state level. Both model-based and predictive linear transform schemes can be implemented in this framework. Preliminary results show that the tighter bound obtained from the state-level variational approach can yield improved performance over standard schemes. © 2011 IEEE.

Veja mais

Improving reverberant VTS for hands-free robust speech recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise corrupted AURORA4 task. Both yielded large gains over VTS baseline system, with RVTSJ outperforming the previous RVTS scheme. © 2011 IEEE.

Veja mais

Statistical parametric speech synthesis based on speaker and language factorization

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language. © 2012 IEEE.

Veja mais

Compression techniques applied to multiple speech recognition systems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Speech recognition systems typically contain many Gaussian distributions, and hence a large number of parameters. This makes them both slow to decode speech, and large to store. Techniques have been proposed to decrease the number of parameters. One approach is to share parameters between multiple Gaussians, thus reducing the total number of parameters and allowing for shared likelihood calculation. Gaussian tying and subspace clustering are two related techniques which take this approach to system compression. These techniques can decrease the number of parameters with no noticeable drop in performance for single systems. However, multiple acoustic models are often used in real speech recognition systems. This paper considers the application of Gaussian tying and subspace compression to multiple systems. Results show that two speech recognition systems can be modelled using the same number of Gaussians as just one system, with little effect on individual system performance. Copyright © 2009 ISCA.

Veja mais

Hidden Markov models in speech and language processing

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Compression Techniques Applied to Multiple Speech Recognition Systems

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Robust F0 estimation based on a multi-microphone periodicity function for distant-talking speech

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work addresses the problem of deriving F0 from distanttalking speech signals acquired by a microphone network. The method here proposed exploits the redundancy across the channels by jointly processing the different signals. To this purpose, a multi-microphone periodicity function is derived from the magnitude spectrum of all the channels. This function allows to estimate F0 reliably, even under reverberant conditions, without the need of any post-processing or smoothing technique. Experiments, conducted on real data, showed that the proposed frequency-domain algorithm is more suitable than other time-domain based ones.

Veja mais

Formant estimation of speech signals using subspace-based spectral analysis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The objective of this paper is to propose a signal processing scheme that employs subspace-based spectral analysis for the purpose of formant estimation of speech signals. Specifically, the scheme is based on decimative spectral estimation that uses Eigenanalysis and SVD (Singular Value Decomposition). The underlying model assumes a decomposition of the processed signal into complex damped sinusoids. In the case of formant tracking, the algorithm is applied on a small amount of the autocorrelation coefficients of a speech frame. The proposed scheme is evaluated on both artificial and real speech utterances from the TIMIT database. For the first case, comparative results to standard methods are provided which indicate that the proposed methodology successfully estimates formant trajectories.

Veja mais

Speaker and Noise Factorization for Robust Speech Recognition

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Word boundary modelling and full covariance gaussians for Arabic Speech-to-Text systems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that wordboundary context markers provide a powerful method to enhance graphemic systems by implicit phonetic information, improving the modelling capability of graphemic systems. In addition, a robust technique for full covariance Gaussian modelling in the Minimum Phone Error (MPE) training framework is introduced. This reduces the full covariance training to a diagonal covariance training problem, thereby solving related robustness problems. The full system results show that the combined use of these and other techniques within a multi-branch combination framework reduces the Word Error Rate (WER) of the complete system by up to 5.9% relative. Copyright © 2011 ISCA.

Veja mais

Syllable language models for Mandarin speech recognition: exploiting character language models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character level language models (LMs) can be used as a first level approximation to allowed syllable sequences. To test this idea, word and character level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of texts from a wide collection of text sources. Both hypothesis and model based combination techniques were investigated to combine word and character level LMs. Significant character error rate reductions up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history dependent multi-level LM that performs a log-linearly combination of character and word level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.

Veja mais

Importance sampling to compute likelihoods of noise-corrupted speech

Relevância:

20.00% 20.00%

Publicador:

Veja mais

935 resultados para Offensive speech

Filtro por publicador