2 resultados para Transformada KLT
em Indian Institute of Science - Bangalore - Índia
Resumo:
Traditional subspace based speech enhancement (SSE)methods use linear minimum mean square error (LMMSE) estimation that is optimal if the Karhunen Loeve transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods.Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.
Resumo:
Communication applications are usually delay restricted, especially for the instance of musicians playing over the Internet. This requires a one-way delay of maximum 25 msec and also a high audio quality is desired at feasible bit rates. The ultra low delay (ULD) audio coding structure is well suited to this application and we investigate further the application of multistage vector quantization (MSVQ) to reach a bit rate range below 64 Kb/s, in a scalable manner. Results at 32 Kb/s and 64 Kb/s show that the trained codebook MSVQ performs best, better than KLT normalization followed by a simulated Gaussian MSVQ or simulated Gaussian MSVQ alone. The results also show that there is only a weak dependence on the training data, and that we indeed converge to the perceptual quality of our previous ULD coder at 64 Kb/s.