8 results for Brisbane, Australia

in Indian Institute of Science - Bangalore - India


Relevance:

60.00%

Publisher:

Abstract:

Traditional subspace-based speech enhancement (SSE) methods use linear minimum mean square error (LMMSE) estimation, which is optimal if the Karhunen-Loève transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of a Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using a Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear, and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace-based methods.

Index Terms: Subspace-based speech enhancement, Gaussian mixture density, MMSE estimation.
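As an illustration of why a Gaussian mixture prior makes the MMSE estimator nonlinear, here is a minimal sketch for a single scalar coefficient observed in additive zero-mean Gaussian noise. The scalar observation model, the function names, and the parameter values are assumptions made for this example and are not taken from the paper. The estimate is a posterior-weighted sum of per-component linear estimates; with a single zero-mean component it collapses to the familiar linear (Wiener-type) gain.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_estimate(y, weights, means, variances, noise_var):
    """MMSE estimate of s from y = s + n (illustrative scalar model).

    Assumptions: s follows a Gaussian mixture with the given weights,
    means and variances; n ~ N(0, noise_var) is independent of s.
    """
    # Unnormalized posterior probability of each mixture component given y.
    posteriors = [
        w * gaussian_pdf(y, m, v + noise_var)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(posteriors)

    estimate = 0.0
    for p, m, v in zip(posteriors, means, variances):
        gain = v / (v + noise_var)  # per-component linear MMSE gain
        estimate += (p / total) * (m + gain * (y - m))
    return estimate
```

Because the component posteriors depend on `y`, the overall mapping from `y` to the estimate is nonlinear whenever more than one component is present.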

Relevance:

60.00%

Publisher:

Abstract:

We formulate a two-stage iterative Wiener filtering (IWF) approach to speech enhancement that improves on the performance of the constrained IWF reported in the literature. The codebook-constrained IWF (CCIWF) has been shown to be effective in achieving convergence of IWF in the presence of both stationary and non-stationary noise. To this, we add a second stage of unconstrained IWF and show that the speech enhancement performance can be improved in terms of average segmental SNR (SSNR), Itakura-Saito (IS) distance, and Linear Prediction Coefficients (LPC) parameter coincidence. We also explore the tradeoff between the number of CCIWF iterations and the number of second-stage IWF iterations.
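The average segmental SNR mentioned above is commonly computed frame by frame and then averaged. The sketch below shows one common form of that metric; the frame length, the clipping limits of -10 dB and 35 dB, and the function name are assumptions chosen for illustration and may differ from the settings used in the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, min_db=-10.0, max_db=35.0):
    """Average per-frame SNR in dB between a clean and an enhanced signal.

    Frame length and clipping limits are illustrative defaults.
    """
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")

    eps = 1e-12  # guards against division by zero and log of zero
    total = 0.0
    for i in range(n_frames):
        start = i * frame_len
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        noise_energy = sum((a - b) ** 2 for a, b in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clip each frame so silent or perfectly matched frames do not dominate.
        total += min(max(snr_db, min_db), max_db)
    return total / n_frames
```

Clipping each frame before averaging is a design choice that keeps a few extreme frames from skewing the mean.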

Relevance:

60.00%

Publisher:

Abstract:

Effective feature extraction for robust speech recognition is a widely addressed topic, and there is currently much effort to invoke non-stationary signal models instead of the quasi-stationary signal models that lead to standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling, and recently new feature sets for automatic speech recognition (ASR) have been derived based on a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performance for robust speech recognition in noise, using the AURORA-2 database. We show that the proposed FEPSTRUM representation is more effective than the others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM.
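For readers unfamiliar with the Teager energy operator, a minimal sketch of its standard discrete-time form, x[n]^2 - x[n-1] * x[n+1], follows. This shows only the operator itself, not the FEPSTRUM features or the improvement proposed in the abstract; the function name is chosen for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator applied to a sequence of samples.

    Returns one value per interior sample (the first and last samples
    have no two-sided neighbourhood and are skipped).
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure sinusoid A * cos(W * n + phi), the output is the constant A^2 * sin^2(W), so the operator tracks amplitude and frequency jointly, which is why it pairs naturally with AM-FM signal models.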

Relevance:

60.00%

Publisher:

Abstract:

Segmental dynamic time warping (DTW) has been demonstrated to be a useful technique for finding acoustic similarity scores between segments of two speech utterances. Due to its high computational requirements, it has had to be computed offline, limiting the applications of the technique. In this paper, we present results of parallelizing this task by distributing the workload in either a static or a dynamic way on an 8-processor cluster, and we discuss the trade-offs among different distribution schemes. We show that online unsupervised pattern discovery using segmental DTW is plausible with as few as 8 processors. This brings the task within reach of today's general-purpose multi-core servers. We also show results on a 32-processor system and discuss factors affecting the scalability of our methods.
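To give a sense of the computation involved, here is a minimal sketch of classic full-sequence DTW on scalar sequences. It is not the segmental variant described in the abstract, which aligns segments of utterances, and real speech systems compare feature vectors rather than scalars; the absolute-difference local cost and the function name are assumptions for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two scalar sequences.

    Uses absolute difference as the local cost and the standard
    three-way recursion (insertion, deletion, match).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

The table has (n + 1) * (m + 1) cells, so the cost of a single alignment grows with the product of the two sequence lengths.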

Relevance:

60.00%

Publisher:

Abstract:

Structural Support Vector Machines (SSVMs) have become a popular tool in machine learning for predicting structured objects such as parse trees, Part-of-Speech (POS) label sequences, and image segments. Various efficient algorithmic techniques have been proposed for training SSVMs on large datasets. The typical SSVM formulation contains a regularizer term and a composite loss term. The loss term is usually composed of the Linear Maximum Error (LME) associated with the training examples. Other alternatives for the loss term are yet to be explored for SSVMs. We formulate a new SSVM with a Linear Summed Error (LSE) loss term and propose efficient algorithms to train the new SSVM formulation using a primal cutting-plane method and a sequential dual coordinate descent method. Numerical experiments on benchmark datasets demonstrate that the sequential dual coordinate descent method is faster than the cutting-plane method and reaches the steady-state generalization performance sooner. It is thus a useful alternative for training SSVMs when linear summed error is used.

Relevance:

60.00%

Publisher:

Abstract:

The predictability of a chaotic series is limited to a few future time steps due to its sensitivity to initial conditions and the exponential divergence of the trajectories. Over the years, streamflow has been considered as a stochastic system in many approaches. In this study, the chaotic nature of daily streamflow is investigated using the autocorrelation function, the Fourier spectrum, the correlation dimension method (Grassberger-Procaccia algorithm), and the false nearest neighbor method. The embedding dimensions of 6-7 obtained indicate the possible presence of low-dimensional chaotic behavior. The predictability of the system is estimated by calculating the system's Lyapunov exponent. A positive maximum Lyapunov exponent of 0.167 indicates that the system is chaotic and unstable, with a maximum predictability of only 6 days. These results give a positive indication towards considering streamflow as a low-dimensional chaotic system rather than as a stochastic system.
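The two figures quoted above are consistent with the common rule of thumb that the predictability horizon is roughly the inverse of the maximum Lyapunov exponent. Assuming the exponent is expressed per day (the abstract does not state the unit, and the symbol names below are introduced only for this note), the arithmetic is:

```latex
T_{\text{pred}} \approx \frac{1}{\lambda_{\max}}
               = \frac{1}{0.167\ \text{day}^{-1}}
               \approx 6.0\ \text{days}
```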

Relevance:

60.00%

Publisher:

Abstract:

The flutter boundary of an aeroelastic system placed in a transonic flow drops due to compressibility effects; this drop is known as the transonic dip. Viscous effects can shift the location of the shock and, depending on the shock strength, the boundary layer may separate, leading to changes in the flutter speed. An unsteady Euler flow solver coupled with the structural dynamic equations is used to understand the effect of the shock on the transonic dip. The effects of various system parameters, such as mass ratio, location of the center of mass, position of the elastic axis, and ratio of uncoupled natural frequencies in heave and pitch, are also studied. Steady turbulent flow results are presented to demonstrate the effect of viscosity on the location and strength of the shock.

Relevance:

60.00%

Publisher:

Abstract:

We formulate the problem of detecting the constituent instruments in a polyphonic music piece as a joint decoding problem. From monophonic data, parametric Gaussian Mixture Hidden Markov Models (GM-HMM) are obtained for each instrument. We propose a method to use these models in a factorial framework, termed Factorial GM-HMM (F-GM-HMM). The states are jointly inferred to explain the evolution of each instrument in the mixture observation sequence. The dependencies are decoupled using a variational inference technique. We show that the joint time evolution of all instruments' states can be captured using F-GM-HMM. We compare the performance of the proposed method with that of the Student's-t mixture model (tMM) and GM-HMM in an existing latent variable framework. Experiments on polyphony of two to five instruments, with 8 instrument models trained on the RWC dataset and tested on the RWC and TRIOS datasets, show that F-GM-HMM gives an advantage over the other considered models in segments containing co-occurring instruments.