899 resultados para two-Gaussian mixture model
Resumo:
Grating Compression Transform (GCT) is a two-dimensional analysis of speech signal which has been shown to be effective in multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using GCT apply Kalman filter framework to obtain pitch tracks which requires training of the filter parameters using true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame in a given utterance obtained from different patches of the spectrogram using GCT. We evaluate the performance of the proposed method on all voiced speech mixtures as well as random speech mixtures having well separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% multi-pitch estimates having error <= 20% for random mixtures and all-voiced mixtures respectively. TVGMM also results in lower root mean squared error in pitch track estimation compared to that by Kalman filtering.
Resumo:
Using analysis-by-synthesis (AbS) approach, we develop a soft decision based switched vector quantization (VQ) method for high quality and low complexity coding of wideband speech line spectral frequency (LSF) parameters. For each switching region, a low complexity transform domain split VQ (TrSVQ) is designed. The overall rate-distortion (R/D) performance optimality of new switched quantizer is addressed in the Gaussian mixture model (GMM) based parametric framework. In the AbS approach, the reduction of quantization complexity is achieved through the use of nearest neighbor (NN) TrSVQs and splitting the transform domain vector into higher number of subvectors. Compared to the current LSF quantization methods, the new method is shown to provide competitive or better trade-off between R/D performance and complexity.
Resumo:
We address the issue of rate-distortion (R/D) performance optimality of the recently proposed switched split vector quantization (SSVQ) method. The distribution of the source is modeled using Gaussian mixture density and thus, the non-parametric SSVQ is analyzed in a parametric model based framework for achieving optimum R/D performance. Using high rate quantization theory, we derive the optimum bit allocation formulae for the intra-cluster split vector quantizer (SVQ) and the inter-cluster switching. For the wide-band speech line spectrum frequency (LSF) parameter quantization, it is shown that the Gaussian mixture model (GMM) based parametric SSVQ method provides 1 bit/vector advantage over the non-parametric SSVQ method.
Resumo:
Traditional subspace based speech enhancement (SSE)methods use linear minimum mean square error (LMMSE) estimation that is optimal if the Karhunen Loeve transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods.Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.
Resumo:
Adaptive Gaussian Mixture Models (GMM) have been one of the most popular and successful approaches to perform foreground segmentation on multimodal background scenes. However, the good accuracy of the GMM algorithm comes at a high computational cost. An improved GMM technique was proposed by Zivkovic to reduce computational cost by minimizing the number of modes adaptively. In this paper, we propose a modification to his adaptive GMM algorithm that further reduces execution time by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we derive a heuristic that computes periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 - 1.44 on standard video datasets where a large fraction of pixels are multimodal.
Resumo:
We address the problem of robust formant tracking in continuous speech in the presence of additive noise. We propose a new approach based on mixture modeling of the formant contours. Our approach consists of two main steps: (i) Computation of a pyknogram based on multiband amplitude-modulation/frequency-modulation (AM/FM) decomposition of the input speech; and (ii) Statistical modeling of the pyknogram using mixture models. We experiment with both Gaussian mixture model (GMM) and Student's-t mixture model (tMM) and show that the latter is robust with respect to handling outliers in the pyknogram data, parameter selection, accuracy, and smoothness of the estimated formant contours. Experimental results on simulated data as well as noisy speech data show that the proposed tMM-based approach is also robust to additive noise. We present performance comparisons with a recently developed adaptive filterbank technique proposed in the literature and the classical Burg's spectral estimator technique, which show that the proposed technique is more robust to noise.
Resumo:
This paper investigates sub-integer implementations of the adaptive Gaussian mixture model (GMM) for background/foreground segmentation to allow the deployment of the method on low cost/low power processors that lack Floating Point Unit (FPU). We propose two novel integer computer arithmetic techniques to update Gaussian parameters. Specifically, the mean value and the variance of each Gaussian are updated by a redefined and generalised "round'' operation that emulates the original updating rules for a large set of learning rates. Weights are represented by counters that are updated following stochastic rules to allow a wider range of learning rates and the weight trend is approximated by a line or a staircase. We demonstrate that the memory footprint and computational cost of GMM are significantly reduced, without significantly affecting the performance of background/foreground segmentation.
Resumo:
We present a novel filtering algorithm for tracking multiple clusters of coordinated objects. Based on a Markov chain Monte Carlo (MCMC) mechanism, the new algorithm propagates a discrete approximation of the underlying filtering density. A dynamic Gaussian mixture model is utilized for representing the time-varying clustering structure. This involves point process formulations of typical behavioral moves such as birth and death of clusters as well as merging and splitting. For handling complex, possibly large scale scenarios, the sampling efficiency of the basic MCMC scheme is enhanced via the use of a Metropolis within Gibbs particle refinement step. As the proposed methodology essentially involves random set representations, a new type of estimator, termed the probability hypothesis density surface (PHDS), is derived for computing point estimates. It is further proved that this estimator is optimal in the sense of the mean relative entropy. Finally, the algorithm's performance is assessed and demonstrated in both synthetic and realistic tracking scenarios. © 2012 Elsevier Ltd. All rights reserved.
Resumo:
Cetacean respiration usually happen in bouts. The most widely applied quantitative method used to analyze the structure of these bouts is the log(e)-survivorship analysis, based on the assumption that the respiratory intervals are distributed as negative exponentials. However, for the data collected from three captive Yangtze finless porpoises (Neophocaena phocaenoides asiaeorientalis), we failed to obtain a convergent result with the application of log,survivorship analysis. However, the two-Gaussian model, which was recently proposed to analyze the feeding behavior of cows, was successfully fitted to the data. According to the fitting results, the overall respiratory pattern of the captive Yangtze finless porpoises can be described as a dive with a mean duration of around 30-40 s, followed by two or three ventilations with a mean interval of approximately 9 s. The average intra-bout intervals during both active and inactive periods are constant at 7.7-9.9 s for all individuals. However, when shifting from active to inactive states, the adult male and female decrease their mean numbers of respirations per bout and average length of inter-bout respiratory intervals, while the estimates of both parameters increase for the juvenile female. It was pointed out that the two-Gaussian model might be more adequate for cetacean respiratory-bout structure analyses than the log(e)-survivorship technique.
Resumo:
We formulate the problem of detecting the constituent instruments in a polyphonic music piece as a joint decoding problem. From monophonic data, parametric Gaussian Mixture Hidden Markov Models (GM-HMM) are obtained for each instrument. We propose a method to use the above models in a factorial framework, termed as Factorial GM-HMM (F-GM-HMM). The states are jointly inferred to explain the evolution of each instrument in the mixture observation sequence. The dependencies are decoupled using variational inference technique. We show that the joint time evolution of all instruments' states can be captured using F-GM-HMM. We compare performance of proposed method with that of Student's-t mixture model (tMM) and GM-HMM in an existing latent variable framework. Experiments on two to five polyphony with 8 instrument models trained on the RWC dataset, tested on RWC and TRIOS datasets show that F-GM-HMM gives an advantage over the other considered models in segments containing co-occurring instruments.