31 results for Binomial mixture model
in Indian Institute of Science - Bangalore - India
Abstract:
Using an analysis-by-synthesis (AbS) approach, we develop a soft-decision-based switched vector quantization (VQ) method for high-quality, low-complexity coding of wideband speech line spectral frequency (LSF) parameters. For each switching region, a low-complexity transform domain split VQ (TrSVQ) is designed. The overall rate-distortion (R/D) performance optimality of the new switched quantizer is addressed in a Gaussian mixture model (GMM) based parametric framework. In the AbS approach, quantization complexity is reduced by using nearest neighbor (NN) TrSVQs and by splitting the transform domain vector into a higher number of subvectors. Compared to current LSF quantization methods, the new method is shown to provide a competitive or better trade-off between R/D performance and complexity.
Abstract:
We address the issue of rate-distortion (R/D) performance optimality of the recently proposed switched split vector quantization (SSVQ) method. The distribution of the source is modeled using a Gaussian mixture density, and thus the non-parametric SSVQ is analyzed in a parametric model based framework for achieving optimum R/D performance. Using high rate quantization theory, we derive the optimum bit allocation formulae for the intra-cluster split vector quantizer (SVQ) and the inter-cluster switching. For wideband speech line spectrum frequency (LSF) parameter quantization, it is shown that the Gaussian mixture model (GMM) based parametric SSVQ method provides a 1 bit/vector advantage over the non-parametric SSVQ method.
Abstract:
Traditional subspace based speech enhancement (SSE) methods use linear minimum mean square error (LMMSE) estimation, which is optimal if the Karhunen-Loève transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of a Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using a Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear, and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods. Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.
Abstract:
Sub-pixel classification is essential for the successful description of many land cover (LC) features with spatial resolution less than the size of the image pixels. A commonly used approach for sub-pixel classification is the linear mixture model (LMM). Even though LMM have shown acceptable results, in practice, linear mixtures do not exist. A non-linear mixture model, therefore, may better describe the resultant mixture spectra for endmember (pure pixel) distribution. In this paper, we propose a new methodology for inferring LC fractions by a process called automatic linear-nonlinear mixture model (AL-NLMM). AL-NLMM is a three-step process in which the endmembers are first derived by an automated algorithm. These endmembers are used by the LMM in the second step, which provides abundance estimation in a linear fashion. Finally, the abundance values, along with the training samples representing the actual proportions, are fed as input to a multi-layer perceptron (MLP) architecture to train the neurons, which further refines the abundance estimates to account for the non-linear nature of the mixing classes of interest. AL-NLMM is validated on computer simulated hyperspectral data of 200 bands. Validation of the output showed an overall RMSE of 0.0089±0.0022 with LMM and 0.0030±0.0001 with the MLP based AL-NLMM when compared to actual class proportions, indicating that individual class abundances obtained from AL-NLMM are very close to the real observations.
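The linear step of this abstract (abundance estimation under an LMM) can be illustrated with a minimal sketch for the simplest case of two endmembers. The function name `unmix_two_endmembers`, the sum-to-one and non-negativity constraints, and the example spectra are assumptions made for illustration; the abstract does not specify how the LMM is solved, and the MLP refinement step is not shown.

```python
def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Abundances (a, 1 - a) of a two-endmember linear mixture (illustrative).

    Solves min_a ||pixel - (a * endmember_a + (1 - a) * endmember_b)||^2
    subject to 0 <= a <= 1, i.e. sum-to-one and non-negativity constraints.
    Assumes the two endmember spectra are not identical.
    """
    diff = [ea - eb for ea, eb in zip(endmember_a, endmember_b)]
    resid = [p - eb for p, eb in zip(pixel, endmember_b)]
    denom = sum(d * d for d in diff)
    # Unconstrained least-squares solution along the line joining the endmembers.
    a = sum(r * d for r, d in zip(resid, diff)) / denom
    # Clipping gives the exact constrained optimum for this 1-D convex problem.
    a = min(1.0, max(0.0, a))
    return a, 1.0 - a


# Hypothetical three-band spectra; the pixel is a 30/70 mixture of the two.
spectrum_a = [0.1, 0.4, 0.8]
spectrum_b = [0.6, 0.3, 0.2]
mixed_pixel = [0.45, 0.33, 0.38]
abundance_a, abundance_b = unmix_two_endmembers(mixed_pixel, spectrum_a, spectrum_b)
```

With more than two endmembers the same objective becomes a constrained least-squares problem that generally needs an iterative solver.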
Abstract:
We address the problem of robust formant tracking in continuous speech in the presence of additive noise. We propose a new approach based on mixture modeling of the formant contours. Our approach consists of two main steps: (i) Computation of a pyknogram based on multiband amplitude-modulation/frequency-modulation (AM/FM) decomposition of the input speech; and (ii) Statistical modeling of the pyknogram using mixture models. We experiment with both Gaussian mixture model (GMM) and Student's-t mixture model (tMM) and show that the latter is robust with respect to handling outliers in the pyknogram data, parameter selection, accuracy, and smoothness of the estimated formant contours. Experimental results on simulated data as well as noisy speech data show that the proposed tMM-based approach is also robust to additive noise. We present performance comparisons with a recently developed adaptive filterbank technique proposed in the literature and the classical Burg's spectral estimator technique, which show that the proposed technique is more robust to noise.
Abstract:
Adaptive Gaussian Mixture Models (GMM) have been one of the most popular and successful approaches to perform foreground segmentation on multimodal background scenes. However, the good accuracy of the GMM algorithm comes at a high computational cost. An improved GMM technique was proposed by Zivkovic to reduce computational cost by minimizing the number of modes adaptively. In this paper, we propose a modification to his adaptive GMM algorithm that further reduces execution time by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we derive a heuristic that computes periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 - 1.44 on standard video datasets where a large fraction of pixels are multimodal.
Abstract:
We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated in a latent variable framework from the polyphonic data using an Expectation Maximization (EM) algorithm, derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), from which it is possible to find the instruments that are active at a given time. An average F-ratio of 0.75 is obtained for polyphonies containing 2-5 instruments, on an experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone and violin.
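The weight-estimation idea in this abstract can be sketched with the standard EM update for mixture weights when the component models are held fixed. This is a generic illustration, not the algorithm derived in the paper: the function name `em_mixture_weights` and the precomputed likelihood table are hypothetical, and in the setting of the abstract each entry would come from evaluating a trained instrument tMM on a frame.

```python
def em_mixture_weights(likelihoods, n_iter=50):
    """Estimate mixture weights by EM with component densities held fixed.

    likelihoods[n][k] is p(frame n | model k), assumed precomputed.
    Assumes every frame has positive likelihood under at least one model
    whose current weight is non-zero.
    """
    n_models = len(likelihoods[0])
    weights = [1.0 / n_models] * n_models  # uniform initialisation
    for _ in range(n_iter):
        totals = [0.0] * n_models
        for row in likelihoods:
            # E-step: responsibility of each model for this frame.
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            for k in range(n_models):
                totals[k] += joint[k] / norm
        # M-step: each new weight is the average responsibility.
        weights = [t / len(likelihoods) for t in totals]
    return weights


# Hypothetical likelihoods for three frames under two instrument models.
frame_likelihoods = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.05]]
estimated_weights = em_mixture_weights(frame_likelihoods)
```

Here the first model explains every frame better than the second, so its weight grows towards 1, which is the sense in which a weight can be read as an activity indicator.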
Abstract:
Grating Compression Transform (GCT) is a two-dimensional analysis of the speech signal which has been shown to be effective for multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using GCT apply a Kalman filter framework to obtain pitch tracks, which requires training of the filter parameters using true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using the time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame in a given utterance, obtained from different patches of the spectrogram using GCT. We evaluate the performance of the proposed method on all-voiced speech mixtures as well as random speech mixtures having well separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% of multi-pitch estimates having error <= 20% for random mixtures and all-voiced mixtures respectively. TVGMM also results in lower root mean squared error in pitch track estimation than Kalman filtering.
Abstract:
The Variable Endmember Constrained Least Square (VECLS) technique is proposed to account for endmember variability in the linear mixture model by incorporating the variance for each class, the signals of which vary from pixel to pixel due to changes in urban land cover (LC) structures. VECLS is first tested with a computer simulated three-class endmember set considering four bands having small, medium and large variability at three different spatial resolutions. The technique is next validated with real datasets from IKONOS, Landsat ETM+ and MODIS. The results show that the correlation between actual and estimated proportions is higher by an average of 0.25 for the artificial datasets compared to a situation where variability is not considered. With IKONOS, Landsat ETM+ and MODIS data, the average correlation increased by 0.15 for 2 and 3 classes and by 0.19 for 4 classes, when compared to a single endmember per class. (C) 2013 COSPAR. Published by Elsevier Ltd. All rights reserved.
Abstract:
We formulate the problem of detecting the constituent instruments in a polyphonic music piece as a joint decoding problem. From monophonic data, parametric Gaussian Mixture Hidden Markov Models (GM-HMM) are obtained for each instrument. We propose a method to use the above models in a factorial framework, termed Factorial GM-HMM (F-GM-HMM). The states are jointly inferred to explain the evolution of each instrument in the mixture observation sequence. The dependencies are decoupled using a variational inference technique. We show that the joint time evolution of all instruments' states can be captured using F-GM-HMM. We compare the performance of the proposed method with that of the Student's-t mixture model (tMM) and GM-HMM in an existing latent variable framework. Experiments on polyphonies of two to five instruments, with 8 instrument models trained on the RWC dataset and tested on the RWC and TRIOS datasets, show that F-GM-HMM gives an advantage over the other considered models in segments containing co-occurring instruments.
Abstract:
Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
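A minimal sketch of the estimator described in this abstract, restricted to the scalar special case y = x + n, where x follows a Gaussian mixture and n is zero-mean Gaussian noise independent of x. The function names `gaussian_pdf` and `gmm_mmse` and all parameter names are illustrative assumptions and do not come from the paper. Under these assumptions the MMSE estimate is a posterior-weighted sum of per-component linear (Wiener) estimates, and with a single component it reduces to the linear MMSE estimator.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse(y, weights, means, variances, noise_var):
    """MMSE estimate of x from y = x + n (scalar case, illustrative).

    Assumes x ~ sum_k weights[k] * N(means[k], variances[k]) and
    n ~ N(0, noise_var), independent of x.
    """
    # Under component k, y is marginally N(means[k], variances[k] + noise_var).
    evidences = [w * gaussian_pdf(y, m, v + noise_var)
                 for w, m, v in zip(weights, means, variances)]
    total = sum(evidences)
    estimate = 0.0
    for ev, m, v in zip(evidences, means, variances):
        posterior = ev / total                      # P(component k | y)
        wiener = m + v / (v + noise_var) * (y - m)  # E[x | y, component k]
        estimate += posterior * wiener
    return estimate


# Single component: the estimate is linear in y (here y / 2).
single = gmm_mmse(3.0, [1.0], [0.0], [1.0], 1.0)
# Two components: the posterior weights depend on y, so the estimate is non-linear.
double = gmm_mmse(1.0, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0], 1.0)
```

The non-linearity comes entirely from the posterior component probabilities, which vary with the observation; each per-component estimate on its own is linear in y.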
Abstract:
In this paper, we present a new speech enhancement approach that is based on exploiting the intra-frame dependency of discrete cosine transform (DCT) domain coefficients. Existing enhancement techniques treat the transform-domain coefficients independently. Instead of this traditional approach of independently processing the scalars, we split the DCT domain noisy speech vector into sub-vectors, and each sub-vector is enhanced independently. Through this sub-vector based approach, the higher dimensional enhancement advantage, viz. non-linear dependency, is exploited. In the developed method, each clean speech sub-vector is modeled using a Gaussian mixture (GM) density. We show that the proposed Gaussian mixture model (GMM) based DCT domain method, using the sub-vector processing approach, provides better performance than the conventional approach of enhancing the transform domain scalar components independently. Performance improvement over the recently proposed GMM based time domain approach is also shown.
Abstract:
We develop a Gaussian mixture model (GMM) based vector quantization (VQ) method for coding wideband speech line spectrum frequency (LSF) parameters at low complexity. The PDF of the LSF source vector is modeled using a Gaussian mixture (GM) density with a higher number of uncorrelated Gaussian mixtures, and an optimum scalar quantizer (SQ) is designed for each Gaussian mixture. The reduction of quantization complexity is achieved using the relevant subset of available optimum SQs. For an input vector, the subset of quantizers is chosen using a nearest neighbor criterion. The developed method is compared with recent VQ methods and shown to provide high quality rate-distortion (R/D) performance at lower complexity. In addition, the developed method provides the advantages of bitrate scalability and rate-independent complexity.
Abstract:
Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
Abstract:
The effect of electromagnetic stirring of the melt on the final macrosegregation in the continuous casting of an aluminium alloy billet is studied numerically. A continuum mixture model for solidification in the presence of electromagnetic stirring is presented. As a case study, simulations are performed for direct chill (DC) casting of an Al-Cu alloy, and the effect of electromagnetic stirring on macrosegregation is analysed. The model predicts the temperature, velocity, and species distribution in the mold. As a special case, we have also studied the situation in which dendritic particles are fragmented at the interface due to vigorous electromagnetic stirring. For this case, an additional conservation equation for the transport of solid fraction is solved. To model the resistance offered by moving solid crystals, a switching function is used in the momentum equations to vary the viscosity. The fragmentation and transport of dendritic particles have a profound effect on the final macrosegregation and microstructure of the solidified billet. It is found that the application of electromagnetic stirring in continuous casting of billets results in better temperature uniformity and macrosegregation pattern.