55 resultados para Speech and pioneering sports Colima

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We are addressing the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm, and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for the application of noisy speech recognition. We extend the concept to HMM training wherein some patterns may be noisy or distorted. Utilizing the concept of ``virtual pattern'' developed for joint evaluation, we propose selective iterative training of HMMs. Evaluating these algorithms for burst/transient noisy speech and isolated word recognition, significant improvement in recognition accuracy is obtained using the new algorithms over those which do not utilize the joint evaluation strategy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Non-uniform sampling of a signal is formulated as an optimization problem which minimizes the reconstruction signal error. Dynamic programming (DP) has been used to solve this problem efficiently for a finite duration signal. Further, the optimum samples are quantized to realize a speech coder. The quantizer and the DP based optimum search for non-uniform samples (DP-NUS) can be combined in a closed-loop manner, which provides distinct advantage over the open-loop formulation. The DP-NUS formulation provides a useful control over the trade-off between bitrate and performance (reconstruction error). It is shown that 5-10 dB SNR improvement is possible using DP-NUS compared to extrema sampling approach. In addition, the close-loop DP-NUS gives a 4-5 dB improvement in reconstruction error.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The design and operation of the minimum cost classifier, where the total cost is the sum of the measurement cost and the classification cost, is computationally complex. Noting the difficulties associated with this approach, decision tree design directly from a set of labelled samples is proposed in this paper. The feature space is first partitioned to transform the problem to one of discrete features. The resulting problem is solved by a dynamic programming algorithm over an explicitly ordered state space of all outcomes of all feature subsets. The solution procedure is very general and is applicable to any minimum cost pattern classification problem in which each feature has a finite number of outcomes. These techniques are applied to (i) voiced, unvoiced, and silence classification of speech, and (ii) spoken vowel recognition. The resulting decision trees are operationally very efficient and yield attractive classification accuracies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose apractical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal dependent unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressive sensed speech. Since the linear transform is signal dependent and unknown, unlike the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. It is found that MP is efficient and effective to recover CS encoded speech as well as jointly estimate the linear model. Moderate number of CS measurements and low order sparsity estimate will result in MP converge to the same linear transform as direct VQ of the LP vector derived from the original signal. There is also high positive correlation between signal domain approximation and CS measurement domain approximation for a large variety of speech spectra.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a simple speech music discriminator that uses features based on HILN(Harmonics, Individual Lines and Noise) model. We have been able to test the strength of the feature set on a standard database of 66 files and get an accuracy of around 97%. We also have tested on sung queries and polyphonic music and have got very good results. The current algorithm is being used to discriminate between sung queries and played (using an instrument like flute) queries for a Query by Humming(QBH) system currently under development in the lab.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditional subspace based speech enhancement (SSE)methods use linear minimum mean square error (LMMSE) estimation that is optimal if the Karhunen Loeve transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods.Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator.The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduce a novel temporal feature of a signal, namely extrema-based signal track length (ESTL) for the problem of speech segmentation. We show that ESTL measure is sensitive to both amplitude and frequency of the signal. The short-time ESTL (ST_ESTL) shows a promising way to capture the significant segments of speech signal, where the segments correspond to acoustic units of speech having distinct temporal waveforms. We compare ESTL based segmentation with ML and STM methods and find that it is as good as spectral feature based segmentation, but with lesser computational complexity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we develop a low-complexity message passing algorithm for joint support and signal recovery of approximately sparse signals. The problem of recovery of strictly sparse signals from noisy measurements can be viewed as a problem of recovery of approximately sparse signals from noiseless measurements, making the approach applicable to strictly sparse signal recovery from noisy measurements. The support recovery embedded in the approach makes it suitable for recovery of signals with same sparsity profiles, as in the problem of multiple measurement vectors (MMV). Simulation results show that the proposed algorithm, termed as JSSR-MP (joint support and signal recovery via message passing) algorithm, achieves performance comparable to that of sparse Bayesian learning (M-SBL) algorithm in the literature, at one order less complexity compared to the M-SBL algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current scientific research is characterized by increasing specialization, accumulating knowledge at a high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years – and with the advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence – an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing but assimilating the information and transforming it to suit the needs for a specific application. The increasingly specialized areas of scientific research often have the common goal of converting data into insight allowing the identification of solutions to scientific problems. Due to this common goal, there are strong parallels between different areas of applications that can be exploited and used to cross-fertilize different disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science applications, in visual processing and in biomedicine. Each sub-discipline has found its own specialized methodologies making these statistical methods successful to the given application. The unification of specialized areas is possible because many different problems can share strong analogies, making the theories developed for one problem applicable to other areas of research. It is the goal of this paper to demonstrate the utility of merging two disparate areas of applications to advance scientific research. The merging process requires cross-disciplinary collaboration to allow maximal exploitation of advances in one sub-discipline for that of another. We will demonstrate this general concept with the specific example of merging language technologies and computational biology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We address the problem of robust formant tracking in continuous speech in the presence of additive noise. We propose a new approach based on mixture modeling of the formant contours. Our approach consists of two main steps: (i) Computation of a pyknogram based on multiband amplitude-modulation/frequency-modulation (AM/FM) decomposition of the input speech; and (ii) Statistical modeling of the pyknogram using mixture models. We experiment with both Gaussian mixture model (GMM) and Student's-t mixture model (tMM) and show that the latter is robust with respect to handling outliers in the pyknogram data, parameter selection, accuracy, and smoothness of the estimated formant contours. Experimental results on simulated data as well as noisy speech data show that the proposed tMM-based approach is also robust to additive noise. We present performance comparisons with a recently developed adaptive filterbank technique proposed in the literature and the classical Burg's spectral estimator technique, which show that the proposed technique is more robust to noise.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automated image segmentation techniques are useful tools in biological image analysis and are an essential step in tracking applications. Typically, snakes or active contours are used for segmentation and they evolve under the influence of certain internal and external forces. Recently, a new class of shape-specific active contours have been introduced, which are known as Snakuscules and Ovuscules. These contours are based on a pair of concentric circles and ellipses as the shape templates, and the optimization is carried out by maximizing a contrast function between the outer and inner templates. In this paper, we present a unified approach to the formulation and optimization of Snakuscules and Ovuscules by considering a specific form of affine transformations acting on a pair of concentric circles. We show how the parameters of the affine transformation may be optimized for, to generate either Snakuscules or Ovuscules. Our approach allows for a unified formulation and relies only on generic regularization terms and not shape-specific regularization functions. We show how the calculations of the partial derivatives may be made efficient thanks to the Green's theorem. Results on synthesized as well as real data are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes an algorithm for joint data detection and tracking of the dominant singular mode of a time varying channel at the transmitter and receiver of a time division duplex multiple input multiple output beamforming system. The method proposed is a modified expectation maximization algorithm which utilizes an initial estimate to track the dominant modes of the channel at the transmitter and the receiver blindly; and simultaneously detects the un known data. Furthermore, the estimates are constrained to be within a confidence interval of the previous estimate in order to improve the tracking performance and mitigate the effect of error propagation. Monte-Carlo simulation results of the symbol error rate and the mean square inner product between the estimated and the true singular vector are plotted to show the performance benefits offered by the proposed method compared to existing techniques.