61 results for speech databases

at Indian Institute of Science - Bangalore - India


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Automatic and accurate detection of the closure-burst transition events of stops and affricates serves many applications in speech processing. A temporal measure named the plosion index is proposed to detect such events, which are characterized by an abrupt increase in energy. Using the maxima of the pitch-synchronous normalized cross correlation as an additional temporal feature, a rule-based algorithm is designed that aims at selecting only those events associated with the closure-burst transitions of stops and affricates. The performance of the algorithm, characterized by receiver operating characteristic curves and temporal accuracy, is evaluated using the labeled closure-burst transitions of stops and affricates of the entire TIMIT test and training databases. The robustness of the algorithm is studied with respect to global white and babble noise as well as local noise using the TIMIT test set and on telephone quality speech using the NTIMIT test set. For these experiments, the proposed algorithm, which does not require explicit statistical training and is based on two one-dimensional temporal measures, gives a performance comparable to or better than the state-of-the-art methods. In addition, to test the scalability, the algorithm is applied to the Buckeye conversational speech corpus and databases of two Indian languages. (C) 2014 Acoustical Society of America.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Speech polarity detection is a crucial first step in many speech processing techniques. In this paper, an algorithm is proposed that improves on the existing technique using the skewness of the voice source (VS) signal. Here, the integrated linear prediction residual (ILPR) is used as the VS estimate, which is obtained using linear prediction on long-term frames of the low-pass filtered speech signal. This excludes the unvoiced regions from analysis and also reduces the computation. Further, a modified skewness measure is proposed for decision, which also considers the magnitude of the skewness of the ILPR along with its sign. With the detection error rate (DER) as the performance metric, the algorithm is tested on 8 large databases and its performance (DER=0.20%) is found to be comparable to that of the best technique (DER=0.06%) on both clean and noisy speech. Further, the proposed method is found to be ten times faster than the best technique.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Computation of the dependency basis is the fundamental step in solving the membership problem for functional dependencies (FDs) and multivalued dependencies (MVDs) in relational database theory. We examine this problem from an algebraic perspective. We introduce the notion of the inference basis of a set M of MVDs and show that it contains the maximum information about the logical consequences of M. We propose the notion of a dependency-lattice and develop an algebraic characterization of the inference basis using simple notions from lattice theory. We also establish several interesting properties of dependency-lattices related to the implication problem. Founded on our characterization, we synthesize efficient algorithms for (a) computing the inference basis of a given set M of MVDs; (b) computing the dependency basis of a given attribute set w.r.t. M; and (c) solving the membership problem for MVDs. We also show that our results extend naturally to incorporate FDs, in a way that enables solving the membership problem for FDs and MVDs together. We finally show that our algorithms are more efficient than existing ones when used to solve what we term the ‘generalized membership problem’.
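
For orientation only, the FD-only special case of the membership problem can be sketched with the textbook attribute-closure test: X -> Y follows from a set of FDs exactly when Y is contained in the closure of X. This is not the lattice-based algorithm of the abstract and it does not handle MVDs; the function names and the sample dependencies are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left-hand side is already derivable.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def fd_is_implied(lhs, rhs, fds):
    """Membership test for FDs: does `lhs -> rhs` follow from `fds`?"""
    return set(rhs) <= attribute_closure(lhs, fds)


# Hypothetical dependencies: A -> B, B -> C, CD -> E.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
print(sorted(attribute_closure({"A"}, fds)))
print(fd_is_implied({"A", "D"}, {"E"}, fds))
```

The loop repeats until no dependency adds a new attribute, so it runs in time polynomial in the number of FDs and attributes.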

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Functional dependencies in relational databases are investigated. Eight binary relations, viz., (1) dependency relation, (2) equipotence relation, (3) dissidence relation, (4) completion relation, and the dual relations of each of them, are described. Any one of these eight relations can be used to represent the functional dependencies in a database. Results from linear graph theory are found helpful in obtaining these representations. The dependency relation directly gives the functional dependencies. The equipotence relation specifies the dependencies in terms of attribute sets which functionally determine each other. The dissidence relation specifies the dependencies in terms of saturated sets in a very indirect way. The completion relation represents the functional dependencies as a function, the range of which turns out to be a lattice. The depletion relation, which is the dual of the completion relation, can also represent functional dependencies, as can the duals of the dependency, equipotence, and dissidence relations. The class of depleted sets, which is the dual of saturated sets, is defined and used in the study of depletion relations.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We are addressing the problem of jointly using multiple noisy speech patterns for automatic speech recognition (ASR), given that they come from the same class. If the user utters a word K times, the ASR system should try to use the information content in all the K patterns of the word simultaneously and improve its speech recognition accuracy compared to that of the single pattern based speech recognition. To address this problem, we recently proposed a Multi Pattern Dynamic Time Warping (MPDTW) algorithm to align the K patterns by finding the least distortion path between them. A Constrained Multi Pattern Viterbi algorithm was used on this aligned path for isolated word recognition (IWR). In this paper, we explore the possibility of using only the MPDTW algorithm for IWR. We also study the properties of the MPDTW algorithm. We show that using only 2 noisy test patterns (10 percent burst noise at -5 dB SNR) reduces the noisy speech recognition error rate by 37.66 percent when compared to the single pattern recognition using the Dynamic Time Warping algorithm.
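
As background, the single-pattern baseline named in the abstract can be sketched as classic two-sequence Dynamic Time Warping. This is a minimal illustration, not the MPDTW algorithm: it aligns only two patterns, and it assumes scalar features with an absolute-difference local cost, whereas a real recognizer would compare feature vectors. The function name is hypothetical.

```python
def dtw_distance(a, b):
    """Least cumulative distortion of aligning sequences `a` and `b`."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# A time-stretched copy aligns with zero distortion.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

In template-based isolated word recognition, a test pattern would be assigned to the word whose reference template gives the smallest such distance.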

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal dependent unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressive sensed speech. Since the linear transform is signal dependent and unknown, unlike the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. It is found that MP is efficient and effective at recovering CS-encoded speech as well as jointly estimating the linear model. With a moderate number of CS measurements and a low-order sparsity estimate, MP converges to the same linear transform as direct VQ of the LP vector derived from the original signal. There is also a high positive correlation between the signal domain approximation and the CS measurement domain approximation for a large variety of speech spectra.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a novel technique for robust voiced/unvoiced segment detection in noisy speech, based on local polynomial regression. The local polynomial model is well-suited for voiced segments in speech. The unvoiced segments are noise-like and do not exhibit any smooth structure. This property of smoothness is used for devising a new metric called the variance ratio metric, which, after thresholding, indicates the voiced/unvoiced boundaries with 75% accuracy for 0 dB global signal-to-noise ratio (SNR). A novelty of our algorithm is that it processes the signal continuously, sample-by-sample rather than frame-by-frame. Simulation results on the TIMIT speech database (downsampled to 8 kHz) for various SNRs are presented to illustrate the performance of the new algorithm. Results indicate that the algorithm is robust even at high noise levels.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We investigate the use of a two stage transform vector quantizer (TSTVQ) for coding of line spectral frequency (LSF) parameters in wideband speech coding. The first stage quantizer of TSTVQ provides better matching of the source distribution, and the second stage quantizer provides additional coding gain by using an individual cluster-specific decorrelating transform and variance normalization. Further coding gain is shown to be achieved by exploiting the slow time-varying nature of speech spectra, and thus using the inter-frame cluster continuity (ICC) property in the first stage of the TSTVQ method. The proposed method saves 3-4 bits and reduces the computational complexity by 58-66% compared to the traditional split vector quantizer (SVQ), but at the expense of 1.5-2.5 times the memory.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We are addressing the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for the application of noisy speech recognition. We extend the concept to HMM training wherein some patterns may be noisy or distorted. Utilizing the concept of a "virtual pattern" developed for joint evaluation, we propose selective iterative training of HMMs. When these algorithms are evaluated on burst/transient noisy speech and isolated word recognition, the new algorithms yield a significant improvement in recognition accuracy over those which do not utilize the joint evaluation strategy.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We are addressing a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single Hidden Markov Model. It is shown that such a solution is possible by aligning the K patterns using the proposed Multi Pattern Dynamic Time Warping algorithm followed by the Constrained Multi Pattern Viterbi Algorithm. The new formulation is tested in the context of speaker independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB Signal to Noise Ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast, a simple maximization of individual pattern likelihoods provides only about 7 percent reduction in error rate.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For a speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
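
For reference, the standard form of this kind of estimator can be written out under assumed notation; the symbols below do not come from the abstract. Take the degradation model y = Hx + n, with a GMM prior on the clean signal x (weights w_k, means \mu_k, covariances C_k) and independent zero-mean Gaussian noise n with covariance C_n.

```latex
% Assumed notation: y = Hx + n,  x ~ \sum_k w_k N(\mu_k, C_k),  n ~ N(0, C_n).
\begin{align}
  p(k \mid y) &= \frac{w_k \,\mathcal{N}\!\left(y;\ H\mu_k,\ H C_k H^{T} + C_n\right)}
                      {\sum_{j} w_j \,\mathcal{N}\!\left(y;\ H\mu_j,\ H C_j H^{T} + C_n\right)}, \\
  \hat{x}_{\mathrm{MMSE}} &= \mathbb{E}[x \mid y]
     = \sum_{k} p(k \mid y)\left[\mu_k + C_k H^{T}\left(H C_k H^{T} + C_n\right)^{-1}\left(y - H\mu_k\right)\right].
\end{align}
```

The posterior weights p(k | y) depend on y, which is what makes the estimator non-linear. With a single mixture component the weight equals 1 and the expression reduces to the linear MMSE estimator.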

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a simple speech/music discriminator that uses features based on the HILN (Harmonics, Individual Lines and Noise) model. We have tested the strength of the feature set on a standard database of 66 files and obtained an accuracy of around 97%. We have also tested on sung queries and polyphonic music and obtained very good results. The current algorithm is being used to discriminate between sung queries and played queries (using an instrument like the flute) for a Query by Humming (QBH) system currently under development in the lab.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Non-uniform sampling of a signal is formulated as an optimization problem which minimizes the signal reconstruction error. Dynamic programming (DP) has been used to solve this problem efficiently for a finite duration signal. Further, the optimum samples are quantized to realize a speech coder. The quantizer and the DP based optimum search for non-uniform samples (DP-NUS) can be combined in a closed-loop manner, which provides a distinct advantage over the open-loop formulation. The DP-NUS formulation provides useful control over the trade-off between bitrate and performance (reconstruction error). It is shown that a 5-10 dB SNR improvement is possible using DP-NUS compared to the extrema sampling approach. In addition, the closed-loop DP-NUS gives a 4-5 dB improvement in reconstruction error.
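
The idea of choosing samples by dynamic programming can be sketched as below. This is only an illustration under stated assumptions, not the DP-NUS coder: it assumes the signal is reconstructed by linear interpolation between kept samples, always keeps the first and last samples, and ignores quantization. The abstract does not specify the reconstruction method, and the function names are hypothetical.

```python
def segment_error(x, i, j):
    """Squared error of linearly interpolating x between kept samples i and j."""
    err = 0.0
    for t in range(i + 1, j):
        interp = x[i] + (x[j] - x[i]) * (t - i) / (j - i)
        err += (x[t] - interp) ** 2
    return err


def optimal_samples(x, k):
    """Pick k sample indices (endpoints included) minimizing interpolation error."""
    n = len(x)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(x)")
    inf = float("inf")
    # best[c][j]: minimum error over x[0..j] using c kept samples, the last at j.
    best = [[inf] * n for _ in range(k + 1)]
    prev = [[-1] * n for _ in range(k + 1)]
    best[1][0] = 0.0
    for c in range(2, k + 1):
        for j in range(1, n):
            for i in range(j):
                if best[c - 1][i] == inf:
                    continue
                cand = best[c - 1][i] + segment_error(x, i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    prev[c][j] = i
    # Backtrack from the last sample to recover the chosen indices.
    indices, j = [], n - 1
    for c in range(k, 0, -1):
        indices.append(j)
        j = prev[c][j]
    return indices[::-1], best[k][n - 1]


# A triangle is reconstructed exactly from its two ends and its peak.
print(optimal_samples([0, 1, 2, 1, 0], 3))
```

Varying k shows the trade-off between the number of retained samples and the reconstruction error. The segment errors are recomputed inside the loops for clarity; precomputing them would make the search faster.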

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper describes a method for automated segmentation of speech that assumes the signal is continuously time varying, rather than using the traditional short-time stationary model. It is shown that this representation gives results comparable to, if not marginally better than, other techniques for automated segmentation. A formulation of the 'Bach' (music semitonal) frequency scale filter-bank is proposed. A comparative study has been made of the performance of Mel, Bark and Bach scale filter banks under this model. The preliminary results show up to 80% matches within 20 ms of the manually segmented data, without any information about the content of the text and without any language dependence. 'Bach' filters are seen to marginally outperform the other filters.