90 resultados para Speech articulation tests

em Indian Institute of Science - Bangalore - Índia


Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. (C) 2014 Acoustical Society of America

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose apractical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

By using small scale model tests, the interference effect on the ultimate bearing capacity of two closely spaced strip footings, placed on the surface of dry sand, was investigated. At any time, the footings were assumed to (1) carry exactly the same magnitude of load; and (2) settle to the same extent. No tilt of the footing was allowed. The effect of clear spacing (s) between two footings was explicitly studied. An interference of footings leads to a significant increase in their bearing capacity; the interference effect becomes even more substantial with an increase in the relative density of sand. The bearing capacity attains a peak magnitude at a certain (critical) spacing between two footings. The experimental observations presented in this technical note were similar to those given by different available theories. However, in a quantitative sense, the difference between the experiments and theories was seen to be still significant and it emphasizes the need of doing a further rigorous analysis in which the effect of stress level on the shear strength parameters of soil mass can be incorporated properly.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We are addressing the problem of jointly using multiple noisy speech patterns for automatic speech recognition (ASR), given that they come from the same class. If the user utters a word K times, the ASR system should try to use the information content in all the K patterns of the word simultaneously and improve its speech recognition accuracy compared to that of the single pattern based speech recognition. T address this problem, recently we proposed a Multi Pattern Dynamic Time Warping (MPDTW) algorithm to align the K patterns by finding the least distortion path between them. A Constrained Multi Pattern Viterbi algorithm was used on this aligned path for isolated word recognition (IWR). In this paper, we explore the possibility of using only the MPDTW algorithm for IWR. We also study the properties of the MPDTW algorithm. We show that using only 2 noisy test patterns (10 percent burst noise at -5 dB SNR) reduces the noisy speech recognition error rate by 37.66 percent when compared to the single pattern recognition using the Dynamic Time Warping algorithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

analysis of a complex physical problem and the close agreement they achieved with observations. However, the following points need to be clarified. First of all the authors assume that during the initial phases of expansion, the Tayior's instability sets in due to the acceleraacceleration of lighter fluid against the more dense cold water.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Various field test (namely vibration tests on blocks or plates, steady-state vibration or Rayleigh wave tests, wave propagation tests, and cyclic load tests) were conducted at a number of sites in India to determine the dynamic shear modulus, G. Data obtained at different sites are described. The values of G obtained from the different tests at a given site vary widely. The rational approach for selecting the value of G from field tests for use in the analysis and design of soil-structure interaction problems under dynamic loads must account for the factors affecting G. The suggested approach, which provides a possible answer, is suitable in cohesionless soils below the water table where it is rather difficult, if not impossible, to obtain undisturbed samples.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal dependent unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressive sensed speech. Since the linear transform is signal dependent and unknown, unlike the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. It is found that MP is efficient and effective to recover CS encoded speech as well as jointly estimate the linear model. Moderate number of CS measurements and low order sparsity estimate will result in MP converge to the same linear transform as direct VQ of the LP vector derived from the original signal. There is also high positive correlation between signal domain approximation and CS measurement domain approximation for a large variety of speech spectra.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a novel technique for robust voiced/unvoiced segment detection in noisy speech, based on local polynomial regression. The local polynomial model is well-suited for voiced segments in speech. The unvoiced segments are noise-like and do not exhibit any smooth structure. This property of smoothness is used for devising a new metric called the variance ratio metric, which, after thresholding, indicates the voiced/unvoiced boundaries with 75% accuracy for 0dB global signal-to-noise ratio (SNR). A novelty of our algorithm is that it processes the signal continuously, sample-by-sample rather than frame-by-frame. Simulation results on TIMIT speech database (downsampled to 8kHz) for various SNRs are presented to illustrate the performance of the new algorithm. Results indicate that the algorithm is robust even in high noise levels.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We investigate the use of a two stage transform vector quantizer (TSTVQ) for coding of line spectral frequency (LSF) parameters in wideband speech coding. The first stage quantizer of TSTVQ, provides better matching of source distribution and the second stage quantizer provides additional coding gain through using an individual cluster specific decorrelating transform and variance normalization. Further coding gain is shown to be achieved by exploiting the slow time-varying nature of speech spectra and thus using inter-frame cluster continuity (ICC) property in the first stage of TSTVQ method. The proposed method saves 3-4 bits and reduces the computational complexity by 58-66%, compared to the traditional split vector quantizer (SVQ), but at the expense of 1.5-2.5 times of memory.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We are addressing the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm, and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for the application of noisy speech recognition. We extend the concept to HMM training wherein some patterns may be noisy or distorted. Utilizing the concept of ``virtual pattern'' developed for joint evaluation, we propose selective iterative training of HMMs. Evaluating these algorithms for burst/transient noisy speech and isolated word recognition, significant improvement in recognition accuracy is obtained using the new algorithms over those which do not utilize the joint evaluation strategy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We are addressing a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single Hidden Markov Model. It is shown that such a solution is possible by aligning the K patterns using the proposed Multi Pattern Dynamic Time Warping algorithm followed by the Constrained Multi Pattern Viterbi Algorithm The new formulation is tested in the context of speaker independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB Signal to Noise Ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast a simple maximization of individual pattern likelihoods, provides only about 7 percent reduction in error rate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Considering a general linear model of signal degradation, by modeling the probability density function (PDF) of the clean signal using a Gaussian mixture model (GMM) and additive noise by a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear and the linear MMSE estimator is shown to be a special case. For speech signal corrupted by independent additive noise, by modeling the joint PDF of time-domain speech samples of a speech frame using a GMM, we propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider the problem of detecting statistically significant sequential patterns in multineuronal spike trains. These patterns are characterized by ordered sequences of spikes from different neurons with specific delays between spikes. We have previously proposed a data-mining scheme to efficiently discover such patterns, which occur often enough in the data. Here we propose a method to determine the statistical significance of such repeating patterns. The novelty of our approach is that we use a compound null hypothesis that not only includes models of independent neurons but also models where neurons have weak dependencies. The strength of interaction among the neurons is represented in terms of certain pair-wise conditional probabilities. We specify our null hypothesis by putting an upper bound on all such conditional probabilities. We construct a probabilistic model that captures the counting process and use this to derive a test of significance for rejecting such a compound null hypothesis. The structure of our null hypothesis also allows us to rank-order different significant patterns. We illustrate the effectiveness of our approach using spike trains generated with a simulator.