244 results for Speech Processing


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Processing maps have been developed for hot deformation of Mg-2Zn-1Mn alloy in the as-cast condition and after homogenization, with a view to evaluating the influence of homogenization. Hot compression data in the temperature range 300-500 °C and strain rate range 0.001-100 s⁻¹ were used for generating the processing map. In the map for the as-cast alloy, the domain of dynamic recrystallization occurring at 450 °C and 0.1 s⁻¹ has merged with another domain occurring at 500 °C and 0.001 s⁻¹, which represents grain boundary cracking. The latter domain is eliminated by homogenization, and the dynamic recrystallization domain expands, with a higher peak efficiency occurring at 500 °C and 0.05 s⁻¹. The flow localization occurring at strain rates higher than 5 s⁻¹ is unaffected by homogenization.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Processing maps are being developed for use in optimising hot workability and controlling the microstructure of the product. The present investigation assesses the predictions of the processing maps for a 15Cr-15Ni-2.2Mo-0.3Ti austenitic stainless steel using forging and rolling tests at different temperatures in the range of 600-1200 °C. The tensile properties of the deformed products were evaluated at room temperature. The influence of the processing conditions, i.e. strain rate and temperature, on the tensile properties of the deformed product was analysed to identify the optimum processing parameters. The results show good agreement between the regimes exhibited by the map and the properties of the rolled or forged product. The optimum processing route for this steel was identified as rolling or press forging at temperatures above 1050 °C. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The paper describes a modular, unit-selection-based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as for studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. The synthesis database consists of 1027 phonetically rich prerecorded sentences. The framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications such as mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resulting synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that obtained with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, and we propose to exploit this misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with the phones they are confused with.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Traditional subspace based speech enhancement (SSE) methods use linear minimum mean square error (LMMSE) estimation, which is optimal if the Karhunen-Loève transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of a Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using a Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear, and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods.

Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We formulate a two-stage iterative Wiener filtering (IWF) approach to speech enhancement that improves on the performance of constrained IWF reported in the literature. Codebook constrained IWF (CCIWF) has been shown to be effective in achieving convergence of IWF in the presence of both stationary and non-stationary noise. To this we add a second stage of unconstrained IWF and show that the speech enhancement performance can be improved in terms of average segmental SNR (SSNR), Itakura-Saito (IS) distance and linear prediction coefficient (LPC) parameter coincidence. We also explore the tradeoff between the number of CCIWF iterations and the number of second-stage IWF iterations.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Effective feature extraction for robust speech recognition is a widely addressed topic, and there is currently much effort to invoke non-stationary signal models instead of the quasi-stationary signal models that lead to standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling, and new feature sets for automatic speech recognition (ASR) have recently been derived from a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performance for robust speech recognition in noise, using the AURORA-2 database. We show that the proposed FEPSTRUM representation is more effective than the others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM.
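
The abstract above names the Teager energy operator without defining it. As a minimal sketch, the standard discrete-time form is psi[n] = x[n]^2 - x[n-1]*x[n+1]. The function name `teager_energy` and the test tone are illustrative assumptions, and the sketch does not reproduce the FEPSTRUM features or the authors' modification.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for the interior samples only, so the output is two
    samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Illustrative input (assumed): a pure tone A * cos(omega * n).
n = np.arange(200)
amplitude, omega = 0.5, 0.2
tone = amplitude * np.cos(omega * n)
psi = teager_energy(tone)
```

For a pure tone the operator output is constant and equals A^2 * sin(omega)^2, which is why it is often used to track amplitude and frequency jointly.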

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Segmental dynamic time warping (DTW) has been demonstrated to be a useful technique for finding acoustic similarity scores between segments of two speech utterances. Because of its high computational requirements, it has had to be computed offline, which limits the applications of the technique. In this paper, we present results on parallelizing this task by distributing the workload either statically or dynamically on an 8-processor cluster, and we discuss the trade-offs among the different distribution schemes. We show that online unsupervised pattern discovery using segmental DTW is plausible with as few as 8 processors. This brings the task within reach of today's general-purpose multi-core servers. We also show results on a 32-processor system and discuss factors affecting the scalability of our methods.
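
For orientation, the dynamic-programming recursion that DTW is built on can be sketched as below. This is plain, unconstrained DTW between two whole sequences, not the segmental variant of the abstract (which, as usually described, adds band constraints and multiple start points) and not the parallel scheme. The function name `dtw_distance` and the Euclidean local cost are assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between a and b.

    a, b: sequences of feature vectors (or scalars), possibly of different
    lengths. The local cost is the Euclidean distance (an assumed choice).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    # Pairwise local costs between every frame of a and every frame of b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed predecessors: vertical, horizontal, and diagonal steps.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]
```

The double loop costs O(n*m) per pair of sequences, which gives a sense of why comparing many segment pairs is expensive and why distributing the pairs across processors is attractive.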

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We develop a Gaussian mixture model (GMM) based vector quantization (VQ) method for coding wideband speech line spectrum frequency (LSF) parameters at low complexity. The PDF of the LSF source vector is modeled using a Gaussian mixture (GM) density with a higher number of uncorrelated Gaussian mixtures, and an optimum scalar quantizer (SQ) is designed for each Gaussian mixture. The quantization complexity is reduced by using only the relevant subset of the available optimum SQs. For an input vector, the subset of quantizers is chosen using a nearest-neighbor criterion. The developed method is compared with recent VQ methods and is shown to provide high-quality rate-distortion (R/D) performance at lower complexity. In addition, the developed method provides the advantages of bitrate scalability and rate-independent complexity.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Considering a general linear model of signal degradation, we derive the minimum mean square error (MMSE) estimator by modeling the probability density function (PDF) of the clean signal with a Gaussian mixture model (GMM) and the additive noise with a Gaussian PDF. The derived MMSE estimator is non-linear, and the linear MMSE estimator is shown to be a special case. For a speech signal corrupted by independent additive noise, we model the joint PDF of the time-domain speech samples of a speech frame using a GMM and propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
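
A minimal sketch of such an estimator follows, under the simplifying assumption y = x + n (the abstract considers a more general linear model), with x drawn from a GMM and n zero-mean Gaussian. The estimate is a posterior-weighted sum of per-component linear estimates, which is what makes it non-linear in y. The function name `gmm_mmse_estimate` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np

def gmm_mmse_estimate(y, weights, means, covs, noise_cov):
    """MMSE estimate of x from y = x + n (assumed additive model).

    x ~ sum_k weights[k] * N(means[k], covs[k]),  n ~ N(0, noise_cov).
    """
    y = np.asarray(y, dtype=float)
    noise_cov = np.asarray(noise_cov, dtype=float)
    log_post, comp_estimates = [], []
    for w, mu, c in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        c = np.asarray(c, dtype=float)
        s = c + noise_cov                  # covariance of y under component k
        diff = y - mu
        sol = np.linalg.solve(s, diff)     # s^{-1} (y - mu)
        _, logdet = np.linalg.slogdet(s)
        # Log of w_k * N(y; mu_k, s), dropping the constant shared by all k.
        log_post.append(np.log(w) - 0.5 * (logdet + diff @ sol))
        # Per-component linear MMSE estimate.
        comp_estimates.append(mu + c @ sol)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())   # stabilised normalisation
    post /= post.sum()
    return post @ np.array(comp_estimates)
```

With a single mixture component the posterior weight is 1 and the expression collapses to the familiar linear MMSE estimate, matching the special case mentioned in the abstract.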

Relevance:

20.00% 20.00%

Publisher:

Abstract:

High-rate analysis of channel-optimized vector quantization

This paper considers the high-rate performance of channel optimized source coding for noisy discrete symmetric channels with random index assignment. Specifically, with mean squared error (MSE) as the performance metric, an upper bound on the asymptotic (i.e., high-rate) distortion is derived by assuming a general structure on the codebook. This structure enables extension of the analysis of the channel optimized source quantizer to one with a singular point density: for channels with small errors, the point density that minimizes the upper bound is continuous, while as the error rate increases, the point density becomes singular. The extent of the singularity is also characterized. The accuracy of the expressions obtained is verified through Monte Carlo simulations.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We address the problem of estimating the fundamental frequency of voiced speech. We present a novel solution motivated by the importance of amplitude modulation in sound processing and speech perception. The new algorithm is based on a cumulative spectrum computed from the temporal envelope of various subbands. We provide theoretical analysis to derive the new pitch estimator based on the temporal envelope of the bandpass speech signal. We report extensive experimental results for synthetic as well as natural vowels, for both real-world noisy and noise-free data. Experimental results show that the new technique performs accurate pitch estimation and is robust to noise. We also show that the technique is superior to the autocorrelation technique for pitch estimation.
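
For reference, the autocorrelation baseline mentioned at the end of the abstract can be sketched as follows. This is a textbook version, not the envelope-based estimator the abstract proposes. The function name `autocorr_pitch` and the 60-400 Hz search range are assumptions.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the largest autocorrelation inside the lag range
    that corresponds to [fmin, fmax] (assumed search range).
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / best_lag

# Illustrative input (assumed): 40 ms of a 200 Hz tone with one harmonic.
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
f0 = autocorr_pitch(frame, fs)
```

The resolution of this baseline is limited to integer lags, and the peak picking is known to be fragile in noise, which is the kind of weakness a more robust estimator aims to address.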

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The properties of widely used Ni-Ti-based shape memory alloys (SMAs) are highly sensitive to the underlying microstructure. Hence, controlling the evolution of microstructure during high-temperature deformation becomes important. In this article, the "processing maps" approach is used to identify the combination of temperature and strain rate for thermomechanical processing of a Ni₄₂Ti₅₀Cu₈ SMA. Uniaxial compression experiments were conducted in the temperature range of 800-1050 °C and at strain rates between 10⁻³ and 10² s⁻¹. Two-dimensional power dissipation efficiency and instability maps were generated, and the deformation mechanisms that operate in different temperature and strain rate regimes were identified with the aid of the maps and complementary microstructural analysis of the deformed specimens. Results show that the safe window for industrial processing of this alloy is in the range of 800-850 °C at 0.1 s⁻¹, which leads to grain refinement and strain-free grains. Regions of instability were also identified; these result in a strained microstructure, which in turn can affect the performance of the SMA.
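
The abstract does not state how the efficiency and instability maps are computed. Assuming they follow the dynamic materials model that is commonly used for processing maps, the two plotted quantities would be the efficiency of power dissipation and a continuum instability parameter, both obtained from the strain-rate sensitivity m of the flow stress:

```latex
% Strain-rate sensitivity at fixed temperature T and strain \varepsilon
m = \left. \frac{\partial \ln \sigma}{\partial \ln \dot{\varepsilon}} \right|_{T,\,\varepsilon}

% Efficiency of power dissipation (contoured in the power dissipation map)
\eta = \frac{2m}{m + 1}

% Flow instability is predicted where
\xi(\dot{\varepsilon}) = \frac{\partial \ln\!\left( \frac{m}{m+1} \right)}{\partial \ln \dot{\varepsilon}} + m < 0
```

Here \sigma is the flow stress and \dot{\varepsilon} the strain rate. Under this assumption, a "safe window" corresponds to a region of high \eta that lies outside the region where \xi is negative.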