244 resultados para Speech Processing


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This correspondence describes a method for automated segmentation of speech. The method proposed in this paper uses a specially designed filter-bank called Bach filter-bank which makes use of 'music' related perception criteria. The speech signal is treated as continuously time varying signal as against a short time stationary model. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks. The preliminary results show up to 80 % matches within 20 ms of the manually segmented data, without any information of the content of the text and without any language dependence. The Bach filters are seen to marginally outperform the other filters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Joint decoding of multiple speech patterns so as to improve speech recognition performance is important, especially in the presence of noise. In this paper, we propose a Multi-Pattern Viterbi algorithm (MPVA) to jointly decode and recognize multiple speech patterns for automatic speech recognition (ASR). The MPVA is a generalization of the Viterbi Algorithm to jointly decode multiple patterns given a Hidden Markov Model (HMM). Unlike the previously proposed two stage Constrained Multi-Pattern Viterbi Algorithm (CMPVA),the MPVA is a single stage algorithm. MPVA has the advantage that it cart be extended to connected word recognition (CWR) and continuous speech recognition (CSR) problems. MPVA is shown to provide better speech recognition performance than the earlier techniques: using only two repetitions of noisy speech patterns (-5 dB SNR, 10% burst noise), the word error rate using MPVA decreased by 28.5%, when compared to using individual decoding. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A new method based on unit continuity metric (UCM) is proposed for optimal unit selection in text-to-speech (TTS) synthesis. UCM employs two features, namely, pitch continuity metric and spectral continuity metric. The methods have been implemented and tested on our test bed called MILE-TTS and it is available as web demo. After verification by a self selection test, the algorithms are evaluated on 8 paragraphs each for Kannada and Tamil by native users of the languages. Mean-opinion-score (MOS) shows that naturalness and comprehension are better with UCM based algorithm than the non-UCM based ones. The naturalness of the TTS output is further enhanced by a new rule based algorithm for pause prediction for Tamil language. The pauses between the words are predicted based on parts-of-speech information obtained from the input text.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we present a new speech enhancement approach, that is based on exploiting the intra-frame dependency of discrete cosine transform (DCT) domain coefficients. It can be noted that the existing enhancement techniques treat the transformdomain coefficients independently. Instead of this traditional approach of independently processing the scalars, we split the DCT domain noisy speech vector into sub-vectors and each sub-vector is enhanced independently. Through this sub-vector based approach, the higher dimensional enhancement advantage, viz. non-linear dependency, is exploited. In the developed method, each clean speech sub-vector is modeled using a Gaussian mixture (GM) density. We show that the proposed Gaussian mixture model (GMM) based DCT domain method, using sub-vector processing approach, provides better performance than the conventional approach of enhancing the transform domain scalar components independently. Performance improvement over the recently proposed GMM based time domain approach is also shown.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We introduce a novel temporal feature of a signal, namely extrema-based signal track length (ESTL) for the problem of speech segmentation. We show that ESTL measure is sensitive to both amplitude and frequency of the signal. The short-time ESTL (ST_ESTL) shows a promising way to capture the significant segments of speech signal, where the segments correspond to acoustic units of speech having distinct temporal waveforms. We compare ESTL based segmentation with ML and STM methods and find that it is as good as spectral feature based segmentation, but with lesser computational complexity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper considers the high-rate performance of source coding for noisy discrete symmetric channels with random index assignment (IA). Accurate analytical models are developed to characterize the expected distortion performance of vector quantization (VQ) for a large class of distortion measures. It is shown that when the point density is continuous, the distortion can be approximated as the sum of the source quantization distortion and the channel-error induced distortion. Expressions are also derived for the continuous point density that minimizes the expected distortion. Next, for the case of mean squared error distortion, a more accurate analytical model for the distortion is derived by allowing the point density to have a singular component. The extent of the singularity is also characterized. These results provide analytical models for the expected distortion performance of both conventional VQ as well as for channel-optimized VQ. As a practical example, compression of the linear predictive coding parameters in the wideband speech spectrum is considered, with the log spectral distortion as performance metric. The theory is able to correctly predict the channel error rate that is permissible for operation at a particular level of distortion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding window basis is useful in locating the impulse locations accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering technique (ZFF) and Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that performance of the SZCR technique is better than DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The goal of speech enhancement algorithms is to provide an estimate of clean speech starting from noisy observations. The often-employed cost function is the mean square error (MSE). However, the MSE can never be computed in practice. Therefore, it becomes necessary to find practical alternatives to the MSE. In image denoising problems, the cost function (also referred to as risk) is often replaced by an unbiased estimator. Motivated by this approach, we reformulate the problem of speech enhancement from the perspective of risk minimization. Some recent contributions in risk estimation have employed Stein's unbiased risk estimator (SURE) together with a parametric denoising function, which is a linear expansion of threshold/bases (LET). We show that the first-order case of SURE-LET results in a Wiener-filter type solution if the denoising function is made frequency-dependent. We also provide enhancement results obtained with both techniques and characterize the improvement by means of local as well as global SNR calculations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Narrowband spectrograms of voiced speech can be modeled as an outcome of two-dimensional (2-D) modulation process. In this paper, we develop a demodulation algorithm to estimate the 2-D amplitude modulation (AM) and carrier of a given spectrogram patch. The demodulation algorithm is based on the Riesz transform, which is a unitary, shift-invariant operator and is obtained as a 2-D extension of the well known 1-D Hilbert transform operator. Existing methods for spectrogram demodulation rely on extension of sinusoidal demodulation method from the communications literature and require precise estimate of the 2-D carrier. On the other hand, the proposed method based on Riesz transform does not require a carrier estimate. The proposed method and the sinusoidal demodulation scheme are tested on real speech data. Experimental results show that the demodulated AM and carrier from Riesz demodulation represent the spectrogram patch more accurately compared with those obtained using the sinusoidal demodulation. The signal-to-reconstruction error ratio was found to be about 2 to 6 dB higher in case of the proposed demodulation approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Binaural hearing studies show that the auditory system uses the phase-difference information in the auditory stimuli for localization of a sound source. Motivated by this finding, we present a method for demodulation of amplitude-modulated-frequency-modulated (AM-FM) signals using a ignal and its arbitrary phase-shifted version. The demodulation is achieved using two allpass filters, whose impulse responses are related through the fractional Hilbert transform (FrHT). The allpass filters are obtained by cosine-modulation of a zero-phase flat-top prototype halfband lowpass filter. The outputs of the filters are combined to construct an analytic signal (AS) from which the AM and FM are estimated. We show that, under certain assumptions on the signal and the filter structures, the AM and FM can be obtained exactly. The AM-FM calculations are based on the quasi-eigenfunction approximation. We then extend the concept to the demodulation of multicomponent signals using uniform and non-uniform cosine-modulated filterbank (FB) structures consisting of flat bandpass filters, including the uniform cosine-modulated, equivalent rectangular bandwidth (ERB), and constant-Q filterbanks. We validate the theoretical calculations by considering application on synthesized AM-FM signals and compare the performance in presence of noise with three other multiband demodulation techniques, namely, the Teager-energy-based approach, the Gabor's AS approach, and the linear transduction filter approach. We also show demodulation results for real signals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a two-dimensional (2-D) multicomponent amplitude-modulation, frequency-modulation (AM-FM) model for a spectrogram patch corresponding to voiced speech, and develop a new demodulation algorithm to effectively separate the AM, which is related to the vocal tract response, and the carrier, which is related to the excitation. The demodulation algorithm is based on the Riesz transform and is developed along the lines of Hilbert-transform-based demodulation for 1-D AM-FM signals. We compare the performance of the Riesz transform technique with that of the sinusoidal demodulation technique on real speech data. Experimental results show that the Riesz-transform-based demodulation technique represents spectrogram patches accurately. The spectrograms reconstructed from the demodulated AM and carrier are inverted and the corresponding speech signal is synthesized. The signal-to-noise ratio (SNR) of the reconstructed speech signal, with respect to clean speech, was found to be 2 to 4 dB higher in case of the Riesz transform technique than the sinusoidal demodulation technique.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We address the problem of separating a speech signal into its excitation and vocal-tract filter components, which falls within the framework of blind deconvolution. Typically, the excitation in case of voiced speech is assumed to be sparse and the vocal-tract filter stable. We develop an alternating l(p) - l(2) projections algorithm (ALPA) to perform deconvolution taking into account these constraints. The algorithm is iterative, and alternates between two solution spaces. The initialization is based on the standard linear prediction decomposition of a speech signal into an autoregressive filter and prediction residue. In every iteration, a sparse excitation is estimated by optimizing an l(p)-norm-based cost and the vocal-tract filter is derived as a solution to a standard least-squares minimization problem. We validate the algorithm on voiced segments of natural speech signals and show applications to epoch estimation. We also present comparisons with state-of-the-art techniques and show that ALPA gives a sparser impulse-like excitation, where the impulses directly denote the epochs or instants of significant excitation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The development of a microstructure in 304L stainless steel during industrial hot-forming operations, including press forging (mean strain rate of 0.15 s(-1)), rolling/extrusion (2-5 s(-1)), and hammer forging (100 s(-1)) at different temperatures in the range 600-1200 degrees C, was studied with a view to validating the predictions of the processing map. The results have shown that excellent correlation exists between the regimes exhibited by the map and the product microstructures. 304L stainless steel exhibits instability bands when hammer forged at temperatures below 1100 degrees C, rolled/extruded below 1000 degrees C, or press forged below 800 degrees C. All of these conditions must be avoided in mechanical processing of the material. On the other hand, ideally, the material may be rolled, extruded, or press forged at 1200 degrees C to obtain a defect-free microstructure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The hot deformation behavior of hot isostatically pressed (HIPd) P/M IN-100 superalloy has been studied in the temperature range 1000-1200 degrees C and strain rate range 0.0003-10 s(-1) using hot compression testing. A processing map has been developed on the basis of these data and using the principles of dynamic materials modelling. The map exhibited three domains: one at 1050 degrees C and 0.01 s(-1), with a peak efficiency of power dissipation of approximate to 32%, the second at 1150 degrees C and 10 s(-1), with a peak efficiency of approximate to 36% and the third at 1200 degrees C and 0.1 s(-1), with a similar efficiency. On the basis of optical and electron microscopic observations, the first domain was interpreted to represent dynamic recovery of the gamma phase, the second domain represents dynamic recrystallization (DRX) of gamma in the presence of softer gamma', while the third domain represents DRX of the gamma phase only. The gamma' phase is stable upto 1150 degrees C, gets deformed below this temperature and the chunky gamma' accumulates dislocations, which at larger strains cause cracking of this phase. At temperatures lower than 1080 degrees C and strain rates higher than 0.1 s(-1), the material exhibits flow instability, manifested in the form of adiabatic shear bands. The material may be subjected to mechanical processing without cracking or instabilities at 1200 degrees C and 0.1 s(-1), which are the conditions for DRX of the gamma phase.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An overview of the synthesis of materials under microwave irradiation has been presented based on the work performed recently. A variety of reactions such as direct combination, carbothermal reduction, carbidation and nitridation have been described. Examples of microwave preparation of glasses are also presented. Great advantages of fast, clean and reduced reaction temperature of microwave methods are emphasized. The example of ZrO2-CeO2 ceramics has been used show the extraordinarily fast and effective sintering which occurs in microwave irradiation.