105 results for Speech in Noise
at Indian Institute of Science - Bangalore - India
Abstract:
It has been shown that the conventional practice of designing a compensated hot wire amplifier with a fixed ceiling to floor ratio results in considerable and unnecessary increase in noise level at compensation settings other than optimum (which is at the maximum compensation at the highest frequency of interest). The optimum ceiling to floor ratio has been estimated to be between 1.5-2.0 ωmaxM. Application of the above considerations to an amplifier in which the ceiling to floor ratio is optimized at each compensation setting (for a given amplifier band-width), shows the usefulness of the method in improving the signal to noise ratio.
Abstract:
Traditional subspace-based speech enhancement (SSE) methods use linear minimum mean square error (LMMSE) estimation, which is optimal if the Karhunen-Loève transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of a Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using a Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear, and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace-based methods. Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.
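As a rough illustration of why a Gaussian mixture prior makes the MMSE estimator nonlinear, the following sketch treats a single zero-mean scalar coefficient observed in additive Gaussian noise. The scalar setting, the function names, and the example parameters are assumptions made for illustration only; the paper's estimator operates on KLT coefficient vectors and may differ in detail.

```python
import math

def gauss_pdf(y, var):
    """Density of a zero-mean Gaussian with variance `var`, evaluated at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)

def gmm_mmse(y, weights, variances, noise_var):
    """E[x | y] for y = x + n, x ~ sum_k w_k N(0, v_k), n ~ N(0, noise_var)."""
    # Unnormalised posterior responsibility of each mixture component given y.
    like = [w * gauss_pdf(y, v + noise_var) for w, v in zip(weights, variances)]
    total = sum(like)
    # Each component contributes its own Wiener gain v / (v + noise_var),
    # weighted by its responsibility; the weights depend on y, hence nonlinearity.
    return sum((l / total) * (v / (v + noise_var)) * y
               for l, v in zip(like, variances))

# Hypothetical two-component prior: one narrow and one broad component.
w, v = [0.5, 0.5], [0.1, 10.0]
small = gmm_mmse(0.5, w, v, 1.0)   # small observations are shrunk strongly
large = gmm_mmse(5.0, w, v, 1.0)   # large observations are shrunk much less
```

With a single mixture component the responsibilities are constant and the function reduces to the linear Wiener (LMMSE) gain, which mirrors the special case mentioned in the abstract.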
Abstract:
Effective feature extraction for robust speech recognition is a widely addressed topic, and currently there is much effort to invoke non-stationary signal models instead of the quasi-stationary signal models that lead to standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling, and recently new feature sets for automatic speech recognition (ASR) have been derived based on a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performances for robust speech recognition in noise, using the AURORA-2 database. We show that the proposed FEPSTRUM representation is more effective than the others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM.
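For readers unfamiliar with the Teager energy operator, a minimal sketch of its standard discrete form is given here. It shows only the operator itself, not the FEPSTRUM feature pipeline; the function name and test tone are illustrative assumptions.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(omega*n + phi) the operator is constant and equals
# A^2 * sin^2(omega), so it tracks both amplitude and frequency.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n + 0.5) for n in range(50)]
psi = teager_energy(tone)
```

Because the output depends on only three adjacent samples, the operator responds quickly to changes in amplitude or frequency, which is one reason it is commonly paired with AM-FM signal models.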
Abstract:
In this paper we propose a postprocessing technique for a spectrogram diffusion based harmonic/percussion decomposition algorithm. The proposed technique removes harmonic instrument leakages in the percussion enhanced outputs of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands followed by piecewise signal reconstruction using envelope properties to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which upon applying on the original signal improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.
Abstract:
In this paper, we propose a new sub-band approach to estimate the glottal activity. The method is based on the spectral harmonicity and the sub-band temporal properties of voiced speech. We propose a method to represent the glottal excitation signal using the sub-band temporal envelope. Instants of maximum glottal excitation, or Glottal Closure Instants (GCI), are extracted from the estimated glottal excitation pattern and the result is compared with a standard GCI computation method, DYPSA [1]. The performance of the algorithm is also compared on noisy signals, and it is shown that the proposed method exhibits less variation in GCI estimation under noisy conditions than DYPSA. The algorithm is evaluated on the CMU-ARCTIC database.
Abstract:
We present a study of low-frequency noise, or 1/f noise, in degenerately doped Si:P and Ge:P δ-layers at low temperatures. For the Si:P δ-layers we find that the noise is several orders of magnitude lower than that of bulk Si:P systems in the metallic regime and is one of the lowest values reported for doped semiconductors. The noise in Ge:P δ-layers, measured as a function of perpendicular magnetic field, shows a factor-of-two reduction in magnitude at the scale of B_φ, where B_φ is the phase-breaking field. We show that this is a characteristic feature of universal conductance fluctuations.
Abstract:
The authors report a detailed investigation of the flicker noise (1/f noise) in graphene films obtained from chemical vapour deposition (CVD) and chemical reduction of graphene oxide. The authors find that in the case of polycrystalline graphene films grown by CVD, the grain boundaries and other structural defects are the dominant source of noise, acting as charged trap centres and resulting in a huge increase in noise compared with that of exfoliated graphene. A study of the kinetics of defects in hydrazine-reduced graphene oxide (RGO) films as a function of the extent of reduction showed that for longer hydrazine treatment times strong localised crystal defects are introduced in RGO, whereas RGO with shorter hydrazine treatment showed the presence of a large number of mobile defects, leading to higher noise amplitude.
Abstract:
Speech enhancement in stationary noise is addressed using the ideal channel selection framework. In order to estimate the binary mask, we propose to classify each time-frequency (T-F) bin of the noisy signal as speech or noise using Discriminative Random Fields (DRF). The DRF function contains two terms: an enhancement function and a smoothing term. On each T-F bin, we propose to use an enhancement function based on a likelihood ratio test for speech presence, while an Ising model is used as the smoothing function for spectro-temporal continuity in the estimated binary mask. The effect of the smoothing function over successive iterations is found to reduce musical noise, as opposed to using only the enhancement function. The binary mask is inferred from the noisy signal using the Iterated Conditional Modes (ICM) algorithm. Sentences from the NOIZEUS corpus are evaluated from 0 dB to 15 dB Signal to Noise Ratio (SNR) in 4 kinds of additive noise settings: additive white Gaussian noise, car noise, street noise and pink noise. The reconstructed speech using the proposed technique is evaluated in terms of average segmental SNR, Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS).
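The interplay of a per-bin likelihood term and an Ising-style smoothing term under ICM can be sketched as follows. This is a simplified stand-in, not the paper's DRF formulation: the log-likelihood ratios are assumed to be given, the neighbourhood is the four nearest T-F bins, and `beta` and `sweeps` are hypothetical parameters.

```python
def icm_binary_mask(llr, beta=1.0, sweeps=5):
    """Iterated Conditional Modes on a 2-D grid of log-likelihood ratios.

    llr[t][f] > 0 favours "speech" in that T-F bin; beta weights the
    Ising-style agreement with the four nearest neighbours.
    """
    rows, cols = len(llr), len(llr[0])
    # Initialise from the likelihood-ratio term alone.
    mask = [[1 if llr[t][f] > 0 else 0 for f in range(cols)] for t in range(rows)]
    for _ in range(sweeps):
        changed = False
        for t in range(rows):
            for f in range(cols):
                # Each existing neighbour votes +1 (speech) or -1 (noise).
                vote = 0
                for dt, df in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    tt, ff = t + dt, f + df
                    if 0 <= tt < rows and 0 <= ff < cols:
                        vote += 2 * mask[tt][ff] - 1
                # Pick the label that maximises the local conditional score.
                new = 1 if llr[t][f] + beta * vote > 0 else 0
                if new != mask[t][f]:
                    mask[t][f] = new
                    changed = True
        if not changed:
            break  # a full sweep with no flips means ICM has converged
    return mask

# A lone weakly "speech" bin surrounded by noise bins.
grid = [[-1.0, -1.0, -1.0],
        [-1.0,  0.5, -1.0],
        [-1.0, -1.0, -1.0]]
smoothed = icm_binary_mask(grid, beta=1.0)
unsmoothed = icm_binary_mask(grid, beta=0.0)
```

In this toy grid the isolated bin survives when `beta` is zero and is removed once neighbour agreement is rewarded, which is the kind of isolated spectral speck associated with musical noise.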
Abstract:
We present low-frequency electrical resistance fluctuations, or noise, in graphene-based field-effect devices with varying number of layers. In single-layer devices, the noise magnitude decreases with increasing carrier density, whereas devices with two or more layers show the opposite behavior, accompanied by a suppression of the noise magnitude by more than two orders of magnitude. This behavior can be explained from the influence of external electric field on graphene band structure, and provides a simple transport-based route to isolate single-layer graphene devices from those with multiple layers. ©2009 American Institute of Physics
Abstract:
One of the most important applications of adaptive systems is in noise cancellation using adaptive filters. In this paper, we propose adaptive noise cancellation schemes for the enhancement of EEG signals in the presence of EOG artifacts. The effect of two reference inputs is studied on simulated as well as recorded EEG signals, and it is found that one reference input is enough to sufficiently minimize the EOG artifacts. This has also been verified through correlation analysis. We use signal to noise ratio and linear prediction spectra, along with time plots, to compare the performance of the proposed schemes for minimizing EOG artifacts in contaminated EEG signals. Results show that the proposed schemes are very effective (especially the one which employs Newton's method) in minimizing the EOG artifacts in contaminated EEG signals.
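A single-reference adaptive noise canceller can be sketched with the plain LMS update. This is a swapped-in textbook algorithm, not one of the paper's schemes (which include a Newton-type method); the synthetic "EEG" sine, the white-noise "EOG" reference, the mixing coefficients, and the step size are all assumptions for illustration.

```python
import math
import random

def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Return e[n] = primary[n] - w . [reference[n], ..., reference[n-taps+1]]."""
    w = [0.0] * taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent reference samples, newest first (zeros before the start).
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = primary[n] - estimate
        # LMS update along the instantaneous gradient (factor 2 absorbed in mu).
        w = [wk + mu * e * xk for wk, xk in zip(w, x)]
        cleaned.append(e)
    return cleaned

def mse(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

rng = random.Random(0)
N = 4000
eeg = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(N)]   # stand-in signal
eog = [rng.gauss(0.0, 1.0) for _ in range(N)]                  # reference input
# Hypothetical artifact path: a short FIR mix of the reference.
artifact = [0.8 * eog[n] + 0.3 * (eog[n - 1] if n > 0 else 0.0) for n in range(N)]
primary = [s + a for s, a in zip(eeg, artifact)]
cleaned = lms_cancel(primary, eog)

before = mse(primary[N // 2:], eeg[N // 2:])   # artifact power before cancelling
after = mse(cleaned[N // 2:], eeg[N // 2:])    # residual after adaptation
```

Comparing `before` and `after` over the second half of the record gives a simple signal to noise style check once the filter has adapted.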
Abstract:
This paper describes a predictive model for breakout noise from an elliptical duct or shell of finite length. The transmission mechanism is essentially that of "mode coupling", whereby higher structural modes in the duct walls get excited because of non-circularity of the wall. The effect of geometry has been taken care of by evaluating Fourier coefficients of the radius of curvature. The noise radiated from the duct walls is represented by that from a finite vibrating length of a semi-infinite cylinder in a free field. Emphasis is on understanding the physics of the problem as well as analytical modeling. The analytical model is validated with 3-D FEM. Effects of the ovality, curvature, and axial terminations of the duct have been demonstrated. ©2010 Institute of Noise Control Engineering.
Abstract:
We address the problem of robust formant tracking in continuous speech in the presence of additive noise. We propose a new approach based on mixture modeling of the formant contours. Our approach consists of two main steps: (i) Computation of a pyknogram based on multiband amplitude-modulation/frequency-modulation (AM/FM) decomposition of the input speech; and (ii) Statistical modeling of the pyknogram using mixture models. We experiment with both Gaussian mixture model (GMM) and Student's-t mixture model (tMM) and show that the latter is robust with respect to handling outliers in the pyknogram data, parameter selection, accuracy, and smoothness of the estimated formant contours. Experimental results on simulated data as well as noisy speech data show that the proposed tMM-based approach is also robust to additive noise. We present performance comparisons with a recently developed adaptive filterbank technique proposed in the literature and the classical Burg's spectral estimator technique, which show that the proposed technique is more robust to noise.
Abstract:
The classical approach to A/D conversion has been uniform sampling, which gives perfect reconstruction for bandlimited signals when the Nyquist sampling theorem is satisfied. We propose a non-uniform sampling scheme based on level crossing (LC) time information. We show stable reconstruction of bandpass signals with correct scale factor, and hence a unique reconstruction, from only the non-uniform time information. For reconstruction from the level crossings we make use of sparse reconstruction based optimization, constraining the bandpass signal to be sparse in its frequency content. While an overdetermined system of equations is resorted to in the literature, we use an underdetermined approach along with a sparse reconstruction formulation. We obtain a reconstruction SNR > 20 dB and perfect support recovery with probability close to 1 in the noiseless case, and with lower probability in the noisy case. Random picking of LCs from different levels, over the same limited signal duration and for the same length of information, is seen to be advantageous for reconstruction.
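The level crossing time information that such a scheme records can be illustrated with a small sketch. It assumes a densely sampled stand-in for the analog signal and locates crossings by linear interpolation; the function name and test signal are hypothetical, and the sparse reconstruction step described in the abstract is not shown.

```python
import math

def level_crossing_times(samples, dt, level):
    """Times at which a densely sampled signal crosses `level`."""
    times = []
    for n in range(len(samples) - 1):
        a, b = samples[n] - level, samples[n + 1] - level
        if a == 0.0:
            times.append(n * dt)           # sample lies exactly on the level
        elif a * b < 0.0:
            # Sign change: interpolate linearly between the bracketing samples.
            times.append((n + a / (a - b)) * dt)
    return times

# One period of a 1 Hz sine on a fine grid; it crosses 0.5 at t = 1/12 and 5/12 s.
dt = 0.001
sine = [math.sin(2.0 * math.pi * n * dt) for n in range(1001)]
crossings = level_crossing_times(sine, dt, 0.5)
```

Only the crossing instants and the known level values would be kept by a level crossing sampler, so the amplitude information is implicit in which level was crossed.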
Abstract:
A joint analysis-synthesis framework is developed for the compressive sensing (CS) recovery of speech signals. The signal is assumed to be sparse in the residual domain, with the linear prediction filter used as the sparse transformation. Importantly, this transform is not known a priori, since estimating the predictor filter requires knowledge of the signal. Two prediction filters, a comb filter for pitch and an all-pole formant filter, are needed to induce maximum sparsity. An iterative method is proposed for the estimation of both the prediction filters and the signal itself. The formant prediction filter is used as the synthesis transform, while the pitch filter is used to model the periodicity in the residual excitation signal in the analysis mode. Significant improvement in the LLR measure is seen over the previously reported formant filter estimation.