92 resultados para Speech Acoustics


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problem of human detection is challenging, more so, when faced with adverse conditions such as occlusion and background clutter. This paper addresses the problem of human detection by representing an extracted feature of an image using a sparse linear combination of chosen dictionary atoms. The detection along with the scale finding, is done by using the coefficients obtained from sparse representation. This is of particular interest as we address the problem of scale using a scale-embedded dictionary where the conventional methods detect the object by running the detection window at all scales.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The classical approach to A/D conversion has been uniform sampling and we get perfect reconstruction for bandlimited signals by satisfying the Nyquist Sampling Theorem. We propose a non-uniform sampling scheme based on level crossing (LC) time information. We show stable reconstruction of bandpass signals with correct scale factor and hence a unique reconstruction from only the non-uniform time information. For reconstruction from the level crossings we make use of the sparse reconstruction based optimization by constraining the bandpass signal to be sparse in its frequency content. While overdetermined system of equations is resorted to in the literature we use an undetermined approach along with sparse reconstruction formulation. We could get a reconstruction SNR > 20dB and perfect support recovery with probability close to 1, in noise-less case and with lower probability in the noisy case. Random picking of LC from different levels over the same limited signal duration and for the same length of information, is seen to be advantageous for reconstruction.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on 1,138 work vocabulary RM1 task and 6,224 word vocabulary TIMIT task using Sphinx 3.7 system show that, for a typical case the matrix multiplication based approach leads to overall speedup of 46 % on RM1 task and 115 % for TIMIT task. Our low-rank approximation methods provide a way for trading off recognition accuracy for a further increase in computational performance extending overall speedups up to 61 % for RM1 and 119 % for TIMIT for an increase of word error rate (WER) from 3.2 to 3.5 % for RM1 and for no increase in WER for TIMIT. We also express pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and overall speedup up to 3.25 for DTW.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes an algorithm for joint data detection and tracking of the dominant singular mode of a time varying channel at the transmitter and receiver of a time division duplex multiple input multiple output beamforming system. The method proposed is a modified expectation maximization algorithm which utilizes an initial estimate to track the dominant modes of the channel at the transmitter and the receiver blindly; and simultaneously detects the un known data. Furthermore, the estimates are constrained to be within a confidence interval of the previous estimate in order to improve the tracking performance and mitigate the effect of error propagation. Monte-Carlo simulation results of the symbol error rate and the mean square inner product between the estimated and the true singular vector are plotted to show the performance benefits offered by the proposed method compared to existing techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs are non-coherently accumulated to combat the effects of model uncertainties such as the presence of navigation data-bits and/or frequency uncertainty, resulting in a sub-optimal test statistic. In this work, the test-statistic for weak signal detection is derived in the presence of navigation data-bits from the likelihood ratio. It is highlighted that averaging the likelihood ratio based test-statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve the performance in the presence of model uncertainties, a novel cyclostationarity based sub-optimal PDI technique is proposed. The test statistic is analytically characterized, and shown to be robust to the presence of navigation data-bits, frequency, phase and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance offered by the proposed detector in the presence of model uncertainties.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The notion of the 1-D analytic signal is well understood and has found many applications. At the heart of the analytic signal concept is the Hilbert transform. The problem in extending the concept of analytic signal to higher dimensions is that there is no unique multidimensional definition of the Hilbert transform. Also, the notion of analyticity is not so well under stood in higher dimensions. Of the several 2-D extensions of the Hilbert transform, the spiral-phase quadrature transform or the Riesz transform seems to be the natural extension and has attracted a lot of attention mainly due to its isotropic properties. From the Riesz transform, Larkin et al. constructed a vortex operator, which approximates the quadratures based on asymptotic stationary-phase analysis. In this paper, we show an alternative proof for the quadrature approximation property by invoking the quasi-eigenfunction property of linear, shift-invariant systems. We show that the vortex operator comes up as a natural consequence of applying this property. We also characterize the quadrature approximation error in terms of its energy as well as the peak spatial-domain error. Such results are available for 1-D signals, but their counter part for 2-D signals have not been provided. We also provide simulation results to supplement the analytical calculations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a novel approach to represent transients using spectral-domain amplitude-modulated/frequency -modulated (AM-FM) functions. The model is applied to the real and imaginary parts of the Fourier transform (FT) of the transient. The suitability of the model lies in the observation that since transients are well-localized in time, the real and imaginary parts of the Fourier spectrum have a modulation structure. The spectral AM is the envelope and the spectral FM is the group delay function. The group delay is estimated using spectral zero-crossings and the spectral envelope is estimated using a coherent demodulator. We show that the proposed technique is robust to additive noise. We present applications of the proposed technique to castanets and stop-consonants in speech.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider the speech production mechanism and the asso- ciated linear source-filter model. For voiced speech sounds in particular, the source/glottal excitation is modeled as a stream of impulses and the filter as a cascade of second-order resonators. We show that the process of sampling speech signals can be modeled as filtering a stream of Dirac impulses (a model for the excitation) with a kernel function (the vocal tract response),and then sampling uniformly. We show that the problem of esti- mating the excitation is equivalent to the problem of recovering a stream of Dirac impulses from samples of a filtered version. We present associated algorithms based on the annihilating filter and also make a comparison with the classical linear prediction technique, which is well known in speech analysis. Results on synthesized as well as natural speech data are presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We address the problem of speech enhancement in real-world noisy scenarios. We propose to solve the problem in two stages, the first comprising a generalized spectral subtraction technique, followed by a sequence of perceptually-motivated post-processing algorithms. The role of the post-processing algorithms is to compensate for the effects of noise as well as to suppress any artifacts created by the first-stage processing. The key post-processing mechanisms are aimed at suppressing musical noise and to enhance the formant structure of voiced speech as well as to denoise the linear-prediction residual. The parameter values in the techniques are fixed optimally by experimentally evaluating the enhancement performance as a function of the parameters. We used the Carnegie-Mellon university Arctic database for our experiments. We considered three real-world noise types: fan noise, car noise, and motorbike noise. The enhancement performance was evaluated by conducting listening experiments on 12 subjects. The listeners reported a clear improvement (MOS improvement of 0.5 on an average) over the noisy signal in the perceived quality (increase in the mean-opinion score (MOS)) for positive signal-to-noise-ratios (SNRs). For negative SNRs, however, the improvement was found to be marginal.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose a postprocessing technique for a spectrogram diffusion based harmonic/percussion decom- position algorithm. The proposed technique removes har- monic instrument leakages in the percussion enhanced out- puts of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands followed by piecewise signal reconstruction using envelope properties to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which upon applying on the original signal improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding window basis is useful in locating the impulse locations accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering technique (ZFF) and Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that performance of the SZCR technique is better than DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The goal of speech enhancement algorithms is to provide an estimate of clean speech starting from noisy observations. The often-employed cost function is the mean square error (MSE). However, the MSE can never be computed in practice. Therefore, it becomes necessary to find practical alternatives to the MSE. In image denoising problems, the cost function (also referred to as risk) is often replaced by an unbiased estimator. Motivated by this approach, we reformulate the problem of speech enhancement from the perspective of risk minimization. Some recent contributions in risk estimation have employed Stein's unbiased risk estimator (SURE) together with a parametric denoising function, which is a linear expansion of threshold/bases (LET). We show that the first-order case of SURE-LET results in a Wiener-filter type solution if the denoising function is made frequency-dependent. We also provide enhancement results obtained with both techniques and characterize the improvement by means of local as well as global SNR calculations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We address the problem of speech enhancement using a risk- estimation approach. In particular, we propose the use the Stein’s unbiased risk estimator (SURE) for solving the problem. The need for a suitable finite-sample risk estimator arises because the actual risks invariably depend on the unknown ground truth. We consider the popular mean-squared error (MSE) criterion first, and then compare it against the perceptually-motivated Itakura-Saito (IS) distortion, by deriving unbiased estimators of the corresponding risks. We use a generalized SURE (GSURE) development, recently proposed by Eldar for MSE. We consider dependent observation models from the exponential family with an additive noise model,and derive an unbiased estimator for the risk corresponding to the IS distortion, which is non-quadratic. This serves to address the speech enhancement problem in a more general setting. Experimental results illustrate that the IS metric is efficient in suppressing musical noise, which affects the MSE-enhanced speech. However, in terms of global signal-to-noise ratio (SNR), the minimum MSE solution gives better results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a new sub-band approach to estimate the glottal activity. The method is based on the spectral harmonicity and the sub-band temporal properties of voiced speech. We propose a method to represent glottal excitation signal using sub-band temporal envelope. Instants of maximum glottal excitation or Glottal Closure Instants (GCI) are extracted from the estimated glottal excitation pattern and the result is compared with a standard GCI computation method, DYPSA [1]. The performance of the algorithm is also compared for the noisy signal and it is shown that the proposed method is less variant to GCI estimation under noisy conditions compared to DYPSA. The algorithm is evaluated on the CMU-ARCTIC database.