871 resultados para Testes de audição
Resumo:
We address the problem of temporal envelope modeling for transient audio signals. We propose the Gamma distribution function (GDF) as a suitable candidate for modeling the envelope keeping in view some of its interesting properties such as asymmetry, causality, near-optimal time-bandwidth product, controllability of rise and decay, etc. The problem of finding the parameters of the GDF becomes a nonlinear regression problem. We overcome the hurdle by using a logarithmic envelope fit, which reduces the problem to one of linear regression. The logarithmic transformation also has the feature of dynamic range compression. Since temporal envelopes of audio signals are not uniformly distributed, in order to compute the amplitude, we investigate the importance of various loss functions for regression. Based on synthesized data experiments, wherein we have a ground truth, and real-world signals, we observe that the least-squares technique gives reasonably accurate amplitude estimates compared with other loss functions.
Resumo:
In this paper, we propose a new state transition based embedding (STBE) technique for audio watermarking with high fidelity. Furthermore, we propose a new correlation based encoding (CBE) scheme for binary logo image in order to enhance the payload capacity. The result of CBE is also compared with standard run-length encoding (RLE) compression and Huffman schemes. Most of the watermarking algorithms are based on modulating selected transform domain feature of an audio segment in order to embed given watermark bit. In the proposed STBE method instead of modulating feature of each and every segment to embed data, our aim is to retain the default value of this feature for most of the segments. Thus, a high quality of watermarked audio is maintained. Here, the difference between the mean values (Mdiff) of insignificant complex cepstrum transform (CCT) coefficients of down-sampled subsets is selected as a robust feature for embedding. Mdiff values of the frames are changed only when certain conditions are met. Hence, almost 50% of the times, segments are not changed and still STBE can convey watermark information at receiver side. STBE also exhibits a partial restoration feature by which the watermarked audio can be restored partially after extraction of the watermark at detector side. The psychoacoustic model analysis showed that the noise-masking ratio (NMR) of our system is less than -10dB. As amplitude scaling in time domain does not affect selected insignificant CCT coefficients, strong invariance towards amplitude scaling attacks is also proved theoretically. Experimental results reveal that the proposed watermarking scheme maintains high audio quality and are simultaneously robust to general attacks like MP3 compression, amplitude scaling, additive noise, re-quantization, etc.
Resumo:
This paper presents speaker normalization approaches for audio search task. Conventional state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC) is known to contain speaker-specific and linguistic information implicitly. This might create problem for speaker-independent audio search task. In this paper, universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as scale transform and warped linear prediction are used to compensate speaker variability in audio matching. The advantage of these features over conventional feature set is that they apply universal frequency warping for both the templates to be matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) are about 3% higher than the state-of-the-art MFCC feature sets on TIMIT database.
Resumo:
In this paper we derive the a posteriori probability for the location of bursts of noise additively superimposed on a Gaussian AR process. The theory is developed to give a sequentially based restoration algorithm suitable for real-time applications. The algorithm is particularly appropriate for digital audio restoration, where clicks and scratches may be modelled as additive bursts of noise. Experiments are carried out on both real audio data and synthetic AR processes and Significant improvements are demonstrated over existing restoration techniques. © 1995 IEEE
Resumo:
Statistical model-based methods are presented for the reconstruction of autocorrelated signals in impulsive plus continuous noise environments. Signals are modelled as autoregressive and noise sources as discrete and continuous mixtures of Gaussians, allowing for robustness in highly impulsive and non-Gaussian environments. Markov Chain Monte Carlo methods are used for reconstruction of the corrupted waveforms within a Bayesian probabilistic framework and results are presented for contaminated voice and audio signals.
Resumo:
We present a statistical model-based approach to signal enhancement in the case of additive broadband noise. Because broadband noise is localised in neither time nor frequency, its removal is one of the most pervasive and difficult signal enhancement tasks. In order to improve perceived signal quality, we take advantage of human perception and define a best estimate of the original signal in terms of a cost function incorporating perceptual optimality criteria. We derive the resultant signal estimator and implement it in a short-time spectral attenuation framework. Audio examples, references, and further information may be found at http://www-sigproc.eng.cam.ac.uk/~pjw47.