127 results for speech signals
Abstract:
Considering a general linear model of signal degradation, and modeling the probability density function (PDF) of the clean signal with a Gaussian mixture model (GMM) and the additive noise with a Gaussian PDF, we derive the minimum mean square error (MMSE) estimator. The derived MMSE estimator is non-linear, and the linear MMSE estimator is shown to be a special case. For a speech signal corrupted by independent additive noise, we model the joint PDF of the time-domain speech samples of a speech frame using a GMM and propose a speech enhancement method based on the derived MMSE estimator. We also show that the same estimator can be used for transform-domain speech enhancement.
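The estimator described in this abstract can be illustrated with a minimal scalar sketch. It assumes the simplest degradation model, y = x + n, with a scalar GMM prior on x and zero-mean Gaussian noise; the abstract itself treats a general linear model on whole speech frames. The function and parameter names are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(value, mean, variance):
    """Density of a scalar Gaussian N(mean, variance) at `value`."""
    return math.exp(-0.5 * (value - mean) ** 2 / variance) / math.sqrt(
        2.0 * math.pi * variance
    )


def gmm_mmse_estimate(y, weights, means, variances, noise_var):
    """MMSE estimate E[x | y] for y = x + n (scalar illustration only).

    x follows a GMM with the given weights, means and variances;
    n is zero-mean Gaussian with variance noise_var, independent of x.
    """
    # Posterior weight of each mixture component given the observation y:
    # under component k, y is Gaussian with variance variances[k] + noise_var.
    likelihoods = [
        w * gaussian_pdf(y, m, v + noise_var)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(likelihoods)

    estimate = 0.0
    for lik, m, v in zip(likelihoods, means, variances):
        gain = v / (v + noise_var)  # per-component linear MMSE (Wiener) gain
        estimate += (lik / total) * (m + gain * (y - m))
    return estimate
```

With a single mixture component the posterior weight is 1 and the estimate reduces to the linear MMSE form m + gain * (y - m), which mirrors the special case mentioned in the abstract. With several components the posterior weights depend on y, so the estimator is non-linear.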
Abstract:
With the increased utilization of advanced composites in strategic industries, the concept of Structural Health Monitoring (SHM), with its inherent advantages, is gaining ground over the conventional methods of NDE and NDI. The most attractive feature of this concept is on-line evaluation using embedded sensors. Consequently, developing methodologies and identifying appropriate sensors, such as polyvinylidene fluoride (PVDF) films, becomes the key to exploiting the new concept. Of the methods used for on-line evaluation, acoustic emission has been the most effective. Thus, Acoustic Emission (AE) generated during static tensile loading of glass fiber reinforced plastic composites was monitored using a PVDF film sensor. The frequency response of the film sensor was obtained with pencil lead breakage tests to choose the appropriate band of operation. The specimens considered for the experiments were chosen to characterize the differences in the operation of the failure mechanisms through AE parametric analysis. The results of the investigations can be characterized using AE parameters, indicating that a PVDF film sensor is effective as an AE sensor for on-line structural health monitoring.
Abstract:
We consider the problem of signal estimation where the observed time series is modeled as y(i) = x(i) + s(i), with {x(i)} being an orbit of a chaotic self-map on a compact subset of R^d and {s(i)} a sequence in R^d converging to zero. This model is motivated by experimental results in the literature where the ocean ambient noise and the ocean clutter are found to be chaotic. Making use of observations up to time n, we propose an estimate of s(i) for i < n and show that it approaches s(i) as n -> infinity for typical asymptotic behaviors of orbits. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Non-Gaussianity of signals or noise often results in significant performance degradation for systems designed under the Gaussian assumption, so non-Gaussian signals and noise require a different modelling and processing approach. In this paper, we discuss a new Bayesian estimation technique for non-Gaussian signals corrupted by colored non-Gaussian noise. The method is based on zero-mean finite Gaussian Mixture Models (GMMs) for the signal and the noise. The estimation is done using an adaptive, non-causal, nonlinear filtering technique. The method involves deriving an estimator in terms of the GMM parameters, which are in turn estimated using the EM algorithm. The proposed filter is of finite length and is computationally feasible. The simulations show that the proposed method gives a significant improvement over the linear filter for a wide variety of noise conditions, including impulsive noise. We also claim that estimating the signal using its correlation with both past and future samples leads to a lower mean squared error than estimation based on past samples only.
Abstract:
We propose a simple speech/music discriminator that uses features based on the HILN (Harmonics, Individual Lines and Noise) model. We have tested the strength of the feature set on a standard database of 66 files and obtained an accuracy of around 97%. We have also tested on sung queries and polyphonic music and obtained very good results. The current algorithm is being used to discriminate between sung queries and played queries (using an instrument such as a flute) for a Query by Humming (QBH) system currently under development in the lab.
Abstract:
Non-uniform sampling of a signal is formulated as an optimization problem which minimizes the signal reconstruction error. Dynamic programming (DP) has been used to solve this problem efficiently for a finite duration signal. Further, the optimum samples are quantized to realize a speech coder. The quantizer and the DP based optimum search for non-uniform samples (DP-NUS) can be combined in a closed-loop manner, which provides a distinct advantage over the open-loop formulation. The DP-NUS formulation provides useful control over the trade-off between bitrate and performance (reconstruction error). It is shown that a 5-10 dB SNR improvement is possible using DP-NUS compared to the extrema sampling approach. In addition, the closed-loop DP-NUS gives a 4-5 dB improvement in reconstruction error.
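The dynamic-programming search for non-uniform samples can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the reconstruction rule or the error measure, so the sketch assumes linear interpolation between kept samples, a squared-error cost, and that the first and last samples are always kept. Quantization and the closed-loop variant are not modeled, and all names are hypothetical.

```python
def segment_error(signal, start, end):
    """Squared error of linearly interpolating signal[start..end]
    from its two end points (assumed reconstruction rule)."""
    span = end - start
    error = 0.0
    for i in range(start + 1, end):
        interpolated = signal[start] + (signal[end] - signal[start]) * (i - start) / span
        error += (signal[i] - interpolated) ** 2
    return error


def dp_nonuniform_samples(signal, num_samples):
    """Pick num_samples indices (2 <= num_samples <= len(signal)) that
    minimise the interpolation error. Returns (indices, total_error)."""
    n = len(signal)
    inf = float("inf")
    # best[j][i]: minimum error for signal[0..i] using j + 1 kept samples,
    # the last of which is at index i.
    best = [[inf] * n for _ in range(num_samples)]
    prev = [[-1] * n for _ in range(num_samples)]
    best[0][0] = 0.0
    for j in range(1, num_samples):
        for i in range(j, n):
            for p in range(j - 1, i):
                if best[j - 1][p] == inf:
                    continue
                candidate = best[j - 1][p] + segment_error(signal, p, i)
                if candidate < best[j][i]:
                    best[j][i] = candidate
                    prev[j][i] = p
    # Backtrack from the last sample, which is always kept.
    indices = [n - 1]
    i = n - 1
    for j in range(num_samples - 1, 0, -1):
        i = prev[j][i]
        indices.append(i)
    indices.reverse()
    return indices, best[num_samples - 1][n - 1]
```

For a triangular signal such as [0, 1, 2, 3, 2, 1, 0] with three kept samples, the search keeps the two end points and the peak, since that choice gives zero interpolation error. The segment error is recomputed naively here; a practical implementation would cache or update it incrementally.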
Abstract:
This correspondence describes a method for automated segmentation of speech. The proposed method uses a specially designed filter bank, called the Bach filter bank, which makes use of 'music' related perception criteria. The speech signal is treated as a continuously time-varying signal rather than with a short-time stationary model. A comparative study has been made of the performance obtained using Mel, Bark and Bach scale filter banks. The preliminary results show up to 80% matches within 20 ms of the manually segmented data, without any information about the content of the text and without any language dependence. The Bach filters are seen to marginally outperform the other filters.
Abstract:
Joint decoding of multiple speech patterns so as to improve speech recognition performance is important, especially in the presence of noise. In this paper, we propose a Multi-Pattern Viterbi Algorithm (MPVA) to jointly decode and recognize multiple speech patterns for automatic speech recognition (ASR). The MPVA is a generalization of the Viterbi algorithm to jointly decode multiple patterns given a Hidden Markov Model (HMM). Unlike the previously proposed two-stage Constrained Multi-Pattern Viterbi Algorithm (CMPVA), the MPVA is a single-stage algorithm. MPVA has the advantage that it can be extended to connected word recognition (CWR) and continuous speech recognition (CSR) problems. MPVA is shown to provide better speech recognition performance than the earlier techniques: using only two repetitions of noisy speech patterns (-5 dB SNR, 10% burst noise), the word error rate using MPVA decreased by 28.5% compared to individual decoding. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Inverse filters are conventionally used for resolving overlapping signals of identical waveshape. However, the inverse filtering approach is shown to be useful for resolving overlapping signals, identical or otherwise, of unknown waveshapes. Digital inverse filter design based on autocorrelation formulation of linear prediction is known to perform optimum spectral flattening of the input signal for which the filter is designed. This property of the inverse filter is used to accomplish composite signal decomposition. The theory has been presented assuming constituent signals to be responses of all-pole filters. However, the approach may be used for a general situation.
Abstract:
Homomorphic analysis and pole-zero modeling of electrocardiogram (ECG) signals are presented in this paper. Four typical ECG signals are considered and deconvolved into their minimum and maximum phase components through cepstral filtering, with a view to studying the possibility of more efficient feature selection from the component signals for diagnostic purposes. The complex cepstra of the signals are linearly filtered to extract the basic wavelet and the excitation function. The ECG signals are, in general, mixed phase, and hence exponential weighting is applied to aid deconvolution of the signals. The basic wavelet for a normal ECG approximates the action potential of the muscle fiber of the heart, and the excitation function corresponds to the excitation pattern of the heart muscles during a cardiac cycle. The ECG signals and their components are pole-zero modeled, and the pole-zero pattern of the models can give a clue for classifying normal and abnormal signals. In addition, storing only the parameters of the model can result in a data reduction of more than 3:1 for normal signals sampled at a moderate 128 samples/s.
Abstract:
Fractal dimension (FD) is one of the popular measures used for characterizing signals. It has been used as a complexity measure of signals in various fields, including speech and biomedical applications. However, the proper interpretation of such analyses has not been thoroughly addressed. In this paper, we study the effect of various signal properties on FD and interpret the results in terms of classical signal processing concepts such as amplitude, frequency, number of harmonics, noise power and signal bandwidth. We have used Higuchi's method for estimating FDs. This study may help in gaining a better understanding of the FD complexity measure itself, and in interpreting the changing structural complexity of signals in terms of FD. Our results indicate that FD is a useful measure for quantifying structural changes in signal properties.
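Higuchi's method, named in this abstract, can be sketched in a few lines. This is a minimal illustration of the standard algorithm, not the authors' implementation; the function name and the default `k_max` are assumptions, and the abstract does not state which maximum delay was used.

```python
import math


def higuchi_fd(signal, k_max=8):
    """Estimate the fractal dimension of a 1-D sequence by Higuchi's method.

    For each delay k, the mean normalised curve length L(k) is computed over
    the k sub-sampled curves; the FD is the least-squares slope of
    log L(k) against log(1/k). Assumes a non-constant signal and k_max >= 2.
    """
    n = len(signal)
    log_inv_k = []
    log_length = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offset of the sub-sampled curve
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            distance = sum(
                abs(signal[m + i * k] - signal[m + (i - 1) * k])
                for i in range(1, n_steps + 1)
            )
            # Normalise for the number of steps actually used, then by k.
            lengths.append(distance * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(sum(lengths) / len(lengths)))

    # Ordinary least-squares slope of log L(k) versus log(1/k).
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_length) / len(log_length)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_k, log_length))
    denominator = sum((x - mean_x) ** 2 for x in log_inv_k)
    return numerator / denominator
```

A straight ramp gives an estimate of 1, while white noise gives a value close to 2, which is consistent with reading FD as a measure of structural complexity.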
Abstract:
We consider the possibility of fingerprinting the presence of heavy additional Z' bosons that arise naturally in extensions of the standard model, such as E_6 models and left-right symmetric models, through their mixing with the standard model Z boson. By considering a class of observables including total cross sections, energy distributions and angular distributions of decay leptons, we find significant deviations from the standard model predictions for these quantities with right-handed electrons and left-handed positrons at sqrt(s) = 800 GeV. The deviations are less pronounced at smaller centre-of-mass energies, as the models are already tightly constrained. Our work suggests that the ILC should have a strong beam polarization physics program, particularly with these configurations. On the other hand, a forward-backward asymmetry and the lepton fraction in the backward direction are more sensitive to new physics with realistic polarization, due to an interesting interplay with the neutrino t-channel diagram. This process complements the study of fermion pair production processes that have been considered for discrimination between these models.
Abstract:
A new method based on a unit continuity metric (UCM) is proposed for optimal unit selection in text-to-speech (TTS) synthesis. UCM employs two features, namely, a pitch continuity metric and a spectral continuity metric. The methods have been implemented and tested on our test bed, called MILE-TTS, which is available as a web demo. After verification by a self-selection test, the algorithms are evaluated on 8 paragraphs each for Kannada and Tamil by native users of the languages. The mean opinion score (MOS) shows that naturalness and comprehension are better with the UCM based algorithm than with the non-UCM based ones. The naturalness of the TTS output is further enhanced by a new rule based algorithm for pause prediction for the Tamil language. The pauses between the words are predicted based on parts-of-speech information obtained from the input text.
Abstract:
The EEG time series has been subjected to various formalisms of analysis to extract meaningful information regarding the underlying neural events. In this paper, the linear prediction (LP) method has been used for the analysis and presentation of spectral array data for better visualisation of background EEG activity. It has also been used for signal generation and for efficient storage and transmission of EEG data. The LP method is compared with the standard Fourier method of compressed spectral array (CSA) for multichannel EEG data. The autocorrelation autoregressive (AR) technique is used for obtaining the LP coefficients with a model order of 15. While the Fourier method reduces the data only by half, the LP method requires storing only the signal variance and the LP coefficients. The signal generated using white Gaussian noise as the input to the LP filter has a high correlation coefficient of 0.97 with the original signal, making LP a useful tool for storage and transmission of EEG. The biological significance of the Fourier method and the LP method with respect to the microstructure of neuronal events in the generation of EEG is discussed.
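The autocorrelation method of linear prediction mentioned in this abstract can be sketched with the Levinson-Durbin recursion. This is a generic textbook sketch, not the authors' code: the function names are hypothetical, the sign convention assumes the error filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the synthesis step simply drives the all-pole filter 1/A(z) with white Gaussian noise.

```python
import random


def autocorrelation(signal, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (coeffs, error): coeffs[0] == 1.0 and coeffs[1:] are the
    prediction-error filter taps; error is the final residual energy.
    """
    coeffs = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(coeffs[j] * r[i - j] for j in range(1, i))
        reflection = -acc / error
        updated = coeffs[:]
        for j in range(1, i):
            updated[j] = coeffs[j] + reflection * coeffs[i - j]
        updated[i] = reflection
        coeffs = updated
        error *= 1.0 - reflection * reflection
    return coeffs, error


def synthesize(coeffs, noise_std, length, seed=0):
    """Regenerate a signal by driving the all-pole filter 1/A(z)
    with white Gaussian noise of standard deviation noise_std."""
    rng = random.Random(seed)
    output = []
    for n in range(length):
        sample = rng.gauss(0.0, noise_std)
        for j in range(1, len(coeffs)):
            if n - j >= 0:
                sample -= coeffs[j] * output[n - j]
        output.append(sample)
    return output
```

In the setting of the abstract, `order` would be 15, and only the resulting coefficients and a variance term would need to be stored per analysis frame. Because `autocorrelation` is unnormalised here, the residual energy should be divided by the frame length before its square root is used as `noise_std`.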
Abstract:
We propose the F-norm of the cross-correlation part of the array covariance matrix as a measure of correlation between the impinging signals, and we use this measure to study the performance of different decorrelation methods in the broadband case. We first show that the dimensionality of the composite signal subspace, defined as the number of significant eigenvectors of the source sample covariance matrix, collapses in the presence of multipath and that spatial smoothing recovers this dimensionality. Using an upper bound on the proposed measure, we then study the decorrelation of broadband signals with spatial smoothing and the effect of the spacing and directions of the sources on the rate of decorrelation with progressive smoothing. Next, we introduce a weighted smoothing method based on Toeplitz-block-Toeplitz (TBT) structuring of the data covariance matrix, which decorrelates the signals much faster than spatial smoothing. Computer simulations are included to demonstrate the performance of the two methods.