986 results for speech signals
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, therefore making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
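For orientation, the baseline that this abstract builds on, single-channel magnitude spectral subtraction, can be sketched as follows. This is a generic textbook form, not the LIMA or Delay Projection method of the dissertation. The function names, frame length, spectral floor, and the synthetic sinusoid-plus-white-noise test signal are all assumptions made for illustration. The noisy phase is reused unchanged, which is exactly the simplification the dissertation questions.

```python
import numpy as np

def frame_starts(n, frame_len):
    # Start indices of 50%-overlapping frames that fit entirely in the signal
    return range(0, n - frame_len + 1, frame_len // 2)

def estimate_noise_mag(noise, frame_len=256):
    # Average magnitude spectrum over frames of a noise-only recording
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    mags = [np.abs(np.fft.rfft(noise[s:s + frame_len] * window))
            for s in frame_starts(len(noise), frame_len)]
    return np.mean(mags, axis=0)

def spectral_subtract(noisy, noise_mag, frame_len=256, floor=0.01):
    # Subtract the noise magnitude in each frame and keep the noisy phase
    window = np.hanning(frame_len + 1)[:-1]  # overlapped copies sum to 1
    out = np.zeros(len(noisy))
    for s in frame_starts(len(noisy), frame_len):
        spec = np.fft.rfft(noisy[s:s + frame_len] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        # A small spectral floor avoids negative magnitudes
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        frame = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len)
        out[s:s + frame_len] += frame  # overlap-add
    return out

def snr_db(ref, est):
    err = est - ref
    return 10 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))

# Illustrative data: a 500 Hz tone in white noise (assumed, not from the corpus)
rng = np.random.default_rng(0)
fs, n = 8000, 8000
clean = np.sin(2 * np.pi * 500 * np.arange(n) / fs)
noisy = clean + 0.3 * rng.standard_normal(n)
noise_mag = estimate_noise_mag(0.3 * rng.standard_normal(n))
enhanced = spectral_subtract(noisy, noise_mag)

inner = slice(256, -512)  # skip edges where frames do not fully overlap
snr_in = snr_db(clean[inner], noisy[inner])
snr_out = snr_db(clean[inner], enhanced[inner])
```

Comparing `snr_in` with `snr_out` gives a signal-based measure of the enhancement. As the abstract argues, such a measure does not by itself indicate better recognition accuracy.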
Abstract:
A new method for decomposition of composite signals is presented. It is shown that the high-frequency portion of the composite signal spectrum possesses information on echo structure. The proposed technique does not assume the shape of the basic wavelet and does not place any restrictions on the amplitudes and arrival times of echoes in the composite signal. In the absence of noise, any desired resolution can be obtained. The effects of sampling rate and frequency window function on echo resolution are discussed. A voiced speech segment is considered as an example of a composite signal to demonstrate the application of the decomposition technique.
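The usual composite-signal model behind this kind of decomposition can be written out as below. The abstract itself gives no equations, so the notation (basic wavelet $w$, amplitudes $a_k$, arrival times $\tau_k$) is an assumption made for illustration.

```latex
x(t) = \sum_{k=0}^{K-1} a_k \, w(t - \tau_k)
\quad\Longrightarrow\quad
X(f) = W(f) \sum_{k=0}^{K-1} a_k \, e^{-j 2\pi f \tau_k}
```

Under this model the spectrum factors into the wavelet spectrum $W(f)$ and an echo term that depends only on the amplitudes and arrival times.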
Abstract:
Many languages exploit suprasegmental devices in signaling word meaning. Tone languages exploit fundamental frequency whereas quantity languages rely on segmental durations to distinguish otherwise similar words. Traditionally, duration and tone have been taken as mutually exclusive. However, some evidence suggests that, in addition to durational cues, phonological quantity is associated with and co-signaled by changes in fundamental frequency in quantity languages such as Finnish, Estonian, and Serbo-Croat. The results from the present experiment show that the structure of disyllabic word stems in Finnish is indeed signaled tonally and that the phonological length of the stressed syllable is further tonally distinguished within the disyllabic sequence. The results further indicate that the observed association of tone and duration in perception is systematically exploited in speech production in Finnish.
Abstract:
In this paper, we develop a low-complexity message passing algorithm for joint support and signal recovery of approximately sparse signals. The problem of recovering strictly sparse signals from noisy measurements can be viewed as a problem of recovering approximately sparse signals from noiseless measurements, making the approach applicable to strictly sparse signal recovery from noisy measurements. The support recovery embedded in the approach makes it suitable for recovery of signals with the same sparsity profile, as in the problem of multiple measurement vectors (MMV). Simulation results show that the proposed algorithm, termed the JSSR-MP (joint support and signal recovery via message passing) algorithm, achieves performance comparable to that of the sparse Bayesian learning (M-SBL) algorithm in the literature, at one order of magnitude lower complexity.
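One way to see the equivalence between the noisy strictly sparse problem and the noiseless approximately sparse problem is sketched below. It assumes a measurement matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$ and full row rank, so that $A A^{+} = I_m$ for the pseudoinverse $A^{+}$. The symbols are illustrative and the paper's own argument may differ.

```latex
y = A x + n
  = A x + A A^{+} n
  = A \left( x + A^{+} n \right)
  = A \tilde{x},
\qquad \tilde{x} = x + A^{+} n
```

When the noise $n$ is small, $\tilde{x}$ keeps its dominant entries on the support of the strictly sparse $x$ and has small nonzero values elsewhere, which is what "approximately sparse" means here.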
Abstract:
In this work, we address the recovery of block sparse vectors with intra-block correlation, i.e., the recovery of vectors in which the correlated nonzero entries are constrained to lie in a few clusters, from noisy underdetermined linear measurements. Among Bayesian sparse recovery techniques, the cluster Sparse Bayesian Learning (SBL) is an efficient tool for block-sparse vector recovery with intra-block correlation. However, this technique uses a heuristic method to estimate the intra-block correlation. In this paper, we propose the Nested SBL (NSBL) algorithm, which we derive using a novel Bayesian formulation that facilitates the use of the monotonically convergent nested Expectation Maximization (EM) and a Kalman filtering based learning framework. Unlike the cluster-SBL algorithm, this formulation leads to closed-form EM updates for estimating the correlation coefficient. We demonstrate the efficacy of the proposed NSBL algorithm using Monte Carlo simulations.
Abstract:
We propose a two-dimensional (2-D) multicomponent amplitude-modulation, frequency-modulation (AM-FM) model for a spectrogram patch corresponding to voiced speech, and develop a new demodulation algorithm to effectively separate the AM, which is related to the vocal tract response, and the carrier, which is related to the excitation. The demodulation algorithm is based on the Riesz transform and is developed along the lines of Hilbert-transform-based demodulation for 1-D AM-FM signals. We compare the performance of the Riesz transform technique with that of the sinusoidal demodulation technique on real speech data. Experimental results show that the Riesz-transform-based demodulation technique represents spectrogram patches accurately. The spectrograms reconstructed from the demodulated AM and carrier are inverted and the corresponding speech signal is synthesized. The signal-to-noise ratio (SNR) of the reconstructed speech signal, with respect to clean speech, was found to be 2 to 4 dB higher in the case of the Riesz transform technique than with the sinusoidal demodulation technique.
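The 1-D Hilbert-transform-based demodulation that the abstract cites as its starting point can be sketched as follows. This is only the 1-D analogue, not the 2-D Riesz-transform algorithm of the paper. The function names and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np

def analytic_signal(x):
    # Build the analytic signal by zeroing negative frequencies
    # and doubling positive ones in the FFT domain
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0  # keep DC as is
    if n % 2 == 0:
        gain[n // 2] = 1.0       # keep the Nyquist bin as is
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def demodulate(x):
    # Split x into an AM envelope and a unit-amplitude carrier
    z = analytic_signal(x)
    envelope = np.abs(z)
    carrier = np.cos(np.angle(z))
    return envelope, carrier

# Illustrative signal: a 200 Hz carrier with a slow 5 Hz amplitude modulation
fs = 8000
t = np.arange(fs) / fs
am = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)
x = am * np.cos(2 * np.pi * 200 * t)
envelope, carrier = demodulate(x)
```

In this example the product `envelope * carrier` reproduces `x`, and `envelope` matches the slow modulation `am` because the modulation bandwidth is far below the carrier frequency.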
Abstract:
In this paper we derive the a posteriori probability for the location of bursts of noise additively superimposed on a Gaussian AR process. The theory is developed to give a sequentially based restoration algorithm suitable for real-time applications. The algorithm is particularly appropriate for digital audio restoration, where clicks and scratches may be modelled as additive bursts of noise. Experiments are carried out on both real audio data and synthetic AR processes, and significant improvements are demonstrated over existing restoration techniques. © 1995 IEEE
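To make the setting concrete, the sketch below generates a synthetic AR process, adds a few clicks, and flags samples whose AR prediction error is unusually large. It is a simple thresholding stand-in, not the sequential a posteriori detector derived in the paper. The AR coefficients, click positions, click amplitude, and threshold factor are all assumed values.

```python
import numpy as np

def fit_ar(y, order):
    # Least-squares AR fit: y[t] ~ sum_i a[i] * y[t - 1 - i]
    n = len(y)
    lags = np.column_stack([y[order - 1 - i:n - 1 - i] for i in range(order)])
    target = y[order:]
    coeffs, *_ = np.linalg.lstsq(lags, target, rcond=None)
    return coeffs, target - lags @ coeffs

def detect_clicks(y, order=2, k=5.0):
    # Flag samples whose prediction error exceeds k robust standard deviations
    _, resid = fit_ar(y, order)
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # MAD scale
    return np.flatnonzero(np.abs(resid) > k * sigma) + order

# Synthetic Gaussian AR(2) process (coefficients chosen to be stable)
rng = np.random.default_rng(1)
n = 4000
x = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + e[t]

# Additive clicks at assumed positions
click_pos = [500, 1500, 3000]
y = x.copy()
y[click_pos] += 30.0

flagged = detect_clicks(y)
```

The detector also flags the samples immediately after each click, because a corrupted sample enters the predictor for the next `order` samples. A Bayesian formulation such as the one in the paper handles this dependence explicitly.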
Abstract:
In this paper methods are developed for enhancement and analysis of autoregressive moving average (ARMA) signals observed in additive noise which can be represented as mixtures of heavy-tailed non-Gaussian sources and a Gaussian background component. Such models find application in systems such as atmospheric communications channels or early sound recordings which are prone to intermittent impulse noise. Markov Chain Monte Carlo (MCMC) simulation techniques are applied to the joint problem of signal extraction, model parameter estimation and detection of impulses within a fully Bayesian framework. The algorithms require only simple linear iterations for all of the unknowns, including the MA parameters, which is in contrast with existing MCMC methods for analysis of noise-free ARMA models. The methods are illustrated using synthetic data and noise-degraded sound recordings.