967 resultados para spectral subtraction
Resumo:
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters, and; (c) is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state of the art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
Resumo:
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters which are used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions which are significantly different from that present during the calibration session - a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.
Resumo:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%, however this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more ro- bust. Single-channel spectral subtraction was originally designed to improve hu- man speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess en- hancement performance for intelligibility not speech recognition, therefore mak- ing them sub-optimal ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) in- troducing phase spectrum information to enable spectral subtraction in the com- plex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word se- quence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implemen- tation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Aus- tralian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
Resumo:
Voice recognition is one of the key enablers to reduce driver distraction as in-vehicle systems become more and more complex. With the integration of voice recognition in vehicles, safety and usability are improved as the driver’s eyes and hands are not required to operate system controls. Whilst speaker independent voice recognition is well developed, performance in high noise environments (e.g. vehicles) is still limited. La Trobe University and Queensland University of Technology have developed a low-cost hardware-based speech enhancement system for automotive environments based on spectral subtraction and delay–sum beamforming techniques. The enhancement algorithms have been optimised using authentic Australian English collected under typical driving conditions. Performance tests conducted using speech data collected under variety of vehicle noise conditions demonstrate a word recognition rate improvement in the order of 10% or more under the noisiest conditions. Currently developed to a proof of concept stage there is potential for even greater performance improvement.
Resumo:
The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction, or multi-channel beamforming is not known. This is an important question to be answered especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset and results show that synchronous HMM-based audio-visual fusion can outperform traditional single as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
Resumo:
We address the problem of speech enhancement in real-world noisy scenarios. We propose to solve the problem in two stages, the first comprising a generalized spectral subtraction technique, followed by a sequence of perceptually-motivated post-processing algorithms. The role of the post-processing algorithms is to compensate for the effects of noise as well as to suppress any artifacts created by the first-stage processing. The key post-processing mechanisms are aimed at suppressing musical noise and to enhance the formant structure of voiced speech as well as to denoise the linear-prediction residual. The parameter values in the techniques are fixed optimally by experimentally evaluating the enhancement performance as a function of the parameters. We used the Carnegie-Mellon university Arctic database for our experiments. We considered three real-world noise types: fan noise, car noise, and motorbike noise. The enhancement performance was evaluated by conducting listening experiments on 12 subjects. The listeners reported a clear improvement (MOS improvement of 0.5 on an average) over the noisy signal in the perceived quality (increase in the mean-opinion score (MOS)) for positive signal-to-noise-ratios (SNRs). For negative SNRs, however, the improvement was found to be marginal.
Resumo:
This paper describes a speech enhancement system (SES) based on a TMS320C31 digital signal processor (DSP) for real-time application. The SES algorithm is based on a modified spectral subtraction method and a new speech activity detector (SAD) is used. The system presents a medium computational load and a sampling rate up to 18 kHz can be used. The goal is load and a sampling rate up to 18 kHz can be used. The goal is to use it to reduce noise in an analog telephone line.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Pós-graduação em Engenharia Elétrica - FEIS
Resumo:
Context. During the course of a large spectroscopic survey of X-ray active late-type stars in the solar neighbourhood, we discovered four lithium-rich stars packed within just a few degrees on the sky. Although located in a sky area rich in CO molecular regions and dark clouds, the Cepheus-Cassiopeia complex, these very young stars are projected several degrees away from clouds in front of an area void of interstellar matter. As such, they are very good "isolated" T Tauri star candidates. Aims. We present optical observations of these stars conducted with 1-2 m class telescopes. We acquired high-resolution optical spectra as well as photometric data allowing us to investigate in detail their nature and physical parameters with the aim of testing the "runaway" and "in-situ" formation scenarios. Their kinematical properties are also analyzed to investigate their possible connection to already known stellar kinematic groups. Methods. We use the cross-correlation technique and other tools developed by us to derive accurate radial and rotational velocities and perform an automatic spectral classification. The spectral subtraction technique is used to infer chromospheric activity level in the Hα line core and clean the spectra of photospheric lines before measuring the equivalent width of the lithium absorption line. Results. Both physical (lithium content, chromospheric, and coronal activities) and kinematical indicators show that all stars are very young, with ages probably in the range 10-30 Myr. In particular, the spectral energy distribution of TYC4496-780-1 displays a strong near-and far-infrared excess, typical of T Tauri stars still surrounded by an accretion disc. They also share the same Galactic motion, proving that they form a homogeneous moving group of stars with the same origin. Conclusions. The most plausible explanation of how these "isolated" T Tauri stars formed is the "in-situ" model, although accurate distances are needed to clarify their connection with the Cepheus-Cassiopeia complex. The discovery of this loose association of "isolated" T Tauri stars can help to shed light on atypical formation processes of stars and planets in low-mass clouds.