997 resultados para audio processing


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem by using audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system being able to accurately locate and track the drivers face and lip area in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola- Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and head pose for audio-visual speech recognition system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.

Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that higher complexity inherit in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance level score fusion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Interacting with technology within a vehicle environment using a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to circumvent this is to use the visual modality in addition. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One current dataset available for such research is the AVICAR [1] database. Unfortunately this database is largely unusable due to timing mismatch between the two streams and in addition, no protocol is available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Signal Processing (SP) is a subject of central importance in engineering and the applied sciences. Signals are information-bearing functions, and SP deals with the analysis and processing of signals (by dedicated systems) to extract or modify information. Signal processing is necessary because signals normally contain information that is not readily usable or understandable, or which might be disturbed by unwanted sources such as noise. Although many signals are non-electrical, it is common to convert them into electrical signals for processing. Most natural signals (such as acoustic and biomedical signals) are continuous functions of time, with these signals being referred to as analog signals. Prior to the onset of digital computers, Analog Signal Processing (ASP) and analog systems were the only tool to deal with analog signals. Although ASP and analog systems are still widely used, Digital Signal Processing (DSP) and digital systems are attracting more attention, due in large part to the significant advantages of digital systems over the analog counterparts. These advantages include superiority in performance,s peed, reliability, efficiency of storage, size and cost. In addition, DSP can solve problems that cannot be solved using ASP, like the spectral analysis of multicomonent signals, adaptive filtering, and operations at very low frequencies. Following the recent developments in engineering which occurred in the 1980's and 1990's, DSP became one of the world's fastest growing industries. Since that time DSP has not only impacted on traditional areas of electrical engineering, but has had far reaching effects on other domains that deal with information such as economics, meteorology, seismology, bioengineering, oceanology, communications, astronomy, radar engineering, control engineering and various other applications. This book is based on the Lecture Notes of Associate Professor Zahir M. Hussain at RMIT University (Melbourne, 2001-2009), the research of Dr. Amin Z. Sadik (at QUT & RMIT, 2005-2008), and the Note of Professor Peter O'Shea at Queensland University of Technology. Part I of the book addresses the representation of analog and digital signals and systems in the time domain and in the frequency domain. The core topics covered are convolution, transforms (Fourier, Laplace, Z. Discrete-time Fourier, and Discrete Fourier), filters, and random signal analysis. There is also a treatment of some important applications of DSP, including signal detection in noise, radar range estimation, banking and financial applications, and audio effects production. Design and implementation of digital systems (such as integrators, differentiators, resonators and oscillators are also considered, along with the design of conventional digital filters. Part I is suitable for an elementary course in DSP. Part II (which is suitable for an advanced signal processing course), considers selected signal processing systems and techniques. Core topics covered are the Hilbert transformer, binary signal transmission, phase-locked loops, sigma-delta modulation, noise shaping, quantization, adaptive filters, and non-stationary signal analysis. Part III presents some selected advanced DSP topics. We hope that this book will contribute to the advancement of engineering education and that it will serve as a general reference book on digital signal processing.