914 resultados para robust speech recognition


Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is considerable interest in creating embedded, speech recognition hardware using the weighted finite state transducer (WFST) technique but there are performance and memory usage challenges. Two system optimization techniques are presented to address this; one approach improves token propagation by removing the WFST epsilon input arcs; another one-pass, adaptive pruning algorithm gives a dramatic reduction in active nodes to be computed. Results for memory and bandwidth are given for a 5,000 word vocabulary giving a better practical performance than conventional WFST; this is then exploited in an adaptive pruning algorithm that reduces the active nodes from 30,000 down to 4,000 with only a 2 percent sacrifice in speech recognition accuracy; these optimizations lead to a more simplified design with deterministic performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While current speech recognisers give acceptable performance in carefully controlled environments, their performance degrades rapidly when they are applied in more realistic situations. Generally, the environmental noise may be classified into two classes: the wide-band noise and narrow band noise. While the multi-band model has been shown to be capable of dealing with speech corrupted by narrow-band noise, it is ineffective for wide-band noise. In this paper, we suggest a combination of the frequency-filtering technique with the probabilistic union model in the multi-band approach. The new system has been tested on the TIDIGITS database, corrupted by white noise, noise collected from a railway station, and narrow-band noise, respectively. The results have shown that this approach is capable of dealing with noise of narrow-band or wide-band characteristics, assuming no knowledge about the noisy environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we introduce a novel approach to face recognition which simultaneously tackles three combined challenges: 1) uneven illumination; 2) partial occlusion; and 3) limited training data. The new approach performs lighting normalization, occlusion de-emphasis and finally face recognition, based on finding the largest matching area (LMA) at each point on the face, as opposed to traditional fixed-size local area-based approaches. Robustness is achieved with novel approaches for feature extraction, LMA-based face image comparison and unseen data modeling. On the extended YaleB and AR face databases for face identification, our method using only a single training image per person, outperforms other methods using a single training image, and matches or exceeds methods which require multiple training images. On the labeled faces in the wild face verification database, our method outperforms comparable unsupervised methods. We also show that the new method performs competitively even when the training images are corrupted.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech processing and consequent recognition are important areas of Digital Signal Processing since speech allows people to communicate more natu-rally and efficiently. In this work, a speech recognition system is developed for re-cognizing digits in Malayalam. For recognizing speech, features are to be ex-tracted from speech and hence feature extraction method plays an important role in speech recognition. Here, front end processing for extracting the features is per-formed using two wavelet based methods namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). Naive Bayes classifier is used for classification purpose. After classification using Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD produced an accuracy of 80.7%. This paper is intended to devise a new feature extraction method which produces improvements in the recognition accuracy. So, a new method called Dis-crete Wavelet Packet Decomposition (DWPD) is introduced which utilizes the hy-brid features of both DWT and WPD. The performance of this new approach is evaluated and it produced an improved recognition accuracy of 86.2% along with Naive Bayes classifier.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The ability for individuals with hearing loss to accurately recognize correct versus incorrect verbal responses during traditional word recognition testing across four different listening conditions was assessed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Little is known about the way speech in noise is processed along the auditory pathway. The purpose of this study was to evaluate the relation between listening in noise using the R-Space system and the neurophysiologic response of the speech-evoked auditory brainstem when recorded in quiet and noise in adult participants with mild to moderate hearing loss and normal hearing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose of the present study was to evaluate the effects of bimodal (implant plus hearing aid) listening on speech recognition in four different environment conditions. Results indicate that there was little difference in the cochlear implant only and bimodal conditions.