944 resultados para Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133?MHz.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is considerable interest in creating embedded, speech recognition hardware using the weighted finite state transducer (WFST) technique but there are performance and memory usage challenges. Two system optimization techniques are presented to address this; one approach improves token propagation by removing the WFST epsilon input arcs; another one-pass, adaptive pruning algorithm gives a dramatic reduction in active nodes to be computed. Results for memory and bandwidth are given for a 5,000 word vocabulary giving a better practical performance than conventional WFST; this is then exploited in an adaptive pruning algorithm that reduces the active nodes from 30,000 down to 4,000 with only a 2 percent sacrifice in speech recognition accuracy; these optimizations lead to a more simplified design with deterministic performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While current speech recognisers give acceptable performance in carefully controlled environments, their performance degrades rapidly when they are applied in more realistic situations. Generally, the environmental noise may be classified into two classes: the wide-band noise and narrow band noise. While the multi-band model has been shown to be capable of dealing with speech corrupted by narrow-band noise, it is ineffective for wide-band noise. In this paper, we suggest a combination of the frequency-filtering technique with the probabilistic union model in the multi-band approach. The new system has been tested on the TIDIGITS database, corrupted by white noise, noise collected from a railway station, and narrow-band noise, respectively. The results have shown that this approach is capable of dealing with noise of narrow-band or wide-band characteristics, assuming no knowledge about the noisy environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study explored changes in scalp electrophysiology across two Working Memory (WM) tasks and two age groups. Continuous electroencephalography (EEG) was recorded from 18 healthy adults (18-34 years) and 12 healthy adolescents (14-17) during the performance of two Oculomotor Delayed Response (ODR) WM tasks; (i.e. eye movements were the metric of motor response). Delay-period, EEG data in the alpha frequency was sampled from anterior and parietal scalp sites to achieve a general measure of frontal and parietal activity, respectively. Frontal-parietal, alpha coherence was calculated for each participant for each ODR-WM task. Coherence significantly decreased in adults moving across the two ODR tasks, whereas, coherence significantly increased in adolescents moving across the two ODR tasks. The effects of task in the adolescent and adult groups were large and medium, respectively. Within the limits of this study, the results provide empirical support that WM development during adolescence include complex, qualitative, change.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech processing and consequent recognition are important areas of Digital Signal Processing since speech allows people to communicate more natu-rally and efficiently. In this work, a speech recognition system is developed for re-cognizing digits in Malayalam. For recognizing speech, features are to be ex-tracted from speech and hence feature extraction method plays an important role in speech recognition. Here, front end processing for extracting the features is per-formed using two wavelet based methods namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). Naive Bayes classifier is used for classification purpose. After classification using Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD produced an accuracy of 80.7%. This paper is intended to devise a new feature extraction method which produces improvements in the recognition accuracy. So, a new method called Dis-crete Wavelet Packet Decomposition (DWPD) is introduced which utilizes the hy-brid features of both DWT and WPD. The performance of this new approach is evaluated and it produced an improved recognition accuracy of 86.2% along with Naive Bayes classifier.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Performance of any continuous speech recognition system is dependent on the accuracy of its acoustic model. Hence, preparation of a robust and accurate acoustic model lead to satisfactory recognition performance for a speech recognizer. In acoustic modeling of phonetic unit, context information is of prime importance as the phonemes are found to vary according to the place of occurrence in a word. In this paper we compare and evaluate the effect of context dependent tied (CD tied) models, context dependent (CD) and context independent (CI) models in the perspective of continuous speech recognition of Malayalam language. The database for the speech recognition system has utterance from 21 speakers including 11 female and 10 males. Our evaluation results show that CD tied models outperforms CI models over 21%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.