954 resultados para Digit speech recognition


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the light of descriptive geometry and notions in set theory, this paper re-defines the basic elements in space such as curve and surface and so on, presents some fundamental notions with respect to the point cover based on the High-dimension space (HDS) point covering theory, finally takes points from mapping part of speech signals to HDS, so as to analyze distribution information of these speech points in HDS, and various geometric covering objects for speech points and their relationship. Besides, this paper also proposes a new algorithm for speaker independent continuous digit speech recognition based on the HDS point dynamic searching theory without end-points detection and segmentation. First from the different digit syllables in real continuous digit speech, we establish the covering area in feature space for continuous speech. During recognition, we make use of the point covering dynamic searching theory in HDS to do recognition, and then get the satisfying recognized results. At last, compared to HMM (Hidden Markov models)-based method, from the development trend of the comparing results, as sample amount increasing, the difference of recognition rate between two methods will decrease slowly, while sample amount approaching to be very large, two recognition rates all close to 100% little by little. As seen from the results, the recognition rate of HDS point covering method is higher than that of in HMM (Hidden Markov models) based method, because, the point covering describes the morphological distribution for speech in HDS, whereas HMM-based method is only a probability distribution, whose accuracy is certainly inferior to point covering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Large margin criteria and discriminative models are two effective improvements for HMM-based speech recognition. This paper proposed a large margin trained log linear model with kernels for CSR. To avoid explicitly computing in the high dimensional feature space and to achieve the nonlinear decision boundaries, a kernel based training and decoding framework is proposed in this work. To make the system robust to noise a kernel adaptation scheme is also presented. Previous work in this area is extended in two directions. First, most kernels for CSR focus on measuring the similarity between two observation sequences. The proposed joint kernels defined a similarity between two observation-label sequence pairs on the sentence level. Second, this paper addresses how to efficiently employ kernels in large margin training and decoding with lattices. To the best of our knowledge, this is the first attempt at using large margin kernel-based log linear models for CSR. The model is evaluated on a noise corrupted continuous digit task: AURORA 2.0. © 2013 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach to two commonly adopted feature based techniques for incorporating speech dynamics. Results are presented from baseline feature based systems and the combined modelling technique. We illustrate that both of these techniques achieve similar levels of performance when used independently. However significant improvements in performance can be achieved through a combination of the two. In particular we report an improvement in excess of 17% relative Word Error Rate in comparison to our best baseline system.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters which are used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions which are significantly different from that present during the calibration session - a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.

Relevância:

100.00% 100.00%

Publicador: