14 resultados para Digit speech recognition

em Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the light of descriptive geometry and notions in set theory, this paper re-defines the basic elements in space such as curve and surface and so on, presents some fundamental notions with respect to the point cover based on the High-dimension space (HDS) point covering theory, finally takes points from mapping part of speech signals to HDS, so as to analyze distribution information of these speech points in HDS, and various geometric covering objects for speech points and their relationship. Besides, this paper also proposes a new algorithm for speaker independent continuous digit speech recognition based on the HDS point dynamic searching theory without end-points detection and segmentation. First from the different digit syllables in real continuous digit speech, we establish the covering area in feature space for continuous speech. During recognition, we make use of the point covering dynamic searching theory in HDS to do recognition, and then get the satisfying recognized results. At last, compared to HMM (Hidden Markov models)-based method, from the development trend of the comparing results, as sample amount increasing, the difference of recognition rate between two methods will decrease slowly, while sample amount approaching to be very large, two recognition rates all close to 100% little by little. As seen from the results, the recognition rate of HDS point covering method is higher than that of in HMM (Hidden Markov models) based method, because, the point covering describes the morphological distribution for speech in HDS, whereas HMM-based method is only a probability distribution, whose accuracy is certainly inferior to point covering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems.We observe that this may be true for a recognition tasks based on geometrical learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy, Experiments show method based on ICA and geometrical learning outperforms HMM in different number of train samples.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Based on biomimetic pattern recognition theory, we proposed a novel speaker-independent continuous speech keyword-spotting algorithm. Without endpoint detection and division, we can get the minimum distance curve between continuous speech samples and every keyword-training net through the dynamic searching to the feature-extracted continuous speech. Then we can count the number of the keywords by investigating the vale-value and the numbers of the vales in the curve. Experiments of small vocabulary continuous speech with various speaking rate have got good recognition results and proved the validity of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In speaker-independent speech recognition, the disadvantage of the most diffused technology (HMMs, or Hidden Markov models) is not only the need of many more training samples, but also long train time requirement. This paper describes the use of Biomimetic pattern recognition (BPR) in recognizing some mandarin continuous speech in a speaker-independent manner. A speech database was developed for the course of study. The vocabulary of the database consists of 15 Chinese dish's names, the length of each name is 4 Chinese words. Neural networks (NNs) based on Multi-weight neuron (MWN) model are used to train and recognize the speech sounds. The number of MWN was investigated to achieve the optimal performance of the NNs-based BPR. This system, which is based on BPR and can carry out real time recognition reaches a recognition rate of 98.14% for the first option and 99.81% for the first two options to the persons from different provinces of China speaking common Chinese speech. Experiments were also carried on to evaluate Continuous density hidden Markov models (CDHMM), Dynamic time warping (DTW) and BPR for speech recognition. The Experiment results show that BPR outperforms CDHMM and DTW especially in the cases of samples of a finite size.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems. We observe that this may be true for recognition tasks based on Geometrical Learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy. Experiments show that the method based on ICA and Geometrical Learning outperforms HMM in a different number of training samples.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we presents HyperSausage Neuron based on the High-Dimension Space(HDS), and proposes a new algorithm for speaker independent continuous digit speech recognition. At last, compared to HMM-based method, the recognition rate of HyperSausage Neuron method is higher than that of in HMM-based method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In speaker-independent speech recognition, the disadvantage of the most diffused technology ( Hidden Markov Models) is not only the need of many more training samples, but also long train time requirement. This paper describes the use of Biomimetic Pattern Recognition (BPR) in recognizing some Mandarin Speech in a speaker-independent manner. The vocabulary of the system consists of 15 Chinese dish's names. Neural networks based on Multi-Weight Neuron (MWN) model are used to train and recognize the speech sounds. Experimental results are presented to show that the system, which can carry out real time recognition of the persons from different provinces speaking common Chinese speech, outperforms HMMs especially in the cases of samples of a finite size.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems. We observe that this may be true for recognition tasks based on Geometrical Learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy. Experiments show that the method based on ICA and Geometrical Learning outperforms HMM in a different number of training samples.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we presents HyperSausage Neuron based on the High-Dimension Space(HDS), and proposes a new algorithm for speaker independent continuous digit speech recognition. At last, compared to HMM-based method, the recognition rate of HyperSausage Neuron method is higher than that of in HMM-based method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recognition-based user interface, users’ satisfaction is determined not only by recognition accuracy but also by effort to correct recognition errors. In this paper, we introduce a crossmodal error correction technique, which allows users to correct errors of Chinese handwriting recognition by speech. The focus of the paper is a multimodal fusion algorithm supporting the crossmodal error correction. By fusing handwriting and speech recognition, the algorithm can correct errors in both character extraction and recognition of handwriting. The experimental result indicates that the algorithm is effective and efficient. Moreover, the evaluation also shows the correction technique can help users to correct errors in handwriting recognition more efficiently than the other two error correction techniques.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recently,Handheld Communication Devices is developing very fast, extending in users and spreading in application fields, and has an promising future. This study investigated the acceptance of the multimodal text entry method and the behavioral characteristics when using it. Based on the general information process model of a bimodal system and the human factor studies about the multimodal map system, the present study mainly focused on the hand-speech bimodal text entry method. For acceptance, the study investigated the subjective perception of the accuracy of speech recognition by Wizard of Oz (WOz) experiment and a questionnaire. Results showed that there was a linear relationship between the speech recognition accuracy and the subjective accuracy. Furthermore, as the familiarity increasing, the difference between the acceptable accuracy and the subjective accuracy gradually decreased. In addition, the similarity of meaning between the outcome of speech recognition and the correct sentences was an important referential criterion. The second study investigated three aspects of the bimodal text entry method, including input, error recovery and modal shifts. The first experiment aimed to find the behavioral characteristics of user when doing error recovery task. Results indicated that participants preferred to correct the error by handwriting, which had no relationship with the input modality. The second experiment aimed to discover the behavioral characteristics of users when doing text entry in various types of text. Results showed that users preferred to speech input in both words and sentences conditions, which was highly consistent among individuals, while no significant difference was found between handwriting and speech input in the character condition. Participants used more direct strategy than jumping strategy to deal with mixed text, especially for the Chinese-English mixed type. The third experiment examined the cognitive load in the different modal shifts, results suggesting that there were significant differences between different shifts. Moreover, relevant little time was needed in the Shift from speech input to hand input. Based on the main findings, implications were discussed as follows: Firstly, when evaluating a speech recognition system, attention should be paid to the fact that the speech recognition accuracy was not equal to the subjective accuracy. Secondly, in order to make a speech input system more acceptable, a good method is to train and supply the feedback for the accuracy in training, which improving the familiarity and sensitivity to the system. Thirdly, both the universal and individual behavioral patterns were taken into consideration to improve the error recovery method. Fourthly, easing the study and the use of speech input, the operations of speech input should be simpler. Fifthly, more convenient text input method for non-Chinese text entry should be provided. Finally, the shifting time between hand input and speech input provides an important parameter for the design of automatic-evoked speech recognition system.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, a novel approach for mandarin speech emotion recognition, that is mandarin speech emotion recognition based on high dimensional geometry theory, is proposed. The human emotions are classified into 6 archetypal classes: fear, anger, happiness, sadness, surprise and disgust. According to the characteristics of these emotional speech signals, the amplitude, pitch frequency and formant are used as the feature parameters for speech emotion recognition. The new method called high dimensional geometry theory is applied for recognition. Compared with traditional GSVM model, the new method has some advantages. It is noted that this method has significant values for researches and applications henceforth.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, a new classifier of speaker identification has been proposed, which is based on Biomimetic pattern recognition (BPR). Distinguished from traditional speaker recognition methods, such as DWT, HMM, GMM, SVM and so on, the proposed classifier is constructed by some finite sub-space which is reasonable covering of the points in high dimensional space according to distributing characteristic of speech feature points. It has been used in the system of speaker identification. Experiment results show that better effect could be obtained especially with lesser samples. Furthermore, the proposed classifier employs a much simpler modeling structure as compared to the GMM. In addition, the basic idea "cognition" of Biomimetic pattern recognition (BPR) results in no requirement of retraining the old system for enrolling new speakers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The mandarin keyword spotting system was investigated, and a new approach was proposed based on the principle of homology continuity and point location analysis in high-dimensional space geometry theory which are both parts of biomimetic pattern recognition theory. This approach constructed a hyper-polyhedron with sample points in the training set and calculated the distance between each test point and the hyper-polyhedron. The classification resulted from the value of those distances. The approach was tested by a speech database which was created by ourselves. The performance was compared with the classic HMM approach and the results show that the new approach is much better than HMM approach when the training data is not sufficient.