936 resultados para SPEECH
Resumo:
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems. © 2013 IEEE.
Resumo:
Large margin criteria and discriminative models are two effective improvements for HMM-based speech recognition. This paper proposed a large margin trained log linear model with kernels for CSR. To avoid explicitly computing in the high dimensional feature space and to achieve the nonlinear decision boundaries, a kernel based training and decoding framework is proposed in this work. To make the system robust to noise a kernel adaptation scheme is also presented. Previous work in this area is extended in two directions. First, most kernels for CSR focus on measuring the similarity between two observation sequences. The proposed joint kernels defined a similarity between two observation-label sequence pairs on the sentence level. Second, this paper addresses how to efficiently employ kernels in large margin training and decoding with lattices. To the best of our knowledge, this is the first attempt at using large margin kernel-based log linear models for CSR. The model is evaluated on a noise corrupted continuous digit task: AURORA 2.0. © 2013 IEEE.
Resumo:
This paper presents an overview of the Text-to-Speech synthesis system developed at the Institute for Language and Speech Processing (ILSP). It focuses on the key issues regarding the design of the system components. The system currently fully supports three languages (Greek, English, Bulgarian) and is designed in such a way to be as language and speaker independent as possible. Also, experimental results are presented which show that the system produces high quality synthetic speech in terms of naturalness and intelligibility. The system was recently ranked among the first three systems worldwide in terms of achieved quality for the English language, at the international Blizzard Challenge 2013 workshop. © 2014 Springer International Publishing.
Resumo:
We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems.We observe that this may be true for a recognition tasks based on geometrical learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy, Experiments show method based on ICA and geometrical learning outperforms HMM in different number of train samples.
Resumo:
In the light of descriptive geometry and notions in set theory, this paper re-defines the basic elements in space such as curve and surface and so on, presents some fundamental notions with respect to the point cover based on the High-dimension space (HDS) point covering theory, finally takes points from mapping part of speech signals to HDS, so as to analyze distribution information of these speech points in HDS, and various geometric covering objects for speech points and their relationship. Besides, this paper also proposes a new algorithm for speaker independent continuous digit speech recognition based on the HDS point dynamic searching theory without end-points detection and segmentation. First from the different digit syllables in real continuous digit speech, we establish the covering area in feature space for continuous speech. During recognition, we make use of the point covering dynamic searching theory in HDS to do recognition, and then get the satisfying recognized results. At last, compared to HMM (Hidden Markov models)-based method, from the development trend of the comparing results, as sample amount increasing, the difference of recognition rate between two methods will decrease slowly, while sample amount approaching to be very large, two recognition rates all close to 100% little by little. As seen from the results, the recognition rate of HDS point covering method is higher than that of in HMM (Hidden Markov models) based method, because, the point covering describes the morphological distribution for speech in HDS, whereas HMM-based method is only a probability distribution, whose accuracy is certainly inferior to point covering.
Resumo:
In this paper, a novel approach for mandarin speech emotion recognition, that is mandarin speech emotion recognition based on high dimensional geometry theory, is proposed. The human emotions are classified into 6 archetypal classes: fear, anger, happiness, sadness, surprise and disgust. According to the characteristics of these emotional speech signals, the amplitude, pitch frequency and formant are used as the feature parameters for speech emotion recognition. The new method called high dimensional geometry theory is applied for recognition. Compared with traditional GSVM model, the new method has some advantages. It is noted that this method has significant values for researches and applications henceforth.
Resumo:
Based on biomimetic pattern recognition theory, we proposed a novel speaker-independent continuous speech keyword-spotting algorithm. Without endpoint detection and division, we can get the minimum distance curve between continuous speech samples and every keyword-training net through the dynamic searching to the feature-extracted continuous speech. Then we can count the number of the keywords by investigating the vale-value and the numbers of the vales in the curve. Experiments of small vocabulary continuous speech with various speaking rate have got good recognition results and proved the validity of the algorithm.
Resumo:
In speaker-independent speech recognition, the disadvantage of the most diffused technology (HMMs, or Hidden Markov models) is not only the need of many more training samples, but also long train time requirement. This paper describes the use of Biomimetic pattern recognition (BPR) in recognizing some mandarin continuous speech in a speaker-independent manner. A speech database was developed for the course of study. The vocabulary of the database consists of 15 Chinese dish's names, the length of each name is 4 Chinese words. Neural networks (NNs) based on Multi-weight neuron (MWN) model are used to train and recognize the speech sounds. The number of MWN was investigated to achieve the optimal performance of the NNs-based BPR. This system, which is based on BPR and can carry out real time recognition reaches a recognition rate of 98.14% for the first option and 99.81% for the first two options to the persons from different provinces of China speaking common Chinese speech. Experiments were also carried on to evaluate Continuous density hidden Markov models (CDHMM), Dynamic time warping (DTW) and BPR for speech recognition. The Experiment results show that BPR outperforms CDHMM and DTW especially in the cases of samples of a finite size.
Resumo:
We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems. We observe that this may be true for recognition tasks based on Geometrical Learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy. Experiments show that the method based on ICA and Geometrical Learning outperforms HMM in a different number of training samples.
Resumo:
In this paper, we presents HyperSausage Neuron based on the High-Dimension Space(HDS), and proposes a new algorithm for speaker independent continuous digit speech recognition. At last, compared to HMM-based method, the recognition rate of HyperSausage Neuron method is higher than that of in HMM-based method.