12 results for speaker diarization

in Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevance:

20.00%

Publisher:

Abstract:

This paper proposes a new classifier for speaker identification based on Biomimetic pattern recognition (BPR). Unlike traditional speaker recognition methods such as DWT, HMM, GMM, and SVM, the proposed classifier is built from a finite set of sub-spaces that form a reasonable covering of the points in high-dimensional space, according to the distribution characteristics of the speech feature points. It has been used in a speaker identification system. Experimental results show that it performs better, especially with fewer samples. Furthermore, the proposed classifier uses a much simpler modeling structure than the GMM. In addition, because BPR rests on the basic idea of "cognition", enrolling new speakers does not require retraining the existing system.
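The covering idea and the "no retraining on enrollment" property can be illustrated with a minimal sketch. It is not the paper's classifier: it assumes the simplest possible cover (a union of hyperspheres centred on each speaker's training points), and the class name `CoveringClassifier` and the `radius` parameter are hypothetical.

```python
import math


class CoveringClassifier:
    """Toy point-covering classifier: one union of hyperspheres per speaker."""

    def __init__(self, radius):
        self.radius = radius  # hypothetical covering radius, not from the paper
        self.covers = {}      # speaker name -> list of training feature vectors

    def enroll(self, speaker, features):
        # Enrolling only stores a new cover; existing covers are left untouched,
        # so no previously enrolled speaker has to be retrained.
        self.covers[speaker] = [tuple(f) for f in features]

    def distance_to_cover(self, speaker, x):
        # Distance from x to the nearest sphere centre of this speaker's cover.
        return min(math.dist(x, c) for c in self.covers[speaker])

    def identify(self, x):
        # Return the speaker whose cover is nearest and contains x,
        # or None when x lies outside every cover (rejection).
        best = min(self.covers,
                   key=lambda s: self.distance_to_cover(s, x),
                   default=None)
        if best is None or self.distance_to_cover(best, x) > self.radius:
            return None
        return best
```

A point outside every cover is rejected rather than forced onto the nearest speaker, and calling `enroll` for a new speaker leaves the decisions for existing speakers unchanged.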

Relevance:

20.00%

Publisher:

Abstract:

Drawing on descriptive geometry and notions from set theory, this paper redefines basic elements of space such as curves and surfaces, and presents some fundamental notions of point covering based on High-dimension space (HDS) point covering theory. It then maps points taken from speech signals into HDS in order to analyze the distribution of these speech points in HDS, the various geometric covering objects for speech points, and the relationships among them. The paper also proposes a new algorithm for speaker-independent continuous digit speech recognition based on HDS point dynamic searching theory, without endpoint detection or segmentation. First, from the different digit syllables in real continuous digit speech, we establish the covering area in feature space for continuous speech. During recognition, we apply point covering dynamic searching theory in HDS and obtain satisfactory recognition results. Finally, a comparison with a method based on HMMs (Hidden Markov models) shows the following trend: as the number of samples increases, the difference in recognition rate between the two methods slowly decreases, and as the number of samples becomes very large, both recognition rates gradually approach 100%. The results show that the recognition rate of the HDS point covering method is higher than that of the HMM-based method. This is because point covering describes the morphological distribution of speech in HDS, whereas the HMM-based method gives only a probability distribution, whose accuracy is inferior to that of point covering.

Relevance:

20.00%

Publisher:

Abstract:

We studied the application of Biomimetic Pattern Recognition to speaker recognition. A speaker recognition neural network that uses the network matching degree as its criterion is proposed. It has been used in a text-dependent speaker recognition system. Experimental results show that good performance can be obtained even with few samples. Furthermore, misrecognition caused by untrained speakers appearing during testing can be controlled effectively. In addition, because Biomimetic Pattern Recognition rests on the basic idea of "cognition", enrolling new speakers does not require retraining the existing system.

Relevance:

20.00%

Publisher:

Abstract:

In speaker-independent speech recognition, the most widely used technology, Hidden Markov models (HMMs), has two disadvantages: it needs many more training samples, and it requires a long training time. This paper describes the use of Biomimetic pattern recognition (BPR) to recognize Mandarin continuous speech in a speaker-independent manner. A speech database was developed for the study. Its vocabulary consists of the names of 15 Chinese dishes, each four Chinese words long. Neural networks (NNs) based on the Multi-weight neuron (MWN) model are used to train on and recognize the speech sounds. The number of MWNs was varied to find the optimal performance of the NN-based BPR. The BPR-based system, which can perform real-time recognition, reaches a recognition rate of 98.14% for the first option and 99.81% for the first two options on speakers from different provinces of China speaking common Chinese speech. Experiments were also carried out to compare Continuous density hidden Markov models (CDHMM), Dynamic time warping (DTW), and BPR for speech recognition. The experimental results show that BPR outperforms CDHMM and DTW, especially when the number of samples is limited.

Relevance:

20.00%

Publisher:

Abstract:

In speaker-independent speech recognition, the most widely used technology, Hidden Markov Models (HMMs), has two disadvantages: it needs many more training samples, and it requires a long training time. This paper describes the use of Biomimetic Pattern Recognition (BPR) to recognize Mandarin speech in a speaker-independent manner. The vocabulary of the system consists of the names of 15 Chinese dishes. Neural networks based on the Multi-Weight Neuron (MWN) model are used to train on and recognize the speech sounds. Experimental results show that the system, which can perform real-time recognition of speakers from different provinces speaking common Chinese speech, outperforms HMMs, especially when the number of samples is limited.

Relevance:

20.00%

Publisher:

Abstract:

We studied the application of Biomimetic Pattern Recognition to speaker recognition. A speaker recognition neural network that uses the network matching degree as its criterion is proposed. It has been used in a text-dependent speaker recognition system. Experimental results show that good performance can be obtained even with few samples. Furthermore, misrecognition caused by untrained speakers appearing during testing can be controlled effectively. In addition, because Biomimetic Pattern Recognition rests on the basic idea of "cognition", enrolling new speakers does not require retraining the existing system.

Relevance:

10.00%

Publisher:

Abstract:

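Existing semiconductor laser interferometers face a trade-off between measurement accuracy and measurement range. This paper proposes a new semiconductor laser interferometer for real-time displacement measurement and analyzes its measurement principle. First, a new phase-demodulation algorithm is proposed. It obtains the phase to be measured from the interference signal through two real-time phase-detection circuits, which eliminates the influence of parameters such as light-intensity fluctuation, initial optical path difference, circuit gain, modulation depth, and Bessel functions on the measurement accuracy, and so improves the accuracy. Second, a technique for extending the measurement range is proposed, in which a phase-unwrapping circuit recovers the true phase and the displacement to be measured, extending the measurement range from half a wavelength to several wavelengths. In the experiment, the measured peak-to-peak value for a loudspeaker was 2361.7 nm, the repeatability of the measurement was 2.56 nm, and the measurement time was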
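The range extension described above depends on phase unwrapping. The paper performs it in a circuit; the following is only a software sketch of the same idea. It assumes a double-pass geometry in which a displacement of half a wavelength corresponds to a phase change of 2π (consistent with the half-wavelength limit stated in the abstract). The function names and the wavelength used in any call are hypothetical.

```python
import math


def unwrap_phase(wrapped):
    """Remove 2*pi jumps from a sequence of phases wrapped to (-pi, pi]."""
    if not wrapped:
        return []
    unwrapped = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        delta = cur - prev
        # Fold the step into [-pi, pi): assumes the true phase changes by
        # less than pi between consecutive samples.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped


def phase_to_displacement(phase, wavelength_nm):
    """Convert unwrapped phase (rad) to displacement (nm).

    Assumes a double-pass geometry: 2*pi of phase equals half a wavelength.
    """
    return wavelength_nm * phase / (4 * math.pi)
```

Without unwrapping, the recoverable phase is confined to one 2π interval, which is the half-wavelength limit; accumulating the folded steps lets the displacement span several wavelengths as long as the sampling is fast enough for each step to stay below π.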

Relevance:

10.00%

Publisher:

Abstract:

Based on biomimetic pattern recognition theory, we propose a novel speaker-independent keyword-spotting algorithm for continuous speech. Without endpoint detection or segmentation, a dynamic search over the feature-extracted continuous speech yields the minimum-distance curve between the continuous speech samples and each keyword's training network. The number of keyword occurrences is then counted by examining the values and the number of the valleys in this curve. Experiments on small-vocabulary continuous speech at various speaking rates gave good recognition results and demonstrated the validity of the algorithm.
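The valley-counting step can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it treats each contiguous run of the minimum-distance curve that drops below a fixed threshold as one valley, and both the function name and the `threshold` parameter are hypothetical.

```python
def count_keyword_valleys(distance_curve, threshold):
    """Count contiguous runs of the curve that fall below the threshold.

    Each such run is treated as one valley, i.e. one keyword occurrence.
    """
    count = 0
    in_valley = False
    for d in distance_curve:
        if d < threshold:
            if not in_valley:
                # Entering a new valley: count it once.
                count += 1
                in_valley = True
        else:
            # Back above the threshold: the current valley has ended.
            in_valley = False
    return count
```

Because the curve is scanned frame by frame, no endpoint detection or segmentation of the utterance is needed before counting.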

Relevance:

10.00%

Publisher:

Abstract:

In this paper, we present the HyperSausage Neuron, based on High-Dimension Space (HDS), and propose a new algorithm for speaker-independent continuous digit speech recognition. In a comparison with an HMM-based method, the HyperSausage Neuron method achieves a higher recognition rate.
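The abstract does not define the HyperSausage Neuron. As a rough geometric sketch, assuming it covers the set of points within a fixed distance of a line segment joining two samples in feature space, a membership test might look like this. The function names and the `radius` parameter are hypothetical.

```python
import math


def distance_to_segment(x, a, b):
    """Euclidean distance from point x to the line segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Project x onto the line through a and b, clamped to the segment.
        t = max(0.0, min(1.0, sum(p * q for p, q in zip(ax, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(x, closest)


def in_hypersausage(x, a, b, radius):
    """True if x lies within `radius` of the segment from a to b."""
    return distance_to_segment(x, a, b) <= radius
```

Under this assumption, the covered region is a cylinder with rounded ends around the segment, which works in any number of dimensions.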

Relevance:

10.00%

Publisher:

Abstract:

In this paper, we present the HyperSausage Neuron, based on High-Dimension Space (HDS), and propose a new algorithm for speaker-independent continuous digit speech recognition. In a comparison with an HMM-based method, the HyperSausage Neuron method achieves a higher recognition rate.

Relevance:

10.00%

Publisher:

Abstract:

As Levelt and Meyer (2000) noted, studies of lexical access during the production of multiword utterances such as phrases and sentences raise two novel questions that studies of single-word production do not. First, does the access of different words in a sentence occur in a parallel or a serial fashion? Second, does the access of different words in a sentence occur in an interactive or a discrete fashion? The latter question concerns the horizontal information flow (Smith & Wheeldon, 2004), which is an important aspect of continuous speech production. A variant of the picture–word interference paradigm, combined with eye tracking and a dual-task paradigm, was used in 7 experiments to investigate the horizontal flow of semantic and phonological information between nouns in spoken Mandarin Chinese sentences. The results suggested that: 1. Before speech onset, semantic information for words across the whole sentence has been activated, while phonological activation is limited to the first phrase of the sentence. 2. Before speech onset, speakers look ahead and check the semantic information of later words while the first noun is being processed; such look-ahead for phonological information occurs only within the first phrase of the sentence. 3. After speech onset, speakers concentrate on the content words beyond the first one and check the semantic information of other words within the same sentence. 4. Lexical access to multiple words during spoken sentence production proceeds in a partly serial and partly parallel manner, which supports the unit-by-unit, incremental view proposed by Levelt (2000). 5. The horizontal information flow during spoken sentence production is not an automatic process and is constrained by cognitive resources.

Relevance:

10.00%

Publisher:

Abstract:

Voice alarms play an important role in the emergency evacuation of public places because they can provide information and direct the evacuation. This paper studied the optimization of the acoustic and semantic parameters of voice alarms in emergency evacuation, so that alarm design can improve evacuation performance. Both magnitude estimation and scale methods were used to investigate participants' perceived urgency of alarms with different parameters. The results indicated that participants rated alarms as more urgent when they had a faster speech rate, a greater signal-to-noise ratio (SNR), and louder background noise. There was an interaction between noise level and the content of the voice alarm. Signals with a speech rate below 4 characters/second were rated as not urgent at all. The intelligibility of the voice alarms was investigated by evaluating how well key points were recognized. The results showed that the effect of speech rate was marginally significant and that 7 characters/second gave the highest intelligibility. This may be because the faster the signal was spoken, the more attention was paid to it. Speaker gender and SNR did not have a significant effect on the signals' intelligibility. This paper also investigated the impact of voice alarm content on human behavior during emergency evacuation in a 3-D virtual reality environment. In the condition "telling the occupants what had happened and what to do", the number of participants who evacuated successfully was the largest. A further study, in which similar numbers of participants evacuated successfully in the three conditions, indicated that reaction time and evacuation time were shortest in that condition. Although a one-way ANOVA showed that the difference was not significant, the results still provide some reference for alarm design. In sum, the parameters of voice alarms for emergency evacuation should be chosen to meet the needs of both perceived urgency and intelligibility.
The content of the alarms should include "what had happened and what to do" and should vary according to the noise levels in different public places.