11 results for Digit speech recognition

in Deakin Research Online - Australia


Relevance: 90.00%

Abstract:

In this paper, we present our system for online context recognition of multimodal sequences acquired from multiple sensors. The system uses Dynamic Time Warping (DTW) to recognize multimodal sequences of different lengths that are embedded in continuous data streams. We evaluate the performance of our system on two real-world datasets: 1) accelerometer data acquired from performing two hand gestures and 2) NOKIA's benchmark dataset for context recognition. The results from both datasets demonstrate that the system can perform online context recognition efficiently and achieve high recognition accuracy.
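To make the DTW step concrete, here is a minimal sketch of the textbook DTW distance between two multimodal sequences of different lengths, followed by nearest-template classification. It is not the paper's online method: it assumes the candidate segment has already been cut out of the stream, and the names `dtw_distance`, `classify`, and the toy templates are hypothetical.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW between two sequences of vectors.

    Each element of `a` and `b` is one multimodal sample (a tuple of
    sensor readings); both sequences must use the same dimensionality.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])  # Euclidean distance per sample
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(segment, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))


# Toy two-channel templates (assumed data, for illustration only).
templates = {
    "wave": [(0.0, 0.0), (1.0, 1.0), (0.0, 0.0)],
    "push": [(0.0, 0.0), (0.0, 2.0), (0.0, 4.0)],
}
segment = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
print(classify(segment, templates))  # prints "wave"
```

The segment is one sample longer than the "wave" template, yet the warping path absorbs the repeated first sample, which is what lets DTW compare sequences of different lengths.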

Relevance: 90.00%

Abstract:

Speaker recognition is the process of automatically recognizing the speaker by analyzing individual information contained in the speech waves. In this paper, we discuss the development of an intelligent system for text-dependent speaker recognition. The system comprises two main modules: a wavelet-based signal-processing module for feature extraction from speech waves, and an artificial-neural-network-based classifier module to identify and categorize the speakers. Wavelets are used to de-noise and compress the speech signals; the wavelet family used is the Daubechies family. After the necessary features are extracted from the speech waves, they are fed to the neural-network-based classifier to identify the speakers. We have implemented the Fuzzy ARTMAP (FAM) network in the classifier module to categorize the de-noised and compressed signals. The proposed intelligent learning system has been applied to a case study of a text-dependent speaker recognition problem.

Relevance: 80.00%

Abstract:

We develop an algorithm for the detection and classification of affective sound events underscored by specific patterns of sound energy dynamics. We relate the portrayal of these events to the proposed high-level affect, or emotional coloring, of the events. In this paper, we identify four characteristic sound energy events that convey well-established meanings through their dynamics to portray and deliver affect and sentiment related to the horror film genre. Our algorithm is developed with the ultimate aim of automatically structuring sections of films that contain distinct shades of emotion related to horror themes for nonlinear media access and navigation. An average of 82% of the energy events obtained from the analysis of the audio tracks of sections of four sample films corresponded correctly to the proposed affect. While the discrimination between certain sound energy event types was low, the algorithm correctly detected 71% of the occurrences of the sound energy events within the audio tracks of the films analyzed, and thus forms a useful basis for determining affective scenes characteristic of horror in movies.

Relevance: 30.00%

Abstract:

In the last decade, spoken language processing has achieved significant advances; however, work on emotion recognition has not progressed as far, and achieves only 50% to 60% accuracy. This is because a majority of researchers in this field have focused on the synthesis of emotional speech rather than on automating human emotion recognition. Many research groups have focused on how to improve the performance of the classifier they used for emotion recognition, and little work has been done on data pre-processing, such as the extraction and selection of a specific set of acoustic features instead of using all the available ones. Working with well-selected acoustic features does not delay the whole job; rather, it saves time and resources by removing irrelevant information and reducing high-dimensional data calculation. In this paper, we developed an automatic feature selector based on the RF2TREE algorithm and the traditional C4.5 algorithm. RF2TREE helped us to address the problem of having too few data examples. The ensemble learning technique was applied to enlarge the original data set by building a bagged random forest to generate many virtual examples, and the new data set was then used to train a single decision tree, which selects the most efficient features to represent the speech signals for emotion recognition. Finally, the output of the selector was a specific set of acoustic features, produced by RF2TREE and a single decision tree.

Relevance: 30.00%

Abstract:

An intelligent system for text-dependent speaker recognition is proposed in this paper. The system consists of a wavelet-based module as the feature extractor of speech signals and a neural-network-based module as the signal classifier. The Daubechies wavelet is employed to filter and compress the speech signals. The fuzzy ARTMAP (FAM) neural network is used to classify the processed signals. A series of experiments on text-dependent gender and speaker recognition are conducted to assess the effectiveness of the proposed system using a collection of vowel signals from 100 speakers. A variety of operating strategies for improving the FAM performance are examined and compared. The experimental results are analyzed and discussed.

Relevance: 30.00%

Abstract:

This paper addresses the problem of speaker recognition from speech signals. The study focuses on the development of a speaker recognition system comprising two modules: a wavelet-based feature extractor, and a neural-network-based classifier. We have conducted a number of experiments to investigate the applicability of Discrete Wavelet Transform (DWT) in extracting discriminative features from the speech signals, and have examined various models from the Adaptive Resonance Theory (ART) family of neural networks in classifying the extracted features. The results indicate that DWT could be a potential feature extraction tool for speaker recognition. In addition, the ART-based classifiers have yielded very promising recognition accuracy at more than 81%.
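As an illustration of DWT-based feature extraction, the sketch below computes one decomposition level with the 4-tap Daubechies filter and turns the sub-band energies into a small feature vector. The abstract does not state which wavelet, how many levels, or which features were used, so the filter choice, the periodic boundary handling, and the hypothetical helpers `dwt_level` and `subband_energies` are all assumptions.

```python
from math import sqrt

_S3 = sqrt(3.0)
_NORM = 4.0 * sqrt(2.0)
# 4-tap Daubechies low-pass (scaling) filter coefficients.
LOW = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# Matching high-pass (wavelet) filter: g[k] = (-1)^k * h[3 - k].
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def dwt_level(signal):
    """One DWT level with periodic wrap-around; returns (approx, detail)."""
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    approx, detail = [], []
    for start in range(0, n, 2):  # filter, then downsample by 2
        window = [signal[(start + k) % n] for k in range(4)]
        approx.append(sum(h * x for h, x in zip(LOW, window)))
        detail.append(sum(g * x for g, x in zip(HIGH, window)))
    return approx, detail


def subband_energies(signal, levels):
    """Energy of each detail band plus the final approximation band.

    This is an assumed feature definition, chosen only for illustration.
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        features.append(sum(x * x for x in detail))
    features.append(sum(x * x for x in approx))
    return features


frame = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]  # toy frame, not speech
print(subband_energies(frame, levels=2))
```

Because this transform is orthogonal, the sub-band energies sum to the energy of the input frame, so the feature vector describes how the frame's energy is distributed across scales. A vector like this is the kind of input a classifier such as an ART network could consume.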

Relevance: 30.00%

Abstract:

Named entity recognition (NER) is an essential step in the process of information extraction within text mining. This paper proposes a technique to extract drug named entities from unstructured and informal medical text using a hybrid model of lexicon-based and rule-based techniques. In the proposed model, a lexicon is first used as the initial step to detect drug named entities. Inference rules are then deployed to further extract undetected drug names. The designed rules employ part of speech tags and morphological features for drug name detection. The proposed hybrid model is evaluated using a benchmark data set from the i2b2 2009 medication challenge, and is able to achieve an f-score of 66.97%.