1000 results for Vowel recognition


Relevance:

100.00%

Publisher:

Abstract:

A procedure that uses fuzzy ARTMAP and K-Nearest Neighbor (K-NN) categorizers to evaluate intrinsic and extrinsic speaker normalization methods is described. Each classifier is trained on preprocessed, or normalized, vowel tokens from about 30% of the speakers of the Peterson-Barney database, then tested on data from the remaining speakers. Intrinsic normalization methods included one nonscaled, four psychophysical scales (bark, bark with end-correction, mel, ERB), and three log scales, each tested on four different combinations of the fundamental (F0) and the formants (F1, F2, F3). For each scale and frequency combination, four extrinsic speaker adaptation schemes were tested: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). A total of 32 intrinsic and 128 extrinsic methods were thus compared. Fuzzy ARTMAP and K-NN showed similar trends, with K-NN performing somewhat better and fuzzy ARTMAP requiring about 1/10 as much memory. The optimal intrinsic normalization method was bark scale, or bark with end-correction, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods was LT, CSi, LS, and CS, with fuzzy ARTMAP performing best using bark scale with Diff All, and K-NN choosing psychophysical measures for all except CSi.
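As a rough illustration of the normalization steps named in this abstract, the sketch below converts frequencies to the bark scale, forms frequency differences, and applies centroid subtraction for each frequency (CSi). The abstract does not say which bark formula was used, so the Traunmüller approximation here is an assumption, as are the function names and the reading of "Diff All" as differences between adjacent frequencies.

```python
def hz_to_bark(f_hz):
    """Convert Hz to bark (Traunmueller approximation; an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def diff_all(freqs):
    """Differences between adjacent frequencies, e.g. F1-F0, F2-F1, F3-F2.

    This is one possible reading of "Diff All", not a confirmed definition.
    """
    return [freqs[i + 1] - freqs[i] for i in range(len(freqs) - 1)]


def centroid_subtract_per_frequency(tokens):
    """CSi sketch: subtract one speaker's mean of each frequency column.

    `tokens` is a list of equal-length feature lists from a single speaker.
    """
    n = len(tokens)
    means = [sum(column) / n for column in zip(*tokens)]
    return [[v - m for v, m in zip(token, means)] for token in tokens]


# Hypothetical (F0, F1, F2, F3) values in Hz for two tokens of one speaker.
speaker_tokens_hz = [[130.0, 270.0, 2290.0, 3010.0],
                     [125.0, 730.0, 1090.0, 2440.0]]
speaker_tokens_bark = [[hz_to_bark(f) for f in tok] for tok in speaker_tokens_hz]
normalized = centroid_subtract_per_frequency(
    [diff_all(tok) for tok in speaker_tokens_bark])
```

The normalized vectors could then be passed to any classifier, such as K-NN; the frequency values above are made up for the example and are not taken from the Peterson-Barney database.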

Relevance:

100.00%

Publisher:

Abstract:

Automatic Vowel Recognition (AVR), a fundamental component of most speech processing systems, has applications ranging from automatic interpretation of spoken language to biometrics. State-of-the-art AVR systems are based on traditional machine learning models such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). However, such classifiers cannot be both efficient and effective at the same time, which leaves a gap to be explored when real-time processing is required. In this work, we present an algorithm for AVR based on the Optimum-Path Forest (OPF), an emerging pattern recognition technique recently introduced in the literature. Adopting a supervised training procedure and using speech tags from two public datasets, we observed that OPF outperformed ANNs, SVMs, and other classifiers in terms of training time and accuracy. ©2010 IEEE.

Relevance:

60.00%

Publisher:

Abstract:

The design and operation of the minimum cost classifier, where the total cost is the sum of the measurement cost and the classification cost, is computationally complex. Noting the difficulties associated with this approach, decision tree design directly from a set of labelled samples is proposed in this paper. The feature space is first partitioned to transform the problem to one of discrete features. The resulting problem is solved by a dynamic programming algorithm over an explicitly ordered state space of all outcomes of all feature subsets. The solution procedure is very general and is applicable to any minimum cost pattern classification problem in which each feature has a finite number of outcomes. These techniques are applied to (i) voiced, unvoiced, and silence classification of speech, and (ii) spoken vowel recognition. The resulting decision trees are operationally very efficient and yield attractive classification accuracies.
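The dynamic-programming idea described in this abstract can be sketched roughly as follows. This is not the paper's algorithm: it is a minimal memoized recursion over states defined by the feature outcomes observed so far, assuming discrete features, a fixed measurement cost per feature, and a unit misclassification cost counted on the labelled samples. The name `design_tree` and all parameters are hypothetical.

```python
from collections import Counter


def design_tree(samples, measure_cost, misclass_cost=1.0):
    """Return (total_cost, tree) for a minimum-cost decision tree.

    samples: non-empty list of (feature_tuple, label) with discrete features.
    measure_cost[f]: cost charged per sample each time feature f is measured.
    """
    memo = {}

    def solve(state):
        # state: frozenset of (feature_index, outcome) pairs observed so far.
        if state in memo:
            return memo[state]
        matching = [(x, y) for x, y in samples
                    if all(x[f] == o for f, o in state)]
        label, majority = Counter(y for _, y in matching).most_common(1)[0]
        # Option 1: stop and classify as the majority label.
        best_cost = misclass_cost * (len(matching) - majority)
        best_tree = ("leaf", label)
        measured = {f for f, _ in state}
        # Option 2: pay to measure one more feature, then recurse per outcome.
        for f in range(len(measure_cost)):
            if f in measured:
                continue
            cost = measure_cost[f] * len(matching)
            children = {}
            for outcome in sorted({x[f] for x, _ in matching}):
                sub_cost, sub_tree = solve(state | {(f, outcome)})
                cost += sub_cost
                children[outcome] = sub_tree
            if cost < best_cost:
                best_cost, best_tree = cost, ("test", f, children)
        memo[state] = (best_cost, best_tree)
        return memo[state]

    return solve(frozenset())


# Toy XOR-style data: both features must be measured to classify correctly.
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
cheap_cost, cheap_tree = design_tree(xor_samples, [0.1, 0.1])
costly_cost, costly_tree = design_tree(xor_samples, [5.0, 5.0])
```

With cheap measurements the recursion chooses to test both features; with expensive ones it prefers to classify immediately and accept the errors, which is the trade-off between measurement cost and classification cost that the abstract refers to.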

Relevance:

60.00%

Publisher:

Abstract:

In this paper we deal with the problem of feature selection by introducing a new approach based on the Gravitational Search Algorithm (GSA). The proposed algorithm combines the optimization behavior of GSA with the speed of the Optimum-Path Forest (OPF) classifier in order to provide a fast and accurate framework for feature selection. Experiments on datasets obtained from a wide range of applications, such as vowel recognition, image classification, and fraud detection in power distribution systems, are conducted in order to assess the robustness of the proposed technique against Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and a Particle Swarm Optimization (PSO)-based algorithm for feature selection.

Relevance:

40.00%

Publisher:

Abstract:

Accurate and fast decoding of speech imagery from electroencephalographic (EEG) data could serve as a basis for a new generation of brain-computer interfaces (BCIs) that are more portable and easier to use. However, decoding speech imagery from EEG is a hard problem due to many factors. In this paper we focus on the analysis of the classification step of speech imagery decoding for a three-class vowel speech imagery recognition problem. We empirically show that different classification subtasks may require different classifiers for accurate decoding, and we obtain a classification accuracy that improves on the best results previously published. We further investigate the relationship between the classifiers and different sets of features selected by the common spatial patterns method. Our results indicate that further improvement of BCIs based on speech imagery could be achieved by carefully selecting an appropriate combination of classifiers for the subtasks involved.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents the design of a full-fledged OCR system for printed Kannada text. The machine recognition of Kannada characters is difficult due to similarity in the shapes of different characters, script complexity, and non-uniqueness in the representation of diacritics. The document image is subjected to line segmentation, word segmentation, and zone detection. From the zonal information, base characters, vowel modifiers, and consonant conjuncts are separated. A knowledge-based approach is employed for recognizing the base characters. Various features are employed for recognizing the characters. These include the coefficients of the Discrete Cosine Transform, Discrete Wavelet Transform, and Karhunen-Loève Transform. These features are fed to different classifiers. Structural features are used in the subsequent levels to discriminate confused characters. Use of structural features increases the recognition rate from 93% to 98%. Apart from the classical pattern classification technique of nearest neighbour, Artificial Neural Network (ANN) based classifiers like Back Propagation and Radial Basis Function (RBF) Networks have also been studied. The ANN classifiers are trained in supervised mode using the transform features. The highest recognition rate of 99% is obtained with RBF using second-level approximation coefficients of Haar wavelets as the features on presegmented base characters.

Relevance:

30.00%

Publisher:

Abstract:

In this article, we aim to reduce the error rate of the online Tamil symbol recognition system by employing multiple experts to reevaluate certain decisions of the primary support vector machine classifier. Motivated by the relatively high percentage of occurrence of base consonants in the script, a reevaluation technique has been proposed to correct any ambiguities arising in the base consonants. Secondly, a dynamic time-warping method is proposed to automatically extract the discriminative regions for each set of confused characters. Class-specific features derived from these regions aid in reducing the degree of confusion. Thirdly, statistics of specific features are proposed for resolving any confusions in vowel modifiers. The reevaluation approaches are tested on two databases: (a) the isolated Tamil symbols in the IWFHR test set, and (b) the symbols segmented from a set of 10,000 Tamil words. The recognition rate of the isolated test symbols of the IWFHR database improves by 1.9%. For the word database, the incorporation of the reevaluation step improves the symbol recognition rate by 3.5% (from 88.4 to 91.9%). This, in turn, boosts the word recognition rate by 11.9% (from 65.0 to 76.9%). The reduction in the word error rate has been achieved using a generic approach, without the incorporation of language models.
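For readers unfamiliar with dynamic time warping, the sketch below shows the textbook distance computation between two one-dimensional sequences. It is only the generic technique: the abstract does not describe how the authors use the alignment to extract discriminative regions, and the function name and absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] holds the minimum cumulative cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, diagonal.
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Two pen trajectories that trace the same shape at different speeds, for example `[1, 2, 3]` and `[1, 2, 2, 3]`, align with zero cost under this measure, which is why time warping suits online handwriting data.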

Relevance:

30.00%

Publisher:

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.

Relevance:

30.00%

Publisher:

Abstract:

An intelligent system for text-dependent speaker recognition is proposed in this paper. The system consists of a wavelet-based module as the feature extractor of speech signals and a neural-network-based module as the signal classifier. The Daubechies wavelet is employed to filter and compress the speech signals. The fuzzy ARTMAP (FAM) neural network is used to classify the processed signals. A series of experiments on text-dependent gender and speaker recognition are conducted to assess the effectiveness of the proposed system using a collection of vowel signals from 100 speakers. A variety of operating strategies for improving the FAM performance are examined and compared. The experimental results are analyzed and discussed.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents some results of the application of Evolvable Hardware (EHW) in the area of voice recognition. Evolvable Hardware is able to change its inner connections using genetic learning techniques, adapting its own functionality to changes in external conditions. This technique became feasible with the improvement of Programmable Logic Devices. Nowadays, it is possible to have, in a single device, the ability to change part of its own circuit on-line and in real time. This work proposes a reconfigurable architecture for a system that is able to receive voice commands to execute special tasks, such as helping handicapped persons in their daily home routines. The idea is to collect several voice samples and process them through algorithms based on mel-cepstral theory to obtain numerical coefficients for each sample, which compose the search universe used by the genetic algorithm. The voice patterns considered are limited to seven sustained Portuguese vowel phonemes (a, eh, e, i, oh, o, u).

Relevance:

20.00%

Publisher:

Abstract:

This paper investigates the effectiveness of virtual product placement as a marketing tool by examining the relationship between brand recall and recognition and virtual product placement. It also aims to address a gap in the existing academic literature by focusing on the impact of product placement on recall and recognition of new brands. The growing importance of product placement is discussed and a review of previous research on product placement and virtual product placement is provided. The research methodology used to study the recall and recognition effects of virtual product placement is described and key findings are presented. Finally, implications are discussed and recommendations for future research are provided.

Relevance:

20.00%

Publisher: