40 resultados para Speech Recognition System using LPC


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of available files for training the system, especially for the empty condition where no training data set was provided but only a development set. In addition, the whole database was created from online videos and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and postevaluation results for all the conditions using the proposed metrics for the evaluation and the Cavg metric are presented in the paper.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The availability of inertial sensors embedded in mobile devices has enabled a new type of interaction based on the movements or “gestures” made by the users when holding the device. In this paper we propose a gesture recognition system for mobile devices based on accelerometer and gyroscope measurements. The system is capable of recognizing a set of predefined gestures in a user-independent way, without the need of a training phase. Furthermore, it was designed to be executed in real-time in resource-constrained devices, and therefore has a low computational complexity. The performance of the system is evaluated offline using a dataset of gestures, and also online, through some user tests with the system running in a smart phone.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a methodology for adapting an advanced communication system for deaf people in a new domain. This methodology is a user-centered design approach consisting of four main steps: requirement analysis, parallel corpus generation, technology adaptation to the new domain, and finally, system evaluation. In this paper, the new considered domain has been the dialogues in a hotel reception. With this methodology, it was possible to develop the system in a few months, obtaining very good performance: good speech recognition and translation rates (around 90%) with small processing times.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A depth-based face recognition algorithm specially adapted to high range resolution data acquired by the new Microsoft Kinect 2 sensor is presented. A novel descriptor called Depth Local Quantized Pattern descriptor has been designed to make use of the extended range resolution of the new sensor. This descriptor is a substantial modification of the popular Local Binary Pattern algorithm. One of the main contributions is the introduction of a quantification step, increasing its capacity to distinguish different depth patterns. The proposed descriptor has been used to train and test a Support Vector Machine classifier, which has proven to be able to accurately recognize different people faces from a wide range of poses. In addition, a new depth-based face database acquired by the new Kinect 2 sensor have been created and made public to evaluate the proposed face recognition system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A more natural, intuitive, user-friendly, and less intrusive Human–Computer interface for controlling an application by executing hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches in every frame of a video sequence potential hand poses using a binary Support Vector Machine classifier and Local Binary Patterns as feature vectors. These detections are employed as input of a tracker to generate a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories, and compute a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor that constitutes one of the most important contributions of the paper, which is able to provide much richer spatio-temporal information than other existing approaches in the state of the art with a manageable computational cost. Excellent results have been obtained outperforming other approaches of the state of the art.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although there has been a lot of interest in recognizing and understanding air traffic control (ATC) speech, none of the published works have obtained detailed field data results. We have developed a system able to identify the language spoken and recognize and understand sentences in both Spanish and English. We also present field results for several in-tower controller positions. To the best of our knowledge, this is the first time that field ATC speech (not simulated) is captured, processed, and analyzed. The use of stochastic grammars allows variations in the standard phraseology that appear in field data. The robust understanding algorithm developed has 95% concept accuracy from ATC text input. It also allows changes in the presentation order of the concepts and the correction of errors created by the speech recognition engine improving it by 17% and 25%, respectively, absolute in the percentage of fully correctly understood sentences for English and Spanish in relation to the percentages of fully correctly recognized sentences. The analysis of errors due to the spontaneity of the speech and its comparison to read speech is also carried out. A 96% word accuracy for read speech is reduced to 86% word accuracy for field ATC data for Spanish for the "clearances" task confirming that field data is needed to estimate the performance of a system. A literature review and a critical discussion on the possibilities of speech recognition and understanding technology applied to ATC speech are also given.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article evaluates an authentication technique for mobiles based on gestures. Users create a remindful identifying gesture to be considered as their in-air signature. This work analyzes a database of 120 gestures of different vulnerability, obtaining an Equal Error Rate (EER) of 9.19% when robustness of gestures is not verified. Most of the errors in this EER come from very simple and easily forgeable gestures that should be discarded at enrollment phase. Therefore, an in-air signature robustness verification system using Linear Discriminant Analysis is proposed to infer automatically whether the gesture is secure or not. Different configurations have been tested obtaining a lowest EER of 4.01% when 45.02% of gestures were discarded, and an optimal compromise of EER of 4.82% when 19.19% of gestures were automatically rejected.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes an architecture, based on statistical machine translation, for developing the text normalization module of a text to speech conversion system. The main target is to generate a language independent text normalization module, based on data and flexible enough to deal with all situa-tions presented in this task. The proposed architecture is composed by three main modules: a tokenizer module for splitting the text input into a token graph (tokenization), a phrase-based translation module (token translation) and a post-processing module for removing some tokens. This paper presents initial exper-iments for numbers and abbreviations. The very good results obtained validate the proposed architecture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present an approach to adapt dynamically the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM, and one or more of the LMs associated to the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate a LM for each one of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate a LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to improve the performance of the speech recognition (up to a 14.82% of relative improvement), which leads to an improvement in both the language understanding and the dialogue management tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

MFCC coefficients extracted from the power spectral density of speech as a whole, seems to have become the de facto standard in the area of speaker recognition, as demonstrated by its use in almost all systems submitted to the 2013 Speaker Recognition Evaluation (SRE) in Mobile Environment [1], thus relegating to background this component of the recognition systems. However, in this article we will show that selecting the adequate speaker characterization system is as important as the selection of the classifier. To accomplish this we will compare the recognition rates achieved by different recognition systems that relies on the same classifier (GMM-UBM) but connected with different feature extraction systems (based on both classical and biometric parameters). As a result we will show that a gender dependent biometric parameterization with a simple recognition system based on GMM- UBM paradigm provides very competitive or even better recognition rates when compared to more complex classification systems based on classical features

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Human Activity Recognition (HAR) is an emerging research field with the aim to identify the actions carried out by a person given a set of observations and the surrounding environment. The wide growth in this research field inside the scientific community is mainly explained by the high number of applications that are arising in the last years. A great part of the most promising applications are related to the healthcare field, where it is possible to track the mobility of patients with motor dysfunction as also the physical activity in patients with cardiovascular risk. Until a few years ago, by using distinct kind of sensors, a patient follow-up was possible. However, far from being a long-term solution and with the smartphone irruption, that monitoring can be achieved in a non-invasive way by using the embedded smartphone’s sensors. For these reasons this Final Degree Project arises with the main target to evaluate new feature extraction techniques in order to carry out an activity and user recognition, and also an activity segmentation. The recognition is done thanks to the inertial signals integration obtained by two widespread sensors in the greater part of smartphones: accelerometer and gyroscope. In particular, six different activities are evaluated walking, walking-upstairs, walking-downstairs, sitting, standing and lying. Furthermore, a segmentation task is carried out taking into account the activities performed by thirty users. This can be done by using Hidden Markov Models and also a set of tools tested satisfactory in speech recognition: HTK (Hidden Markov Model Toolkit).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este trabajo de Tesis ha abordado el objetivo de dar robustez y mejorar la Detección de Actividad de Voz en entornos acústicos adversos con el fin de favorecer el comportamiento de muchas aplicaciones vocales, por ejemplo aplicaciones de telefonía basadas en reconocimiento automático de voz, aplicaciones en sistemas de transcripción automática, aplicaciones en sistemas multicanal, etc. En especial, aunque se han tenido en cuenta todos los tipos de ruido, se muestra especial interés en el estudio de las voces de fondo, principal fuente de error de la mayoría de los Detectores de Actividad en la actualidad. Las tareas llevadas a cabo poseen como punto de partida un Detector de Actividad basado en Modelos Ocultos de Markov, cuyo vector de características contiene dos componentes: la energía normalizada y la variación de la energía. Las aportaciones fundamentales de esta Tesis son las siguientes: 1) ampliación del vector de características de partida dotándole así de información espectral, 2) ajuste de los Modelos Ocultos de Markov al entorno y estudio de diferentes topologías y, finalmente, 3) estudio e inclusión de nuevas características, distintas de las del punto 1, para filtrar los pulsos de pronunciaciones que proceden de las voces de fondo. Los resultados de detección, teniendo en cuenta los tres puntos anteriores, muestran con creces los avances realizados y son significativamente mejores que los resultados obtenidos, bajo las mismas condiciones, con otros detectores de actividad de referencia. This work has been focused on improving the robustness at Voice Activity Detection in adverse acoustic environments in order to enhance the behavior of many vocal applications, for example telephony applications based on automatic speech recognition, automatic transcription applications, multichannel systems applications, and so on. In particular, though all types of noise have taken into account, this research has special interest in the study of pronunciations coming from far-field speakers, the main error source of most activity detectors today. The tasks carried out have, as starting point, a Hidden Markov Models Voice Activity Detector which a feature vector containing two components: normalized energy and delta energy. The key points of this Thesis are the following: 1) feature vector extension providing spectral information, 2) Hidden Markov Models adjustment to environment and study of different Hidden Markov Model topologies and, finally, 3) study and inclusion of new features, different from point 1, to reject the pronunciations coming from far-field speakers. Detection results, taking into account the above three points, show the advantages of using this method and are significantly better than the results obtained under the same conditions by other well-known voice activity detectors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC) based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement in our development data, a 3.6% DER relative improvement in the RT07 evaluation data and a 7.9% DER relative improvement in the last year's RT09 evaluation data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user deñned, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argüe that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples and report on an implementation of our ideas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This document describes the basic steps to developed and embedded Linux-based system using the BeagleBoard. The document has been specifically written to use a BeagleBoard development system based on the OMAP `processor.