925 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View
Abstract:
Although there has been considerable interest in recognizing and understanding air traffic control (ATC) speech, none of the published works has reported detailed results on field data. We have developed a system that identifies the language spoken and recognizes and understands sentences in both Spanish and English. We also present field results for several in-tower controller positions. To the best of our knowledge, this is the first time that field (not simulated) ATC speech has been captured, processed, and analyzed. The use of stochastic grammars accommodates the variations from standard phraseology that appear in field data. The robust understanding algorithm developed achieves 95% concept accuracy on ATC text input. It also tolerates changes in the presentation order of the concepts and corrects errors introduced by the speech recognition engine: the percentage of fully correctly understood sentences exceeds the percentage of fully correctly recognized sentences by 17% absolute for English and 25% absolute for Spanish. Errors due to the spontaneity of the speech are also analyzed and compared with read speech. For Spanish on the "clearances" task, word accuracy falls from 96% for read speech to 86% for field ATC data, confirming that field data is needed to estimate the performance of a system. A literature review and a critical discussion of the possibilities of speech recognition and understanding technology applied to ATC speech are also given.
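As an illustration of the word accuracy figures quoted above, the following is a minimal Python sketch. It assumes the common definition of word accuracy as one minus the word-level edit distance divided by the number of reference words; the abstract does not state which definition or scoring tool was used. The function names and the example sentences are hypothetical.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists.

    Counts the minimum number of substitutions, deletions, and insertions
    needed to turn the reference into the hypothesis.
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (S + D + I) / N, with N the number of reference words.

    This definition is an assumption; it can go negative when the
    hypothesis contains many insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)
```

Under this definition, the drop from 96% to 86% word accuracy corresponds to the word error rate rising from 4% to 14%.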
Abstract:
We present a novel approach for the detection of severe obstructive sleep apnea (OSA) based on patients' voices, introducing nonlinear measures to describe sustained speech dynamics. Nonlinear features were combined with state-of-the-art speech recognition systems using statistical modeling techniques (Gaussian mixture models, GMMs) over cepstral parameterization (MFCC) for both continuous and sustained speech. Tests were performed on a database including speech records from both severe OSA and control speakers. A 10% relative reduction in classification error was obtained for sustained speech when combining MFCC-GMM and nonlinear features, and 33% when fusing nonlinear features with both sustained and continuous MFCC-GMM. Accuracy reached 88.5%, allowing the system to be used in early OSA detection. Tests showed that nonlinear features and MFCCs are weakly correlated on sustained speech, but uncorrelated on continuous speech. Results also suggest the existence of nonlinear effects in OSA patients' voices, which should be found in continuous speech.
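The "relative reduction in classification error" reported above can be made concrete with a short Python sketch. The function name and the error rates in the usage note are hypothetical and are not taken from the paper, which does not give the baseline error rates in this abstract.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction by which the error rate drops relative to the baseline.

    Both arguments are error rates in [0, 1], i.e. 1 - accuracy.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error
```

For example, with hypothetical error rates, moving from 0.20 to 0.18 is a 10% relative reduction but only a 2-point absolute reduction, which is why the two kinds of figure should not be compared directly.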
Abstract:
This paper presents a methodology for adapting an advanced communication system for deaf people to a new domain. The methodology is a user-centered design approach consisting of four main steps: requirement analysis, parallel corpus generation, technology adaptation to the new domain, and, finally, system evaluation. In this paper, the new domain considered is dialogues at a hotel reception desk. With this methodology, it was possible to develop the system in a few months and obtain very good performance: good speech recognition and translation rates (around 90%) with short processing times.
Design of a video game aimed at improving the teaching-learning process of the English language
Abstract:
Since globalization began to affect today's society, English has become the first choice for communication among large companies, especially in the field of business. Command of the language, whose number of speakers has grown over the years, has therefore become increasingly necessary, and more and more people want to master it. Learning now starts at a very early age, which facilitates and improves the acquisition of a knowledge base covering all four skills of the language: reading, writing, speaking, and listening. Learners who practice all of these skills can reach a high level of English. The aim of this project was to improve the process of teaching and learning English for children under 13 years of age. The goal was a learning method that motivates users and gives them constant help as they progress in their knowledge of the language. The approach chosen to meet this goal was an interactive video game with all of these characteristics. A video game for learning English that also incorporates speech recognition, a novel feature intended to improve the user's speaking skills, would help users raise their basic level of English in all four skills and establish a solid base that facilitates the later acquisition of more advanced knowledge.
Abstract:
This paper describes the GTH-UPM system for the Albayzin 2014 Search on Speech Evaluation. The evaluation task consists of searching for a list of terms/queries in audio files. The GTH-UPM system we present is based on an LVCSR (Large Vocabulary Continuous Speech Recognition) system. We used the MAVIR corpus and the Spanish partition of the EPPS (European Parliament Plenary Sessions) database to train both the acoustic and the language models. The main effort focused on lexicon preparation and text selection for language model construction. The system uses different lexicons and language models depending on the task performed. For the best configuration of the system on the development set, we obtained a FOM of 75.27 for the keyword spotting task.