914 resultados para robust speech recognition
Resumo:
Cette thèse contribue a la recherche vers l'intelligence artificielle en utilisant des méthodes connexionnistes. Les réseaux de neurones récurrents sont un ensemble de modèles séquentiels de plus en plus populaires capable en principe d'apprendre des algorithmes arbitraires. Ces modèles effectuent un apprentissage en profondeur, un type d'apprentissage machine. Sa généralité et son succès empirique en font un sujet intéressant pour la recherche et un outil prometteur pour la création de l'intelligence artificielle plus générale. Le premier chapitre de cette thèse donne un bref aperçu des sujets de fonds: l'intelligence artificielle, l'apprentissage machine, l'apprentissage en profondeur et les réseaux de neurones récurrents. Les trois chapitres suivants couvrent ces sujets de manière de plus en plus spécifiques. Enfin, nous présentons quelques contributions apportées aux réseaux de neurones récurrents. Le chapitre \ref{arxiv1} présente nos travaux de régularisation des réseaux de neurones récurrents. La régularisation vise à améliorer la capacité de généralisation du modèle, et joue un role clé dans la performance de plusieurs applications des réseaux de neurones récurrents, en particulier en reconnaissance vocale. Notre approche donne l'état de l'art sur TIMIT, un benchmark standard pour cette tâche. Le chapitre \ref{cpgp} présente une seconde ligne de travail, toujours en cours, qui explore une nouvelle architecture pour les réseaux de neurones récurrents. Les réseaux de neurones récurrents maintiennent un état caché qui représente leurs observations antérieures. L'idée de ce travail est de coder certaines dynamiques abstraites dans l'état caché, donnant au réseau une manière naturelle d'encoder des tendances cohérentes de l'état de son environnement. Notre travail est fondé sur un modèle existant; nous décrivons ce travail et nos contributions avec notamment une expérience préliminaire.
Resumo:
This work focuses on Machine Translation (MT) and Speech-to-Speech Translation, two emerging technologies that allow users to automatically translate written and spoken texts. The first part of this work provides a theoretical framework for the evaluation of Google Translate and Microsoft Translator, which is at the core of this study. Chapter one focuses on Machine Translation, providing a definition of this technology and glimpses of its history. In this chapter we will also learn how MT works, who uses it, for what purpose, what its pros and cons are, and how machine translation quality can be defined and assessed. Chapter two deals with Speech-to-Speech Translation by focusing on its history, characteristics and operation, potential uses and limits deriving from the intrinsic difficulty of translating spoken language. After describing the future prospects for SST, the final part of this chapter focuses on the quality assessment of Speech-to-Speech Translation applications. The last part of this dissertation describes the evaluation test carried out on Google Translate and Microsoft Translator, two mobile translation apps also providing a Speech-to-Speech Translation service. Chapter three illustrates the objectives, the research questions, the participants, the methodology and the elaboration of the questionnaires used to collect data. The collected data and the results of the evaluation of the automatic speech recognition subsystem and the language translation subsystem are presented in chapter four and finally analysed and compared in chapter five, which provides a general description of the performance of the evaluated apps and possible explanations for each set of results. In the final part of this work suggestions are made for future research and reflections on the usability and usefulness of the evaluated translation apps are provided.
Resumo:
Cette thèse contribue a la recherche vers l'intelligence artificielle en utilisant des méthodes connexionnistes. Les réseaux de neurones récurrents sont un ensemble de modèles séquentiels de plus en plus populaires capable en principe d'apprendre des algorithmes arbitraires. Ces modèles effectuent un apprentissage en profondeur, un type d'apprentissage machine. Sa généralité et son succès empirique en font un sujet intéressant pour la recherche et un outil prometteur pour la création de l'intelligence artificielle plus générale. Le premier chapitre de cette thèse donne un bref aperçu des sujets de fonds: l'intelligence artificielle, l'apprentissage machine, l'apprentissage en profondeur et les réseaux de neurones récurrents. Les trois chapitres suivants couvrent ces sujets de manière de plus en plus spécifiques. Enfin, nous présentons quelques contributions apportées aux réseaux de neurones récurrents. Le chapitre \ref{arxiv1} présente nos travaux de régularisation des réseaux de neurones récurrents. La régularisation vise à améliorer la capacité de généralisation du modèle, et joue un role clé dans la performance de plusieurs applications des réseaux de neurones récurrents, en particulier en reconnaissance vocale. Notre approche donne l'état de l'art sur TIMIT, un benchmark standard pour cette tâche. Le chapitre \ref{cpgp} présente une seconde ligne de travail, toujours en cours, qui explore une nouvelle architecture pour les réseaux de neurones récurrents. Les réseaux de neurones récurrents maintiennent un état caché qui représente leurs observations antérieures. L'idée de ce travail est de coder certaines dynamiques abstraites dans l'état caché, donnant au réseau une manière naturelle d'encoder des tendances cohérentes de l'état de son environnement. Notre travail est fondé sur un modèle existant; nous décrivons ce travail et nos contributions avec notamment une expérience préliminaire.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Thesis (Master's)--University of Washington, 2016-06
Resumo:
These are the full proceedings of the conference.
Resumo:
This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally-varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods are explained and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.
Resumo:
Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ?F0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similar large improvement with nominal ?F0 as seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ?F0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
Resumo:
Modern technology has moved on and completely changed the way that people can use the telephone or mobile to dialogue with information held on computers. Well developed “written speech analysis” does not work with “verbal speech”. The main purpose of our article is, firstly, to highlights the problems and, secondly, to shows the possible ways to solve these problems.
Resumo:
This paper presents an effective decision making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system is obtained by setting up an experimental pipeline fully operational distribution system. The system is also equipped with data logging for three variables; namely, inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed such that multi-operational conditions of the distribution system, including multi pressure and multi flow can be obtained. We then statistically tested and showed that pressure and flow variables can be used as signature of leak under the designed multi-operational conditions. It is then shown that the detection of leakages based on the training and testing of the proposed multi model decision system with pre data clustering, under multi operational conditions produces better recognition rates in comparison to the training based on the single model approach. This decision system is then equipped with the estimation of confidence limits and a method is proposed for using these confidence limits for obtaining more robust leakage recognition results.
Resumo:
A novel approach of automatic ECG analysis based on scale-scale signal representation is proposed. The approach uses curvature scale-space representation to locate main ECG waveform limits and peaks and may be used to correct results of other ECG analysis techniques or independently. Moreover dynamic matching of ECG CSS representations provides robust preliminary recognition of ECG abnormalities which has been proven by experimental results.
Resumo:
Negli ultimi anni, l'avanzamento incredibilmente rapido della tecnologia ha portato allo sviluppo e alla diffusione di dispositivi elettronici portatili aventi dimensioni estremamente ridotte e, allo stesso tempo, capacità computazionali molto notevoli. Più nello specifico, una particolare categoria di dispositivi, attualmente in forte sviluppo, che ha già fatto la propria comparsa sul mercato mondiale è sicuramente la categoria dei dispositivi Wearable. Come suggerisce il nome, questi sono progettati per essere letteralmente indossati, pensati per fornire continuo supporto, in diversi ambiti, a chi li utilizza. Se per interagire con essi l’utente non deve ricorrere obbligatoriamente all'utilizzo delle mani, allora si parla di dispositivi Wearable Hands Free. Questi sono generalmente in grado di percepire e catture l’input dell'utente seguendo tecniche e metodologie diverse, non basate sul tatto. Una di queste è sicuramente quella che prevede di modellare l’input dell’utente stesso attraverso la sua voce, appoggiandosi alla disciplina dell’ASR (Automatic Speech Recognition), che si occupa della traduzione del linguaggio parlato in testo, mediante l’utilizzo di dispositivi computerizzati. Si giunge quindi all’obiettivo della tesi, che è quello di sviluppare un framework, utilizzabile nell’ambito dei dispositivi Wearable, che fornisca un servizio di riconoscimento vocale appoggiandosi ad uno già esistente, in modo che presenti un certo livello di efficienza e facilità di utilizzo. Più in generale, in questo documento si punta a fornire una descrizione approfondita di quelli che sono i dispositivi Wearable e Wearable Hands-Free, definendone caratteristiche, criticità e ambiti di utilizzo. Inoltre, l’intento è quello di illustrare i principi di funzionamento dell’Automatic Speech Recognition per passare poi ad analisi, progettazione e sviluppo del framework appena citato.
Resumo:
This presentation summarizes experience with the automated speech recognition and translation approach realised in the context of the European project EMMA.
Resumo:
ARAUJO, Márcio V. ; ALSINA, Pablo J. ; MEDEIROS, Adelardo A. D. ; PEREIRA, Jonathan P.P. ; DOMINGOS, Elber C. ; ARAÚJO, Fábio M.U. ; SILVA, Jáder S. . Development of an Active Orthosis Prototype for Lower Limbs. In: INTERNATIONAL CONGRESS OF MECHANICAL ENGINEERING, 20., 2009, Gramado, RS. Proceedings… Gramado, RS: [s. n.], 2009