961 results for Robust speech recognition


Relevance:

80.00%

Publisher:

Abstract:

A novel approach to automatic ECG analysis based on scale-space signal representation is proposed. The approach uses the curvature scale-space (CSS) representation to locate the limits and peaks of the main ECG waveforms, and may be used either independently or to correct the results of other ECG analysis techniques. Moreover, dynamic matching of ECG CSS representations provides robust preliminary recognition of ECG abnormalities, as confirmed by experimental results.

Relevance:

80.00%

Publisher:

Abstract:

In recent years, the remarkably rapid advance of technology has led to the development and spread of portable electronic devices that are extremely small yet offer considerable computational power. More specifically, one category of devices that is currently developing rapidly and has already appeared on the world market is that of Wearable devices. As the name suggests, these are designed to be literally worn and are intended to give their users continuous support in a variety of settings. If the user does not necessarily have to use their hands to interact with them, they are called Wearable Hands-Free devices. These are generally able to sense and capture user input through various techniques and methods that are not based on touch. One of these models the user's input through their voice, relying on the discipline of Automatic Speech Recognition (ASR), which deals with converting spoken language into text by means of computerized devices. This leads to the goal of the thesis, which is to develop a framework, usable on Wearable devices, that provides a speech recognition service by relying on an existing one, and that offers a certain level of efficiency and ease of use. More generally, this document aims to provide an in-depth description of Wearable and Wearable Hands-Free devices, defining their characteristics, critical issues, and areas of use. It also aims to illustrate the operating principles of Automatic Speech Recognition, before moving on to the analysis, design, and development of the framework mentioned above.

Relevance:

80.00%

Publisher:

Abstract:

This presentation summarizes experience with the automated speech recognition and translation approach realised in the context of the European project EMMA.

Relevance:

80.00%

Publisher:

Abstract:

ARAUJO, Márcio V. ; ALSINA, Pablo J. ; MEDEIROS, Adelardo A. D. ; PEREIRA, Jonathan P.P. ; DOMINGOS, Elber C. ; ARAÚJO, Fábio M.U. ; SILVA, Jáder S. . Development of an Active Orthosis Prototype for Lower Limbs. In: INTERNATIONAL CONGRESS OF MECHANICAL ENGINEERING, 20., 2009, Gramado, RS. Proceedings… Gramado, RS: [s. n.], 2009

Relevance:

80.00%

Publisher:

Abstract:

While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms use some form of prior information about the target speaker or rely on more than one simultaneous recording of the noisy speech mixtures. Other methods build models of the noise characteristics. Source segregation of simultaneous speech mixtures from a single microphone recording, with no knowledge of the target speaker, is still a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features belonging to different sources to perform unsupervised monaural source segregation. Although it requires no prior information about the target speaker, this method can gracefully incorporate such knowledge to further enhance the segregation. Through a series of EEG experiments we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses about the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and about its psychophysical manifestations in navigating complex sensory environments.
Results from the EEG experiments provide further insights into the assumptions behind the model and motivate future single-unit studies that can provide more direct evidence for the principle of temporal coherence.

Relevance:

80.00%

Publisher:

Abstract:

The memristor is one of the fundamental electronic components, alongside the resistor, the capacitor, and the inductor. It is a passive component whose theory was developed by Leon Chua in 1971. However, it took more than thirty years before the theory could be connected to experimental results. In 2008, Hewlett Packard published an article in which they claimed to have fabricated the first working memristor. A memristor, or memory resistor, is a resistive component whose resistance value can be changed. As its name suggests, a memristor can also retain its resistance value without continuous current and voltage. Typically a memristor has at least two resistance values, either of which can be selected by applying a voltage or current to the component. For this reason memristors are often called resistive switches. Resistive switches are currently studied extensively, particularly because of the memory technology they enable. Memory built from resistive switches is called ReRAM (short for resistive random access memory). Like Flash memory, ReRAM is a non-volatile memory that can be electrically programmed or erased. Flash memory is currently used in, for example, USB memory sticks. ReRAM, however, allows faster and lower-power operation than Flash, which makes it a serious future competitor on the market. ReRAM also makes it possible to store more than one bit in a single memory cell instead of operating in a purely binary ("0" or "1") fashion. Typically a ReRAM cell has two limiting resistance values, but it may be possible to program several states between them. Memory cells can be called analog if the number of states is not limited. Analog memory cells would make it possible to build, for example, neural networks efficiently. Neural networks aim to model the operation of the brain and to perform tasks that are typically difficult for conventional computer programs.
Neural networks are used, for example, in speech recognition and in artificial intelligence implementations. This master's thesis examines the analog operation of a Ta2O5-based ReRAM memory cell with its suitability for neural networks in mind. The fabrication of the ReRAM cell and the measurement results are presented. The operation of a memory cell is rarely fully analog, because there is often only a limited number of states between the two limiting resistance values. For this reason the operation is called pseudo-analog. The measurement results show that a single ReRAM cell performs binary operation well. In some respects a single cell is able to store several states, but the resistance values vary greatly between successive programming cycles, which complicates interpretation. The fabricated ReRAM cell cannot operate as a pseudo-analog memory as such; it requires a current-limiting component alongside it. Improving the fabrication process would also reduce the variance in the operation of an individual cell, so that its behavior would more closely resemble that of a pseudo-analog memory.

Relevance:

80.00%

Publisher:

Abstract:

This work focuses on the formal and technical analysis of some aspects of a constructed language. In the first part of the work, a possible coding for the language is studied, with emphasis on prefix coding, for which an extension of the Huffman algorithm from binary to n-ary is implemented. Because the frequency of use of the words in the language cannot be known a priori, a study is carried out and several strategies are proposed for an open word system, after first analyzing the number of words in current natural languages. As a possible improvement to the coding, we also look at the problem of synchronization loss and at its solution, self-synchronization, through a study of T-codes and the number of possible words they allow for the language, as well as other alternatives. Finally, from a less formal standpoint, several applications for the language have been developed: a voice synthesizer, a speech recognition system, and a system font for using the language in word processors. For each of these applications, the construction process is detailed, together with the problems encountered and those still to be solved.
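
The n-ary extension of the Huffman algorithm mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `nary_huffman`, the use of the digits 0 to n-1 as the code alphabet, and the restriction to n ≤ 10 are all choices made for this example. The one step that differs from the binary case is padding the symbol set with zero-weight dummy leaves so that every merge combines exactly n nodes.

```python
import heapq
from itertools import count


def nary_huffman(freqs, n=3):
    """Build an n-ary prefix code; returns {symbol: codeword of digits 0..n-1}."""
    if not 2 <= n <= 10:
        raise ValueError("this sketch supports 2 <= n <= 10 (single-digit code symbols)")
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}

    tie = count()  # unique tie-breaker so heap entries never compare nodes
    heap = [(w, next(tie), ("leaf", s)) for s, w in freqs.items()]

    # Pad with zero-weight dummies until (number of leaves - 1) is a
    # multiple of (n - 1); then every merge can take exactly n nodes.
    while (len(heap) - 1) % (n - 1) != 0:
        heap.append((0, next(tie), ("dummy", None)))
    heapq.heapify(heap)

    # Repeatedly merge the n lightest nodes into one internal node.
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(n)]
        weight = sum(g[0] for g in group)
        heapq.heappush(heap, (weight, next(tie), ("node", [g[2] for g in group])))

    codes = {}

    def walk(node, prefix):
        kind, payload = node
        if kind == "leaf":
            codes[payload] = prefix
        elif kind == "node":
            for digit, child in enumerate(payload):
                walk(child, prefix + str(digit))
        # dummy leaves receive no codeword

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    # Hypothetical word frequencies, for illustration only.
    print(nary_huffman({"a": 5, "b": 2, "c": 1, "d": 1, "e": 1}, n=3))
```

Because codewords are assigned only at the leaves of the tree, no codeword is a prefix of another, which is the prefix-coding property the abstract emphasizes.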

Relevance:

80.00%

Publisher:

Abstract:

Early intervention is the key to spoken language for hearing-impaired children. A diagnosis of severe hearing loss in a young child raises the urgent question of the optimal type of hearing aid device. As there are no recent data comparing selection criteria for a specific hearing aid device, the goal of the Hearing Evaluation of Auditory Rehabilitation Devices (hEARd) project (Coninx & Vermeulen, 2012) became to collect and analyze interlingually comparable normative data on the speech perception performance of children with hearing aids and children with cochlear implants (CI). METHOD: In the hEARd project, the Adaptive Auditory Speech Test (AAST) was used in various institutions for hearing rehabilitation in Belgium, Germany, and the Netherlands to determine speech perception abilities in kindergarten and school-aged hearing-impaired children. Results of the speech audiometric procedures were matched to the unaided hearing loss values of children using hearing aids and compared with the results of children using CI. In total, 277 data sets of hearing-impaired children were analyzed. Results of children using hearing aids were grouped according to their unaided hearing loss values. The grouping followed the World Health Organization's (WHO) grading of hearing impairment: mild (25–40 dB HL), moderate (41–60 dB HL), severe (61–80 dB HL), and profound (80 dB HL and higher). RESULTS: AAST speech recognition results in quiet showed significantly better performance for the CI group than for the groups of profoundly and severely impaired hearing aid users. However, the CI users' speech perception performance in noise did not differ from that of the hearing aid users. Within the collected data, the analyses showed that children with a CI perform equivalently in speech perception in quiet to children using hearing aids with a "moderate" hearing impairment.

Relevance:

80.00%

Publisher:

Abstract:

This thesis examines the state of audiovisual translation (AVT) in the aftermath of the COVID-19 emergency, highlighting new trends in the implementation of AI technologies as well as their strengths, constraints, and ethical implications. It starts with an overview of the current AVT landscape, focusing on projections of its future evolution and on critical aspects such as the worsening working conditions reported by AVT professionals (especially freelancers) in recent years, and how these might be affected by the advent of AI technologies in the industry. The second chapter delves into the history and development of three AI technologies that are used in combination with neural machine translation in automatic AVT tools: automatic speech recognition, speech synthesis, and deepfakes (voice cloning and visual deepfakes for lip syncing), including real examples of start-up companies that use them, or plan to do so, to localize audiovisual content automatically or semi-automatically. The third chapter explores the many ethical concerns around these innovative technologies, which extend far beyond the field of translation; at the same time, it attempts to vindicate their potential to bring about immense progress in accessibility and international cooperation, provided that their use is properly regulated. Lastly, the fourth chapter describes two experiments that test the efficacy of currently available tools for automatic subtitling and automatic dubbing respectively, in order to take a closer look at their advantages and limitations compared with more traditional approaches. This analysis aims to help distinguish legitimate concerns from unfounded speculation about the AI technologies entering the field of AVT; the intention is to suggest a constructive and optimistic view of the technological transformations that appear to be underway, while also acknowledging their potential risks.

Relevance:

80.00%

Publisher:

Abstract:

Over the years, technology has had an undeniable impact on the AVT field. It has revolutionized the way audiovisual content is consumed by allowing audiences to access it easily at any time and on any device. Especially since the introduction of OTT streaming platforms such as Netflix, Amazon Prime Video, Disney+, Apple TV+, and HBO Max, which offer a vast catalog of national and international products, the consumption of audiovisual products has risen constantly and, with it, the demand for localized content. In turn, the AVT industry resorts to new technologies and practices to handle the ever-growing workload and the faster turnaround times. Because of its numerous implications for the industry, technological advancement can be considered a research area of particular interest for AVT studies. In the case of dubbing, however, research and discussion on the topic lag behind because of the more limited impact that technology has had on the very conservative dubbing industry. The aim of the dissertation is therefore to offer an overview of some of the latest technological innovations and practices that have already been implemented (i.e. cloud dubbing and DeepDub technology) or that are still under research and development (i.e. automatic speech recognition and respeaking, machine translation and post-editing, audio-based and visual-based dubbing techniques, text-based editing of talking-head videos, and automatic dubbing), to discuss their reception by industry professionals, and to make assumptions about their future implementation in the dubbing field.

Relevance:

80.00%

Publisher:

Abstract:

This thesis arose from a collaboration with Macao Polytechnic; the points of contact are Prof. Rita Tse, Prof. Marcus Im, and Prof. Su-Kit Tang. The goal is to create an Italian-Chinese machine translation model and to observe its behavior, in order to determine whether or not the undertaking is feasible. The thesis explores the field known as Natural Language Processing (NLP), and in particular the area of machine translation: services that, with the aid of artificial intelligence, are able to process natural language and then interpret and translate it. NLP is a branch of computing that combines computer science, artificial intelligence, and linguistics. From a research standpoint, the greatest challenges in this area involve speech recognition, natural-language understanding, and natural-language generation. The current state of the art was defined by the paper "Attention Is All You Need" (Vaswani et al., 2017), presented in 2017 by researchers at Google and the University of Toronto. The best-known and most widely used machine translation models at present are Neural Machine Translation (NMT) models, which use deep artificial neural networks to produce translations or predictions. The quality of the translations is particularly good, to the point of nearly matching that of a human translation. The work therefore concentrates largely on the study and use of NMT, with the aim of proposing a functional model that performs as well as possible in translating from Italian to Chinese and vice versa.

Relevance:

80.00%

Publisher:

Abstract:

The recording and processing of voice data raises increasing privacy concerns for users and service providers. One way to address these issues is to move processing onto the edge device, closer to the recording, so that potentially identifiable information is not transmitted over the internet. However, this is often not possible due to hardware limitations. An interesting alternative is the development of voice anonymization techniques that remove individual speaker characteristics while preserving the linguistic and acoustic information in the data. In this work, a state-of-the-art approach to sequence-to-sequence speech conversion, initially based on x-vectors and bottleneck features for automatic speech recognition, is explored to disentangle these two kinds of acoustic information using different pre-trained speech and speaker representations. Furthermore, different strategies for selecting target speech representations are analyzed. Results on public datasets, in terms of equal error rate and word error rate, show that good privacy is achieved with limited impact on the quality of the converted speech relative to the original method.
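
Of the two metrics named in this abstract, word error rate is the simpler to illustrate. The sketch below uses the standard definition (word-level edit distance divided by the number of reference words); the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example and are not taken from the work itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the bat sat"))
```

In an anonymization setting, a low word error rate on the converted speech indicates that linguistic content survived the conversion, while the equal error rate of a speaker verification system measures how well speaker identity was concealed.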

Relevance:

40.00%

Publisher:

Abstract:

This paper studies the relationship between consonant duration and the recognition of these consonants by listeners with high-frequency hearing loss.

Relevance:

40.00%

Publisher:

Abstract:

Biometrics is one of the major trends in human identification, and the fingerprint is the most widely used biometric. However, it is a common mistake to consider automatic fingerprint recognition a completely solved problem. The most popular and extensively used methods, which are minutiae-based, do not perform well on poor-quality images or when only a small area of overlap exists between the template and the query images. The use of multibiometrics is considered one of the keys to overcoming these weaknesses and improving the accuracy of biometric systems. This paper presents the fusion of a minutiae-based and a ridge-based fingerprint recognition method at the rank, decision, and score levels. When assessed on the FVC2002-DB1A database, the implemented fusion techniques led to a 31.78% reduction in the Equal Error Rate (from 4.09% to 2.79%) and a decrease of 6 positions in the rank needed to reach a Correct Retrieval (from rank 8 to rank 2). © 2008 IEEE.
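
The figures in this abstract can be checked with a short calculation, and score-level fusion can be illustrated with a generic rule. The sketch below is an assumption-laden example: the abstract does not say which normalization or combination rule the authors used, so min-max normalization followed by a weighted sum is substituted here as one common choice, and all function names and sample scores are hypothetical.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction in percent, e.g. of an Equal Error Rate."""
    return (before - after) / before * 100.0


def min_max_normalize(scores):
    """Rescale scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(minutiae_scores, ridge_scores, weight=0.5):
    """Weighted-sum fusion of two matchers' scores (assumed rule, not the paper's)."""
    if len(minutiae_scores) != len(ridge_scores):
        raise ValueError("both matchers must score the same comparisons")
    m = min_max_normalize(minutiae_scores)
    r = min_max_normalize(ridge_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(m, r)]


if __name__ == "__main__":
    # EER figures quoted in the abstract: 4.09% before fusion, 2.79% after.
    print(round(relative_reduction(4.09, 2.79), 2))
    # Hypothetical raw scores from the two matchers for three comparisons.
    print(fuse_scores([12.0, 40.0, 33.0], [0.2, 0.9, 0.4]))
```

The relative reduction computed from the quoted EER values agrees with the 31.78% stated in the abstract.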