914 results for robust speech recognition


Relevance:

80.00%

Publisher:

Abstract:

ARAUJO, Márcio V. ; ALSINA, Pablo J. ; MEDEIROS, Adelardo A. D. ; PEREIRA, Jonathan P.P. ; DOMINGOS, Elber C. ; ARAÚJO, Fábio M.U. ; SILVA, Jáder S. . Development of an Active Orthosis Prototype for Lower Limbs. In: INTERNATIONAL CONGRESS OF MECHANICAL ENGINEERING, 20., 2009, Gramado, RS. Proceedings… Gramado, RS: [s. n.], 2009

Relevance:

80.00%

Publisher:

Abstract:

While humans can easily segregate and track a speaker's voice in a loud noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electro-encephalography experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods develop models of the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker is still a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features that belong to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, this method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and of its psychophysical manifestations in navigating complex sensory environments.
Results from EEG experiments provide further insights into the assumptions behind the model and provide motivation for future single unit studies that can provide more direct evidence for the principle of temporal coherence.
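The temporal coherence principle described in this abstract can be illustrated with a heavily simplified sketch. The abstract does not specify the dissertation's actual model, so everything here is an assumption: each feature channel is represented by a hypothetical amplitude envelope (a list of numbers over time), and channels are grouped with an anchor channel when their envelopes are strongly correlated. The names `pearson` and `coherent_channels` and the threshold value are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coherent_channels(envelopes, anchor, threshold=0.5):
    """Return indices of channels whose envelope co-varies with the anchor channel.

    Assumption: channels that rise and fall together over time are attributed
    to the same source; the threshold is an arbitrary illustrative value.
    """
    reference = envelopes[anchor]
    return [
        index
        for index, envelope in enumerate(envelopes)
        if pearson(reference, envelope) >= threshold
    ]
```

Given envelopes for several channels, `coherent_channels(envelopes, anchor=0)` returns the channels that would be assigned to the same stream as channel 0 under this simplification. Choosing the anchor from a known target speaker is one way prior knowledge could be added, though the abstract does not say how the dissertation does this.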

Relevance:

80.00%

Publisher:

Abstract:

The memristor is one of the fundamental components of electronics, alongside the resistor, the capacitor, and the inductor. It is a passive component whose theory was developed by Leon Chua in 1971. However, it took more than thirty years before the theory could be linked to experimental results. In 2008, Hewlett Packard published an article in which they claimed to have fabricated the first working memristor. A memristor, or memory resistor, is a resistive component whose resistance value can be changed. As its name suggests, a memristor can also retain its resistance value without continuous current and voltage. Typically a memristor has at least two resistance values, either of which can be selected by applying a voltage or current to the component. For this reason memristors are often called resistive switches. Resistive switches are currently studied extensively, especially because of the memory technology they enable. Memory built from resistive switches is called ReRAM (short for resistive random access memory). Like Flash memory, ReRAM is non-volatile memory that can be electrically programmed or erased. Flash memory is currently used, for example, in USB memory sticks. ReRAM, however, enables faster and lower-power operation than Flash, so it is a serious future competitor in the market. ReRAM also makes it possible to store multiple bits in a single memory cell instead of binary ("0" or "1") operation. Typically a ReRAM memory cell has two limiting resistance values, but it may be possible to program several states between these two. Memory cells can be called analog if the number of states is not limited. Analog memory cells would make it possible to build, for example, neural networks efficiently. Neural networks aim to model the functioning of the brain and to perform tasks that are typically difficult for conventional computer programs.
Neural networks are used, for example, in speech recognition and artificial intelligence implementations. This master's thesis examines the analog operation of a Ta2O5-based ReRAM memory cell, keeping in mind its suitability for neural networks. The fabrication of the ReRAM memory cell and the measurement results are presented. The operation of a memory cell is rarely fully analog, because there is often a limited number of states between the two limiting resistance values. For this reason the operation is called pseudo-analog. The measurement results show that a single ReRAM memory cell is well capable of binary operation. To some extent a single cell can store multiple states, but the resistance values vary greatly between successive programming cycles, which complicates interpretation. The fabricated ReRAM memory cell could not function as a pseudo-analog memory on its own; it needed a current-limiting component alongside it. Developing the fabrication process would also reduce the variance in the operation of a single cell, so that its operation would more closely resemble that of a pseudo-analog memory.

Relevance:

80.00%

Publisher:

Abstract:

This work focuses on the formal and technical analysis of some aspects of a constructed language. The first part of the work studies a possible coding for the language, with an emphasis on prefix coding, for which an extension of the Huffman algorithm from binary to n-ary is implemented. Because the frequency of use of the words in the language cannot be known a priori, a study is carried out and several strategies are proposed for an open word system, after first analyzing the number of words in current natural languages. As a possible improvement to the coding, the work also examines the problem of synchronization loss and its solution, self-synchronization, through a study of T-codes that considers the number of possible words for the language, as well as other alternatives. Finally, from a less formal perspective, several applications for the language have been developed: a voice synthesizer, a speech recognition system, and a system font for using the language in word processors. For each application, the construction process is detailed, along with the problems encountered and those still to be solved.
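The n-ary extension of the Huffman algorithm mentioned in this abstract can be sketched as follows. This is the textbook construction, not necessarily the one implemented in the work: the n lowest-weight nodes are merged repeatedly, after padding with zero-weight dummy symbols so that the final merge is also full. The function name `nary_huffman` and the representation of code words as tuples of digits are assumptions made for illustration.

```python
import heapq
from itertools import count


def nary_huffman(freqs, n):
    """Build an n-ary prefix code: {symbol: tuple of digits in range(n)}."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): (0,)}

    tie = count()  # unique tie-breaker so the heap never compares node payloads
    heap = [(weight, next(tie), ("leaf", symbol)) for symbol, weight in freqs.items()]

    # Pad with zero-weight dummies until (leaves - 1) is a multiple of (n - 1),
    # which guarantees that every merge, including the last, combines n nodes.
    while (len(heap) - 1) % (n - 1) != 0:
        heap.append((0, next(tie), ("dummy", None)))
    heapq.heapify(heap)

    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(n)]
        total = sum(weight for weight, _, _ in children)
        node = ("node", [child for _, _, child in children])
        heapq.heappush(heap, (total, next(tie), node))

    codes = {}

    def walk(node, prefix):
        kind, payload = node
        if kind == "leaf":
            codes[payload] = prefix
        elif kind == "node":
            for digit, child in enumerate(payload):
                walk(child, prefix + (digit,))
        # Dummy leaves receive no code word.

    walk(heap[0][2], ())
    return codes
```

For example, `nary_huffman({"a": 5, "b": 2, "c": 1}, 3)` returns a ternary prefix code in which more frequent symbols receive code words that are no longer than those of less frequent ones. The frequencies have to be supplied by the caller, which is exactly the difficulty the abstract raises for a language whose word frequencies are not known in advance.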

Relevance:

80.00%

Publisher:

Abstract:

Early intervention is the key to spoken language for hearing impaired children. A diagnosis of severe hearing loss in young children raises the urgent question of the optimal type of hearing aid device. As there are no recent data comparing selection criteria for a specific hearing aid device, the goal of the Hearing Evaluation of Auditory Rehabilitation Devices (hEARd) project (Coninx & Vermeulen, 2012) evolved to collect and analyze interlingually comparable normative data on the speech perception performance of children with hearing aids and children with cochlear implants (CI). METHOD: In the hEARd project, the Adaptive Auditory Speech Test (AAST) was used in various institutions for hearing rehabilitation in Belgium, Germany and the Netherlands to determine speech perception abilities in kindergarten and school aged hearing impaired children. Results of the speech audiometric procedures were matched to the unaided hearing loss values of children using hearing aids and compared to results of children using CI. 277 data sets of hearing impaired children were analyzed. Results of children using hearing aids were grouped according to their unaided hearing loss values. The grouping followed the World Health Organization’s (WHO) grading of hearing impairment from mild (25–40 dB HL) to moderate (41–60 dB HL), severe (61–80 dB HL) and profound hearing impairment (80 dB HL and higher). RESULTS: AAST speech recognition results in quiet showed significantly better performance for the CI group than for the group of profoundly impaired hearing aid users and the group of severely impaired hearing aid users. However, the CI users’ performance in speech perception in noise did not differ from the hearing aid users’ performance. Within the collected data, analyses showed that children with a CI perform equivalently on speech perception in quiet to children using hearing aids with a “moderate” hearing impairment.

Relevance:

50.00%

Publisher:

Abstract:

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem on its own. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
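The idea of weighting the two modalities can be sketched with a simple score-level fusion. The abstract does not describe the paper's weighting technique, so this example substitutes a plain grid search: the fused score is a convex combination of a speech score and a lip score, and the weight is chosen to minimise the average of the false acceptance and false rejection rates on labelled validation trials. The function names, the grid resolution, and the cost function are all assumptions. The trial set must contain at least one genuine and one impostor trial.

```python
def fuse(speech_scores, lip_scores, w):
    """Convex combination of per-trial scores; w is the weight on the speech modality."""
    return [w * s + (1.0 - w) * l for s, l in zip(speech_scores, lip_scores)]


def error_rates(scores, is_genuine, threshold):
    """Return (false acceptance rate, false rejection rate) when accepting score >= threshold."""
    genuine = [s for s, g in zip(scores, is_genuine) if g]
    impostor = [s for s, g in zip(scores, is_genuine) if not g]
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def pick_weight(speech_scores, lip_scores, is_genuine, steps=20):
    """Grid-search the speech weight in [0, 1] that minimises (FAR + FRR) / 2.

    For each candidate weight, every fused score is tried as a decision
    threshold and the best one is kept. Ties keep the smallest weight.
    """
    best_w, best_cost = None, float("inf")
    for i in range(steps + 1):
        w = i / steps
        fused = fuse(speech_scores, lip_scores, w)
        cost = min(sum(error_rates(fused, is_genuine, t)) / 2 for t in set(fused))
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost
```

When the speech scores are uninformative, as they may be under heavy background noise, the search shifts the weight towards the lip modality. A weight selected this way should be evaluated on trials that were not used to choose it.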

Relevance:

50.00%

Publisher:

Abstract:

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.

Relevance:

50.00%

Publisher:

Abstract:

This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.

Relevance:

40.00%

Publisher:

Abstract:

The paper presents a fast and robust stereo object recognition method. The method is currently unable to identify the rotation of objects, which makes it well suited to locating spheres, as these are rotationally independent. Approximate methods for locating non-spherical objects have been developed. Fundamental to the method is that the correspondence problem is solved using information about the dimensions of the object being located. This is in contrast to previous stereo object recognition systems, where the scene is first reconstructed by point matching techniques. The method is suitable for real-time application on low-power devices.

Relevance:

40.00%

Publisher:

Abstract:

Occlusion is a major challenge for facial expression recognition (FER) in real-world situations. Previous FER efforts to address occlusion suffer from loss of appearance features and are largely limited to a few occlusion types and a single testing strategy. This paper presents a robust approach for FER in occluded images that addresses these issues. A set of Gabor-based templates is extracted from images in the gallery using a Monte Carlo algorithm. These templates are converted into distance features using template matching. The resulting feature vectors are robust to occlusion. Occluded eye and mouth regions and randomly placed occlusion patches are used for testing. Two testing strategies analyze the effects of these occlusions on the overall recognition performance as well as on each facial expression. Experimental results on the Cohn-Kanade database confirm the high robustness of our approach and provide useful insights about the effects of occlusion on FER. Performance is also compared with previous approaches.

Relevance:

40.00%

Publisher:

Abstract:

Less cooperative iris identification systems operating at a distance and on the move often suffer from poor resolution. The lack of pixel resolution significantly degrades iris recognition performance. Super-resolution has been considered as a way to enhance the resolution of iris images. This paper proposes a pixelwise super-resolution technique to reconstruct a high resolution iris image from a video sequence of an eye. A novel fusion approach is proposed to incorporate detail from multiple frames using a robust mean. Experiments on the MBGC NIR portal database show the validity of the proposed approach in comparison with other resolution enhancement techniques.
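The pixelwise fusion step can be illustrated with a small sketch. The abstract does not define its "robust mean", so this example assumes a trimmed mean, one common robust estimator: at each pixel, the lowest and highest values across frames are discarded before averaging. It also assumes the frames are already registered and resampled to the same grid, and it represents each frame as a list of rows of numbers. The function names and the default trim fraction are illustrative.

```python
def trimmed_mean(values, trim_fraction=0.2):
    """Mean of the values after dropping trim_fraction of them from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def fuse_frames(frames, trim_fraction=0.2):
    """Fuse aligned, equally sized frames pixel by pixel with a trimmed mean.

    Outlier values at a pixel (for example from specular highlights or
    eyelid occlusion in a few frames) are discarded before averaging.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [
            trimmed_mean([frame[r][c] for frame in frames], trim_fraction)
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

With five frames and the default trim fraction, the single smallest and single largest value at each pixel are dropped, so one corrupted frame cannot shift the fused value. A median, or a mean weighted by per-frame quality, would be alternative robust choices under the same assumptions.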