873 results for Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dissertation submitted for the degree of Master in Mathematics Education in Pre-School Education and the 1st and 2nd Cycles of Basic Education, specializing in Didactics of Mathematics

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dissertation submitted to the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa for the degree of Master in Informatics Engineering

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Work submitted within the Master's programme in Informatics Engineering, in partial fulfilment of the requirements for the degree of Master in Informatics Engineering

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Human Activity Recognition systems require objective and reliable methods that can be used in daily routines and must offer results consistent with the activities performed. These systems are under development and offer objective and personalized support for several applications, such as healthcare. This thesis aims to create a framework for human activity recognition based on accelerometry signals. Some new features and techniques inspired by audio recognition methodology are introduced in this work, namely Log Scale Power Bandwidth and the application of Markov Models. Forward Feature Selection was adopted as the feature selection algorithm in order to improve clustering performance and limit computational demands. This method selects the most suitable set of features for activity recognition in accelerometry from a 423-dimensional feature vector. Several Machine Learning algorithms were applied to the accelerometry databases used (the FCHA and PAMAP databases), and these showed promising results in activity recognition. The developed set of algorithms constitutes a substantial contribution to the development of reliable evaluation methods of movement disorders for diagnosis and treatment applications.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
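To illustrate how GMM-based classification decides between two groups, here is a minimal sketch that compares the log-likelihood of a few observations under two mixtures. Everything concrete in it is an assumption for illustration: the features are scalar rather than speech spectra, the mixture parameters are set by hand rather than trained, and the labels "apnoea" and "control" are only placeholder names.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance)


def gmm_log_likelihood(x: float, gmm: List[Component]) -> float:
    """Log-density of scalar x under a 1-D Gaussian mixture (log-sum-exp for stability)."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for w, mu, var in gmm
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def classify(frames: List[float], gmm_apnoea: List[Component],
             gmm_control: List[Component]) -> str:
    """Pick the class whose mixture gives the higher total log-likelihood."""
    ll_apnoea = sum(gmm_log_likelihood(x, gmm_apnoea) for x in frames)
    ll_control = sum(gmm_log_likelihood(x, gmm_control) for x in frames)
    return "apnoea" if ll_apnoea > ll_control else "control"


# Hand-set, hypothetical mixture parameters (not taken from the study).
GMM_CONTROL = [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)]
GMM_APNOEA = [(0.5, 3.0, 1.0), (0.5, 4.0, 1.0)]

print(classify([3.2, 3.8, 2.9], GMM_APNOEA, GMM_CONTROL))
print(classify([0.1, 0.7], GMM_APNOEA, GMM_CONTROL))
```

Summing per-frame log-likelihoods treats the frames as independent, which is a common simplification in this kind of scoring.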

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It has been demonstrated in earlier studies that patients with a cochlear implant have increased abilities for audio-visual integration because the crude information transmitted by the cochlear implant requires the persistent use of the complementary speech information from the visual channel. The brain network for these abilities needs to be clarified. We used an independent components analysis (ICA) of the activation (H₂¹⁵O) positron emission tomography data to explore occipito-temporal brain activity in post-lingually deaf patients with unilaterally implanted cochlear implants at several months post-implantation (T1), shortly after implantation (T0) and in normal hearing controls. In between-group analysis, patients at T1 had greater blood flow in the left middle temporal cortex as compared with T0 and normal hearing controls. In within-group analysis, patients at T0 had a task-related ICA component in the visual cortex, and patients at T1 had one task-related ICA component in the left middle temporal cortex and the other in the visual cortex. The time courses of temporal and visual activities during the positron emission tomography examination at T1 were highly correlated, meaning that synchronized integrative activity occurred. The greater involvement of the visual cortex and its close coupling with the temporal cortex at T1 confirm the importance of audio-visual integration in more experienced cochlear implant subjects at the cortical level.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis is composed of two main parts. The first addressed the question of whether the auditory and somatosensory systems, like their visual counterpart, comprise parallel functional pathways for processing identity and spatial attributes (so-called 'what' and 'where' pathways, respectively). The second part examined the independence of control processes mediating task switching across 'what' and 'where' pathways in the auditory and visual modalities. Concerning the first part, electrical neuroimaging of event-related potentials identified the spatio-temporal mechanisms subserving auditory (see Appendix, Study n°1) and vibrotactile (see Appendix, Study n°2) processing during two types of blocks of trials. 'What' blocks varied stimuli in their frequency independently of their location. 'Where' blocks varied the same stimuli in their location independently of their frequency. Concerning the second part (see Appendix, Study n°3), a psychophysical task-switching paradigm was used to investigate the hypothesis that the efficacy of control processes depends on the extent of overlap between the neural circuitry mediating the different tasks at hand, such that more effective task preparation (and by extension smaller switch costs) is achieved when the anatomical/functional overlap of this circuitry is small. Performance costs associated with switching tasks and/or switching sensory modalities were measured. Tasks required the analysis of either the identity or spatial location of environmental objects ('what' and 'where' tasks, respectively) that were presented either visually or acoustically on any given trial. Pretrial cues informed participants of the upcoming task, but not of the sensory modality. In the audio-visual domain, the results showed that switch costs between tasks were significantly smaller when the sensory modality of the task switched versus when it repeated. In addition, switch costs between the senses were correlated only when the sensory modality of the task repeated across trials and not when it switched. The collective evidence supports not only the independence of control processes mediating task switching and modality switching, but also the hypothesis that switch costs reflect competitive interference between neural circuits, which in turn can be diminished when these neural circuits are distinct. In the auditory and somatosensory domains, the findings show that a segregation of location vs. recognition information is observed across sensory systems and that it occurs around 100 ms for both sensory modalities. Also, our results show that functionally specialized pathways for audition and somatosensation involve largely overlapping brain regions, i.e. posterior superior and middle temporal cortices and inferior parietal areas. Both these properties (synchrony of differential processing and overlapping brain regions) probably optimize the relationships across sensory modalities. Therefore, these results may be indicative of a computationally advantageous organization for processing spatial and identity information.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work tries to identify some of the skills an audiovisual translator must develop, from a practical point of view, in order to pursue a career in this field, with emphasis on mastering subtitling-specific software. This report describes the trial-and-error process followed while making the subtitles for a documentary and identifies some of the difficulties we might encounter on an assignment of this kind when working with freely licensed software. Moreover, it tries to contribute some answers to these issues.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This document is an introduction to the tools Dragon Naturally Speaking and Audacity, which specialize in optimizing the transcription of audio files.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This Master's thesis addresses the design and implementation of an optical character recognition (OCR) system for a mobile device running the Symbian operating system. The developed OCR system, named OCRCapriccio, emphasizes modularity, effective extensibility and reuse. The system consists of two parts: the graphical user interface and the OCR engine, which was implemented as a plug-in. The plug-in includes two implementations of the OCR engine to enable two types of recognition: bitmap comparison based recognition and statistical recognition. The implementation results have shown that the approach based on bitmap comparison is more suitable for the Symbian environment because of its nature. Although the current implementation of bitmap comparison is lacking in accuracy, further development should continue in that direction. The biggest challenges of this work were related to developing an OCR scheme suitable for Symbian OS smartphones, which have limited computational power and restricted resources.
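Bitmap comparison based recognition can be pictured as template matching: a glyph is compared pixel by pixel with stored reference bitmaps and assigned to the closest one. The sketch below is an assumption about the general technique, not OCRCapriccio's code; the 3x3 templates, the Hamming distance, and the function names are hypothetical choices made to keep the example small.

```python
from typing import Dict, List

Bitmap = List[str]  # rows of '0'/'1' characters, all bitmaps the same size


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Return the label of the template with the fewest differing pixels."""
    return min(templates, key=lambda ch: hamming(glyph, templates[ch]))


# Tiny hypothetical 3x3 templates; a real system would use larger normalized glyphs.
TEMPLATES: Dict[str, Bitmap] = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}

noisy_l = ["100", "100", "110"]  # an "L" with one pixel missing
print(recognize(noisy_l, TEMPLATES))
```

The appeal on a constrained device is that matching needs only integer comparisons and no floating-point model, at the price of sensitivity to scale, shift and font changes.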

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work aims to identify some of the skills that an audiovisual translator must develop, from a practical point of view, in order to practise the profession, with emphasis on mastering subtitling-specific software. This report describes the trial-and-error process carried out while producing the subtitles for a documentary and identifies some of the difficulties we may encounter on an assignment of this kind when working with free-licence programs, and also attempts to provide the corresponding solutions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this work, image based estimation methods, also known as direct methods, are studied which avoid feature extraction and matching completely. Cost functions use raw pixels as measurements and the goal is to produce precise 3D pose and structure estimates. The cost functions presented minimize the sensor error, because measurements are not transformed or modified. In photometric camera pose estimation, 3D rotation and translation parameters are estimated by minimizing a sequence of image based cost functions, which are non-linear due to perspective projection and lens distortion. In image based structure refinement, on the other hand, 3D structure is refined using a number of additional views and an image based cost metric. Image based estimation methods are particularly useful in conditions where the Lambertian assumption holds, and the 3D points have constant color regardless of viewing angle. The goal is to improve image based estimation methods, and to produce computationally efficient methods which can be accommodated into real-time applications. The developed image-based 3D pose and structure estimation methods are finally demonstrated in practice in indoor 3D reconstruction use, and in a live augmented reality application.
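The core of a direct method, a cost built from raw intensities and minimized over motion parameters, can be shown on a deliberately reduced problem. The sketch below is only an analogy and not the thesis's method: it assumes a 1-D intensity profile, an integer shift as the single motion parameter, and exhaustive search in place of non-linear optimization over 3D rotation and translation. All names are hypothetical.

```python
from typing import List


def photometric_cost(ref: List[float], cur: List[float], shift: int) -> float:
    """Mean squared intensity difference between ref[i] and cur[i + shift] on the overlap."""
    pairs = [
        (ref[i], cur[i + shift])
        for i in range(len(ref))
        if 0 <= i + shift < len(cur)
    ]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def estimate_shift(ref: List[float], cur: List[float], max_shift: int) -> int:
    """Exhaustively search for the integer shift with the lowest photometric cost."""
    return min(
        range(-max_shift, max_shift + 1),
        key=lambda s: photometric_cost(ref, cur, s),
    )


# A 1-D intensity profile and the same profile moved two samples to the right.
ref = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
cur = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0]

print(estimate_shift(ref, cur, max_shift=3))
```

No features are detected or matched at any point; the intensities themselves are the measurements, which is the property the abstract attributes to direct methods. The sketch also relies on brightness staying constant between the two profiles, mirroring the Lambertian assumption mentioned above.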

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study examines the singular gesture that emerges from the work of the Senegalese filmmaker Djibril Diop Mambety. It identifies a force of "bringing into presence" which, as the research shows, is akin to the mediating action of the griot in West African oral traditions. Notably, this force, which stems from orality, does not rest on narrative or on speech as discourse; on the contrary, it arises from narrative ruptures and image-sound disjunctions that call the narrative into question, inviting the viewer to frequently revise their interpretation of what they see and hear. The film itself then becomes a griot, bringing to life a constantly changing link between the world it carries and its viewer. By establishing a critical relation to the world in which the narrative is set, the many ruptures in Mambety's cinema are also the breaches through which a welcoming space is created for marginality, which inhabits all of his films. Oral tradition and the griot are presented first, in order to lay the foundations from which the reflection can develop. The description and analysis of the films Parlons Grand-mère and Le franc show how these are mediating films that behave as griots. This finding opens the way to a broader reflection on mediation in cinema, in which the ethical significance of the mediating film is explored, along with the nature of the possible relations between mediation and narrative. Finally, the analysis of the film Hyènes, given the difference it presents in deploying a more linear narrative, is an opportunity to deepen an understanding both of what Mambety's films do and of what mediation in cinema can do more broadly.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Online handwriting recognition has been a frontier area of research for the last few decades under the purview of pattern recognition. Word processing in Indian languages turns out to be a vexing experience even with the assistance of an alphanumeric keyboard. A natural solution to this problem is offered by online character recognition. There is abundant literature on handwriting recognition for Western, Chinese and Japanese scripts, but very little on the recognition of Indic scripts such as Malayalam. This paper presents an efficient Online Handwritten character Recognition System for Malayalam Characters (OHR-M) using the K-NN algorithm. It would help in recognizing Malayalam text entered using pen-like devices. A novel feature extraction method, combining time domain features with a dynamic representation of writing direction and its curvature, is used for recognizing Malayalam characters. This writer-independent system gives an accuracy of 98.125% with a recognition time of 15-30 milliseconds.
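A rough sketch of how writing-direction features and a K-NN classifier fit together is given below. It is not the OHR-M feature set: it assumes strokes already resampled to the same number of pen samples, uses only segment directions (no curvature or other time domain features), and the labels "h" and "v" stand in for character classes. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]
Sample = Tuple[List[float], str]  # (feature vector, label)


def direction_features(stroke: List[Point]) -> List[float]:
    """Unit direction (cos, sin) of each segment between consecutive pen samples."""
    feats: List[float] = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy) or 1.0  # guard against repeated points
        feats += [dx / norm, dy / norm]
    return feats


def knn_predict(query: List[float], training: List[Sample], k: int = 3) -> str:
    """Majority vote among the k training samples closest in Euclidean distance."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: roughly horizontal strokes ("h") and roughly vertical ones ("v").
training: List[Sample] = [
    (direction_features([(0, 0), (1, 0), (2, 0)]), "h"),
    (direction_features([(0, 0), (1, 0.1), (2, 0.1)]), "h"),
    (direction_features([(0, 0), (0, 1), (0, 2)]), "v"),
    (direction_features([(0, 0), (0.1, 1), (0.1, 2)]), "v"),
]

query = direction_features([(0, 0), (1, 0.05), (2, 0.0)])
print(knn_predict(query, training, k=3))
```

Normalizing each segment to unit length makes the features depend on the direction of writing rather than on its speed or size, which is one plausible reason such features suit a writer-independent system.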

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents the application of wavelet processing in the domain of handwritten character recognition. To attain a high recognition rate, robust feature extractors and powerful classifiers that are invariant to the variability of human writing are needed. The proposed scheme consists of two stages: a feature extraction stage, which is based on the Haar wavelet transform, and a classification stage that uses a support vector machine classifier. Experimental results show that the proposed method is effective.
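The feature extraction stage can be illustrated with a small orthonormal Haar transform. This is a sketch under assumptions, not the paper's pipeline: it works on a 1-D signal (for example one row of a character image), the number of levels is chosen arbitrarily, and the support vector machine stage is omitted. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: pairwise sums and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def haar_features(signal: List[float], levels: int) -> List[float]:
    """Multi-level Haar coefficients: coarsest approximation first, then details."""
    approx = list(signal)
    details: List[float] = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details = detail + details  # coarser details go in front
    return approx + details


row = [4.0, 6.0, 10.0, 12.0]  # hypothetical pixel intensities
coeffs = haar_features(row, levels=2)
print(coeffs)
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input values, so the coefficient vector carries the same energy as the input in a form that separates coarse shape from fine detail. Such a vector could then be passed to a classifier.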