904 results for Audio-Visual Automatic Speech Recognition
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Nowadays it is critical to identify the factors that contribute to the quality of the audiologic care provided. The hearing aid fitting model proposed by the Brazilian Unified Health System (SUS) implies multidisciplinary care. This raises some relevant and current questions. OBJECTIVE: To evaluate and compare the results of the hearing aid fitting model proposed by the SUS with a more compact, streamlined care model. METHOD: We conducted a prospective longitudinal study with 174 participants randomly assigned to two groups: SUS Group and Streamlined Group. For both groups we assessed key areas related to hearing aid fitting through the International Outcome Inventory for Hearing Aids (IOI-HA) questionnaire, in addition to evaluating the Speech Recognition Index (SRI) 3 and 9 months after fitting. RESULTS: Both groups showed the same improvement in speech recognition after nine months of hearing aid use, and the IOI-HA did not show any statistically significant difference at three and nine months. CONCLUSION: From a clinical point of view, the two care strategies did not differ with regard to the hearing aid fitting results obtained upon evaluating patients in the short and medium term; thus, changes in the current model of care should be considered.
Abstract:
Perceptual User Interfaces (PUIs) aim at facilitating human-computer interaction with the aid of human-like capacities (computer vision, speech recognition, etc.). In PUIs, the human face is a central element, since it conveys not only identity but also other important information, particularly with respect to the user's mood or emotional state. This paper describes both a face detector and a smile detector for PUIs. Both are suitable for real-time interaction.
Abstract:
OBJECTIVE To evaluate speech intelligibility in noise with a new cochlear implant (CI) processor that uses a directional microphone system imitating the pinna effect. STUDY DESIGN Prospective experimental study. SETTING Tertiary referral center. PATIENTS Ten experienced, unilateral CI recipients with bilateral severe-to-profound hearing loss. INTERVENTION All participants performed speech-in-noise tests with the Opus 2 processor (omnidirectional microphone mode only) and the newer Sonnet processor (omnidirectional and directional microphone modes). MAIN OUTCOME MEASURE The speech reception threshold (SRT) in noise was measured in four spatial settings. The test sentences were always presented from the front. The noise arrived either from the front (S0N0), the ipsilateral side of the CI (S0NIL), the contralateral side of the CI (S0NCL), or the back (S0N180). RESULTS Compared with the Sonnet in omnidirectional mode, the directional mode improved the SRTs by 3.6 dB (p < 0.01), 2.2 dB (p < 0.01), and 1.3 dB (p < 0.05) in the S0N180, S0NIL, and S0NCL situations, respectively. There was no statistically significant difference in the S0N0 situation. No differences between the Opus 2 and the Sonnet in omnidirectional mode were observed. CONCLUSION Speech intelligibility with the Sonnet system was statistically different from that with the Opus 2 system, suggesting that CI users might benefit from the pinna-effect-imitating directionality mode in noisy environments.
Abstract:
Three rhesus monkeys (Macaca mulatta) and four pigeons (Columba livia) were trained in a visual serial probe recognition (SPR) task. A list of visual stimuli (slides) was presented sequentially to the subjects. Following the list and after a delay interval, a probe stimulus was presented that could be either from the list (Same) or not from the list (Different). The monkeys readily acquired a variable-list-length SPR task, while pigeons showed acquisition only under a constant-list-length condition. However, monkeys memorized the responses to the probes (absolute strategy) when overtrained with the same lists and probes, while pigeons compared the probe to the list in memory (relational strategy). The pigeons' performance on the 4-item constant list length was disrupted when blocks of trials of different list lengths were embedded between the 4-item blocks. Serial position curves for recognition at variable probe delays showed better relative performance on the last items of the list at short delays (0-0.5 seconds) and better relative performance on the initial items of the list at long delays (6-10 seconds for the pigeons and 20-30 seconds for the monkeys and a human adolescent). The serial position curves also showed reliable primacy and recency effects at intermediate probe delays. The monkeys showed evidence of using a relational strategy in the variable probe delay task. The results are the first demonstration of relational serial probe recognition performance in an avian species and suggest similar underlying dynamic recognition memory mechanisms in primates and avians.
Abstract:
This work stems from a virtual ethnography study that set out to perform a discourse analysis of the images posted on the social network Facebook by university students: images in which the body is objectified in a virtual key, through different genres. Once these images were decoded, life narratives, signs and codes were identified that bring into play discourses produced by the university subjects, arising from their immersion in the modern social structure of consumption. On this basis, the study shows the need to teach these subjects audiovisual-language literacy, with the aim of forming critical consumers.
Abstract:
In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions, and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years and their main current focus of interest. The review is organized into five main areas: audio processing including speech, speaker characterization, speech and language processing, text-to-speech conversion, and spoken language applications. The paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla), the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and the main activities it has developed in recent years.
Abstract:
SAR (Synthetic Aperture Radar) and ISAR (Inverse SAR) are high-resolution coherent radar techniques capable of providing a map of the target's radar cross-section in the spatial range-azimuth domain. Both techniques aim to achieve a finer azimuth resolution by generating a synthetic aperture from the relative motion between radar and target. Imaging radars complement conventional optical and infrared systems, especially under adverse weather conditions. Conventional SAR and ISAR systems are designed to illuminate targets under line-of-sight conditions between sensor and target. For this reason, they perform worse in complex scenarios, such as forests or urban environments, where multipath returns overlap with the direct echoes from the targets. These returns are known as "ghost images", since they mask the true targets and result in poor visual quality, greatly complicating target detection. Multipath mitigation in radar imaging is therefore of both theoretical and practical relevance. In this doctoral thesis, the concept of Time Reversal (TR) is used to improve the visual quality of SAR and ISAR images by removing the ghost images caused by multipath propagation (the TR-SAR and TR-ISAR algorithms, respectively). However, before applying these novel multipath-mitigation techniques, the geometric problem associated with multipath must be solved. Focusing on improving the performance of TR-ISAR, a set of advanced signal processing techniques is implemented before and after the time-reversal stage (the core of this thesis).
The former (pre-processing techniques) involve multilook averaging, time-frequency transforms and the Radon transform, while the latter (post-processing techniques) comprise a set of super-resolution algorithms. In short, all of them can be seen as added value to the TR concept rather than as independent techniques. In summary, the designed time-reversal-based algorithm, together with some of the proposed signal processing techniques, should not be overlooked if high-quality ISAR images are to be obtained in scenarios with heavy multipath. Indeed, the resulting images can be useful for subsequent Automatic Target Recognition (ATR) schemes. As a proof of concept, both simulated and experimental data from high-resolution radars are used to verify the proposed methods.
Abstract:
This paper presents a methodology for adapting an advanced communication system for deaf people to a new domain. This methodology is a user-centered design approach consisting of four main steps: requirement analysis, parallel corpus generation, technology adaptation to the new domain, and finally, system evaluation. In this paper, the new domain considered is dialogue at a hotel reception desk. With this methodology, it was possible to develop the system in a few months, obtaining very good performance: good speech recognition and translation rates (around 90%) with short processing times.
Abstract:
Nonlinear analysis tools for studying and characterizing the dynamics of physiological signals have gained popularity, mainly because tracking sudden alterations in the inherent complexity of biological processes might be an indicator of altered physiological states. Typically, in order to perform an analysis with such tools, the physiological variables that describe the biological process under study are used to reconstruct the underlying dynamics of that process. For that goal, a procedure called time-delay or uniform embedding is usually employed. Nonetheless, there is evidence of its inability to deal with non-stationary signals, such as those recorded from many physiological processes. To address this drawback, this paper evaluates the utility of non-conventional time series reconstruction procedures based on non-uniform embedding, applying them to automatic pattern recognition tasks. The paper compares a state-of-the-art non-uniform approach with a novel scheme that fuses embedding and feature selection at once, searching for better reconstructions of the dynamics of the system. Results are also compared with two classic uniform embedding techniques. Thus, the goal is to compare uniform and non-uniform reconstruction techniques, including the one proposed in this work, for pattern recognition in biomedical signal processing tasks. Once the state space is reconstructed, the scheme characterizes it with three classic nonlinear dynamic features (Largest Lyapunov Exponent, Correlation Dimension and Recurrence Period Density Entropy), while classification is carried out by means of a simple k-nn classifier. In order to test its generalization capabilities, the approach was tested on three different physiological databases (Speech Pathologies, Epilepsy and Heart Murmurs).
In terms of the accuracy obtained in automatically detecting the presence of pathologies, and for the three types of biosignals analyzed, the non-uniform techniques used in this work slightly outperformed the uniform methods, suggesting their usefulness for characterizing non-stationary biomedical signals in pattern recognition applications. Moreover, in view of the results obtained and its low computational load, the proposed technique appears applicable to the applications under study.
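As a rough illustration of the reconstruction procedures compared in the abstract above, the sketch below contrasts classic uniform time-delay embedding with a non-uniform variant in which the delays are freely chosen. The function names, the toy signal, and the particular delay values are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def uniform_embedding(x, dim=3, tau=5):
    """Classic uniform time-delay embedding: each state vector is
    [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]]."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def nonuniform_embedding(x, lags):
    """Non-uniform variant: the delays need not be multiples of a
    single tau; `lags` is any increasing list of integer delays."""
    n = len(x) - max(lags)
    return np.column_stack([x[lag : lag + n] for lag in lags])

# Toy signal: a noisy sine wave
t = np.linspace(0, 8 * np.pi, 500)
x = np.sin(t) + 0.05 * np.random.randn(500)

states_u = uniform_embedding(x, dim=3, tau=5)
states_nu = nonuniform_embedding(x, lags=[0, 3, 11])
print(states_u.shape, states_nu.shape)  # (490, 3) (489, 3)
```

In a full pipeline, nonlinear features (e.g. the Correlation Dimension) would then be computed on these state matrices and fed to a k-nn classifier.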
Abstract:
The aim of this Master Thesis is the analysis, design and development of a robust and reliable Human-Computer Interaction interface, based on visual hand-gesture recognition. The implementation of the required functions is oriented to simulating a classical hardware interaction device, the mouse, by recognizing a specific hand-gesture vocabulary in color video sequences. For this purpose, a prototype of a hand-gesture recognition system has been designed and implemented, composed of three stages: detection, tracking and recognition. This system is based on machine learning methods and pattern recognition techniques, which have been integrated with other image processing approaches to achieve high recognition accuracy at a low computational cost. Regarding pattern recognition techniques, several algorithms and strategies have been designed and implemented, which are applicable to color images and video sequences. The design of these algorithms has the purpose of extracting spatial and spatio-temporal features from static and dynamic hand gestures, in order to identify them in a robust and reliable way. Finally, a visual database containing the necessary vocabulary of gestures for interacting with the computer has been created.
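The detection stage of such a pipeline often begins with skin-color segmentation. The sketch below shows a minimal rule-based skin mask in plain NumPy; the thresholds follow a commonly cited RGB heuristic and are illustrative only, not the thesis's actual detector.

```python
import numpy as np

def skin_mask(rgb):
    """Rule-based skin segmentation on an RGB image (uint8 array of
    shape (h, w, 3)): keep bright, reddish pixels. A classic first
    stage before hand tracking and gesture classification."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return ((r > 95) & (g > 40) & (b > 20)
            & (r - np.minimum(g, b) > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

# Toy image: one skin-like pixel and one blue pixel
img = np.array([[[200, 120, 90], [30, 60, 200]]], dtype=np.uint8)
print(skin_mask(img))  # [[ True False]]
```

The resulting binary mask would then feed connected-component analysis and tracking; in practice the thresholds need tuning per camera and lighting conditions.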
Abstract:
Human Activity Recognition (HAR) is an emerging research field that aims to identify the actions carried out by a person given a set of observations and the surrounding environment. The wide growth of this research field within the scientific community is mainly explained by the high number of applications that have arisen in recent years. A great part of the most promising applications are related to the healthcare field, where it is possible to track the mobility of patients with motor dysfunction as well as the physical activity of patients at cardiovascular risk. Until a few years ago, patient follow-up was possible only with dedicated sensors. With the advent of the smartphone, however, such monitoring can now be achieved non-invasively using the phone's embedded sensors. For these reasons, the main goal of this Final Degree Project is to evaluate new feature extraction techniques for activity and user recognition, as well as activity segmentation. Recognition is performed by integrating the inertial signals from two sensors present in most smartphones: the accelerometer and the gyroscope. In particular, six different activities are evaluated: walking, walking upstairs, walking downstairs, sitting, standing and lying. Furthermore, a segmentation task is carried out over the activities performed by thirty users. This is done using Hidden Markov Models and a toolset well proven in speech recognition: HTK (Hidden Markov Model Toolkit).
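As a hedged sketch of the kind of feature extraction described above, the following code slides fixed-length windows over tri-axial accelerometer and gyroscope signals and computes simple per-window statistics. The window size, hop, and feature set are assumptions for illustration, not the project's actual choices.

```python
import numpy as np

def window_features(acc, gyro, win=128, hop=64):
    """Slide a fixed-length window over tri-axial accelerometer and
    gyroscope signals and compute per-axis mean and standard deviation
    plus a signal magnitude area (SMA) as one feature vector per window.
    acc, gyro: arrays of shape (n_samples, 3)."""
    feats = []
    for start in range(0, len(acc) - win + 1, hop):
        w = np.hstack([acc[start:start + win], gyro[start:start + win]])
        mean = w.mean(axis=0)                # 6 per-axis means
        std = w.std(axis=0)                  # 6 per-axis std devs
        sma = np.abs(w).sum() / win          # signal magnitude area
        feats.append(np.concatenate([mean, std, [sma]]))
    return np.array(feats)

# Toy data: 512 samples of simulated motion (e.g. ~10 s at 50 Hz)
rng = np.random.default_rng(0)
acc = rng.normal(0.0, 1.0, (512, 3))
gyro = rng.normal(0.0, 0.5, (512, 3))
X = window_features(acc, gyro)
print(X.shape)  # (7, 13): 7 windows, 6 means + 6 stds + 1 SMA
```

Feature matrices like `X` would then be used to train per-activity Hidden Markov Models, with segmentation obtained by Viterbi decoding over the window sequence.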