847 results for multimodal biometrics
Heritage and Identity in Arts-Based Educational Research from a Multimodal Approach
Abstract:
This article describes two interconnected research experiences of our group, the first carried out in 2007 through the international CALVINO project of the European Union's Culture 2000 Programme, and the second implemented in 2014 within the framework of the Research and Innovation in Secondary Education in Andalusia Project (PIIISA). Both projects share the thematic axis of identity built on a notion of heritage, and both put into practice research methodologies based on the visual arts with a multimodal approach. From these two anchor points, the theme (what) and the methodology (how), we analyse what took place in order to draw relevant conclusions that, on the one hand, highlight the value of these significant practices and, on the other, contribute our experience to future research proposals in this thematic and/or methodological field.
Abstract:
In this paper the authors present, for the first time, results showing the effect of out-of-plane speaker head-pose variation on a lip-biometric-based speaker verification system. Using appearance-based DCT features, they adopt a mutual information analysis technique to highlight the class-discriminant DCT components most robust to changes in out-of-plane pose. Experiments are conducted using the initial phase of a new multi-view audio-visual database designed for research and development of pose-invariant speech and speaker recognition. They show that verification performance can be improved by substituting higher-order horizontal DCT components for vertical ones, particularly in the case of a train/test pose-angle mismatch.
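A minimal sketch of the kind of analysis described above, assuming a grayscale lip region, SciPy's 2-D DCT, and scikit-learn's mutual_info_classif as the mutual-information estimator; the region size, number of retained coefficients, and selection rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.feature_selection import mutual_info_classif

def lip_dct_features(lip_roi, n_rows=6, n_cols=6):
    """2-D DCT of a grayscale lip region; keep a small block of low-order
    coefficients (rows ~ vertical variation, columns ~ horizontal variation)."""
    coeffs = dctn(lip_roi.astype(float), norm="ortho")
    return coeffs[:n_rows, :n_cols].ravel()

def rank_components_by_mi(features, speaker_labels):
    """Score each DCT component by its mutual information with speaker identity,
    so the class-discriminant, pose-robust components can be retained."""
    mi = mutual_info_classif(features, speaker_labels, random_state=0)
    return np.argsort(mi)[::-1]  # component indices, most informative first

# Hypothetical usage: X has one row of DCT features per frame, y the speaker label.
# ranked = rank_components_by_mi(X, y); X_selected = X[:, ranked[:20]]
```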
Abstract:
In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition for perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at low level. The proposed approach is tested on a video database of ten human subjects, and the results show that it attains better accuracy than both conventional multimodal fusion using latent semantic analysis and the single-modality verifications. MATLAB experiments show the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
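A minimal sketch of the kind of low-level fusion described above, using librosa for MFCC extraction and scikit-learn's SpectralEmbedding as a stand-in for the Laplacian Eigenmap projection; the sampling rate, feature sizes, normalisation, and embedding dimensionality are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
import librosa
from sklearn.manifold import SpectralEmbedding

def mfcc_utterance_feature(wav_path, n_mfcc=13):
    """Mean MFCC vector over an utterance, used as a simple voice descriptor."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def fuse_face_and_voice(face_feats, voice_feats, n_dims=8):
    """Concatenate per-sample face and voice features, then project the joint
    vectors into a nonlinear Laplacian Eigenmap subspace (SpectralEmbedding)."""
    joint = np.hstack([face_feats, voice_feats])             # (n_samples, d_face + d_voice)
    joint = (joint - joint.mean(0)) / (joint.std(0) + 1e-8)  # simple normalisation
    embedding = SpectralEmbedding(n_components=n_dims, affinity="nearest_neighbors")
    return embedding.fit_transform(joint)                    # (n_samples, n_dims)
```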
Abstract:
SEMAINE has created a large audiovisual database as part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally colored conversation. Data used to build the agents came from interactions between users and an operator simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed appropriate nonverbal behavior) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High-quality recordings were provided by five high-resolution, high-frame-rate cameras and four microphones, recorded synchronously. Recordings total 150 participants, for a total of 959 conversations with individual SAL characters, lasting approximately 5 minutes each. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labeled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods, and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database.
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with the bimodal systems based on multicondition model training or missing-feature decoding alone.
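A minimal sketch of feature-level fusion with a weighted ("modified") cosine similarity, in the spirit of the description above; the length normalisation, per-dimension reliability weights, and modality weights are illustrative assumptions, not the paper's exact representation.

```python
import numpy as np

def bimodal_feature(speech_vec, face_vec, w_speech=1.0, w_face=1.0):
    """Combine a speech feature vector and a face feature vector into one
    multimodal representation. Each modality is length-normalised first so that
    vastly different feature sizes and dynamic ranges do not dominate the fusion."""
    s = speech_vec / (np.linalg.norm(speech_vec) + 1e-12)
    f = face_vec / (np.linalg.norm(face_vec) + 1e-12)
    return np.concatenate([w_speech * s, w_face * f])

def modified_cosine(a, b, reliability):
    """Cosine similarity with per-dimension reliability weights, standing in for
    the paper's feature selection (down-weighting dimensions likely to be corrupted)."""
    a, b = reliability * a, reliability * b
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical usage: pick the enrolled identity whose fused template scores highest.
# best = max(templates, key=lambda pid: modified_cosine(probe, templates[pid], rel))
```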
Abstract:
Structural and functional change in the microcirculation in type 1 diabetes mellitus predicts future end-organ damage and macrovascular events. We explored the utility of novel signal processing techniques to detect and track change in ocular hemodynamics in patients with this disease. Twenty-four patients with uncomplicated type 1 diabetes mellitus and 18 age- and sex-matched control subjects were studied. Doppler ultrasound was used to interrogate the carotid and ophthalmic arteries, and digital photography to image the retinal vasculature. Frequency analysis algorithms were applied to quantify velocity waveform structure and retinal photographic data at baseline and following inhalation of 100% oxygen. Frequency data were compared between groups. No significant differences were found in the resistive index between groups at baseline or following inhaled oxygen. Frequency analysis of the Doppler flow velocity waveforms identified significant differences in bands 3-7 between patients and controls in data captured from the ophthalmic artery (p<0.01 for each band). In response to inhaled oxygen, changes in the frequency band amplitudes were significantly greater in control subjects than in patients (p<0.05). Only control subjects demonstrated a positive correlation (R=0.61) between change in retinal vessel diameter and frequency band amplitudes derived from ophthalmic artery waveform data. The use of multimodal signal processing techniques applied to Doppler flow velocity waveforms and retinal photographic data identified preclinical change in the ocular microcirculation in patients with uncomplicated diabetes mellitus. An impaired autoregulatory response of the retinal microvasculature may contribute to the future development of retinopathy in such patients.
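A minimal sketch of one way frequency-band amplitudes might be extracted from a Doppler flow-velocity waveform; the FFT-based approach, band layout around harmonics of the cardiac fundamental, and the baseline-versus-oxygen comparison are illustrative assumptions, not the authors' algorithms.

```python
import numpy as np

def harmonic_band_amplitudes(velocity, fs, fundamental_hz, n_bands=8):
    """Amplitude of the flow-velocity waveform in bands centred on multiples of the
    cardiac fundamental frequency (band 1 = fundamental, band 2 = 2nd harmonic, ...)."""
    spectrum = np.abs(np.fft.rfft(velocity - velocity.mean()))
    freqs = np.fft.rfftfreq(len(velocity), d=1.0 / fs)
    amplitudes = []
    for k in range(1, n_bands + 1):
        lo, hi = (k - 0.5) * fundamental_hz, (k + 0.5) * fundamental_hz
        band = (freqs >= lo) & (freqs < hi)
        amplitudes.append(spectrum[band].max() if band.any() else 0.0)
    return np.array(amplitudes)

# Hypothetical usage: compare band amplitudes at baseline and after 100% oxygen.
# delta = harmonic_band_amplitudes(v_oxygen, fs, f0) - harmonic_band_amplitudes(v_base, fs, f0)
```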
Abstract:
This paper presents a multimodal analysis of online self-representations of the Elite Squad of the military police of Rio de Janeiro, the Special Police Operations Battalion BOPE. The analysis is placed within the wider context of a “new military urbanism”, evidenced in the ongoing “Pacification” of many of the city’s favelas, in which BOPE plays an active interventionist as well as a symbolic role, a kind of solution that clearly fails to address the root causes of violence, which lie in poverty and social inequality. The paper first provides a sociocultural account of BOPE’s role in Rio’s public security and then looks at some of the mainly visual, mediated discourses the Squad employs in constructing a public image of itself as a modern and efficient, yet at the same time “magical”, police force.
Abstract:
Critical Discourse Analysis (CDA) has probably made the most comprehensive attempt to develop a theory of the inter-connectedness of discourse, power and ideology and is specifically concerned with the role that discourse plays in maintaining and legitimizing inequality in society. While CDA’s general thrust has been towards the analysis of linguistic structures, some critical discourse analysts have begun to focus on multimodal discourses because of the increasingly important role these play in many social and political contexts. Still, a great deal of CDA analysis has remained largely monomodal. The principal aim of this chapter is therefore to address this situation and demonstrate in what ways CDA can be deployed to analyse the ways that ideological discourses can be communicated, naturalised and legitimated beyond the linguistic level. The chapter also offers a rationale for a multimodal approach based on Halliday’s Systemic Functional Linguistics (SFL), by which it is directly informed.
Abstract:
Introduction: Previous research has suggested that visual images are more easily generated, more vivid and more memorable than images in other sensory modalities. This research examined whether or not imagery is experienced in similar ways by people with and without sight. Specifically, the imageability of visual, auditory and tactile cue words was compared. The degree to which images were multimodal or unimodal was also examined. Method: Twelve participants who had been totally blind from early infancy and 12 sighted participants generated images in response to 53 sensory and non-sensory words, rating imageability and the sensory modality, and describing the images. From these 53 items, four subgroups of words were created that stimulated images that were predominantly visual, tactile, auditory and low-imagery, respectively. Results: T-tests comparing imageability ratings from blind and sighted participants found no differences for auditory and tactile words (both p>.1). Nevertheless, whilst participants without sight found auditory and tactile images equally imageable, sighted participants found images in response to tactile cue words harder to generate than those in response to visual cue words (mean difference: -0.51, p=.025). Participants with sight were also more likely to develop multisensory images than were participants without sight (both U≥15.0, N1=12, N2=12, p≤.008). Discussion: For both the blind and sighted participants, auditory and tactile images were rich and varied, and similar language was used. Sighted participants were more likely to generate multimodal images, particularly for tactile words. Nevertheless, cue words that resulted in multisensory images were not necessarily rated as more imageable. The discussion considers whether or not multimodal imagery represents a method of compensating for impoverished unimodal imagery. Implications for Practitioners: Imagery is important not only as a mnemonic in memory rehabilitation but also for everyday uses such as autobiographical memory. This research emphasises the importance of not only auditory and tactile sensory imagery but also spatial imagery for people without sight.