45 results for Multimodal
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
This paper introduces a novel interface designed to help blind and visually impaired people explore and navigate the Web. In contrast to traditionally used assistive tools, such as screen readers and magnifiers, the new interface employs a combination of audio and haptic features to provide spatial and navigational information to users. The haptic features are presented via a low-cost force-feedback mouse, allowing blind people to interact with the Web in a similar fashion to their sighted counterparts. The audio provides navigational and textual information through non-speech sounds and synthesised speech. Interacting with the multimodal interface offers a novel experience to target users, especially those with total blindness. A series of experiments was conducted to ascertain the usability of the interface and compare its performance to that of a traditional screen reader. Results show the advantages that the new multimodal interface offers blind and visually impaired people, including enhanced perception of the spatial layout of Web pages and navigation towards elements on a page. Issues regarding the design of the haptic and audio features raised in the evaluation are discussed and presented as recommendations for future work.
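As a purely illustrative sketch of the kind of audio-plus-haptic rendering the abstract describes, the snippet below maps page element types to a non-speech earcon, an optional speech readout, and a force-feedback effect. The element types, cue names, and the render_cues callbacks are hypothetical; the paper's actual mapping is not given in the abstract.

```python
# Hypothetical mapping of Web page elements to audio and haptic cues, as a sketch
# of the multimodal interface described above (names and effects are assumptions).
from dataclasses import dataclass

@dataclass
class Cue:
    earcon: str         # non-speech sound played when the cursor enters the element
    speech: bool        # whether synthesised speech reads the element's text
    haptic_effect: str  # force-feedback effect sent to the mouse

CUE_MAP = {
    "heading": Cue(earcon="chime_high.wav",  speech=True,  haptic_effect="ridge"),
    "link":    Cue(earcon="click_soft.wav",  speech=True,  haptic_effect="magnet_pull"),
    "image":   Cue(earcon="texture_low.wav", speech=False, haptic_effect="rough_patch"),
    "form":    Cue(earcon="beep_double.wav", speech=True,  haptic_effect="groove"),
}

def render_cues(element_type, text, play, speak, apply_force):
    """Trigger the audio and haptic feedback registered for an element type."""
    cue = CUE_MAP.get(element_type)
    if cue is None:
        return
    play(cue.earcon)                # non-speech sound identifies the element type
    apply_force(cue.haptic_effect)  # force-feedback mouse conveys spatial layout
    if cue.speech and text:
        speak(text)                 # synthesised speech reads textual content
```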
Abstract:
In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at a low level. The proposed approach is tested on a video database of ten human subjects, and the results show that the proposed scheme attains better accuracy than conventional multimodal fusion using latent semantic analysis, as well as the single-modality verifications. The experiment in MATLAB shows the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
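A minimal sketch of this style of low-level audio-visual fusion is shown below, assuming hypothetical inputs face_feats (per-frame face descriptors) and a raw speech waveform. It uses librosa MFCCs and scikit-learn's SpectralEmbedding (an implementation of Laplacian eigenmaps) rather than the authors' exact pipeline, so treat it as an illustration of the idea, not the paper's method.

```python
import numpy as np
import librosa
from sklearn.manifold import SpectralEmbedding
from sklearn.preprocessing import StandardScaler

def fuse_face_voice(face_feats, speech, sr=16000, n_mfcc=13, n_components=8):
    """Project z-scored, frame-aligned face and voice features into a shared
    Laplacian-eigenmap subspace for low-level fusion (illustrative only)."""
    # Voice modality: MFCCs, one row per audio frame.
    mfcc = librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=n_mfcc).T

    # Align the two modalities to the same number of frames (simple truncation).
    n = min(len(face_feats), len(mfcc))
    fused = np.hstack([StandardScaler().fit_transform(face_feats[:n]),
                       StandardScaler().fit_transform(mfcc[:n])])

    # Nonlinear Laplacian-eigenmap projection of the concatenated features.
    return SpectralEmbedding(n_components=n_components).fit_transform(fused)

# Synthetic data standing in for real face tracks and speech.
rng = np.random.default_rng(0)
face = rng.normal(size=(200, 64))    # 200 frames of 64-D face descriptors
audio = rng.normal(size=16000 * 3)   # 3 s of 16 kHz "speech"
embedded = fuse_face_voice(face, audio)
print(embedded.shape)                # (min(face frames, MFCC frames), 8)
```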
Abstract:
SEMAINE has created a large audiovisual database as part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally colored conversation. Data used to build the agents came from interactions between users and an operator simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed appropriate nonverbal behavior) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High-quality recordings were captured synchronously by five high-resolution, high-frame-rate cameras and four microphones. The recordings cover 150 participants, for a total of 959 conversations with individual SAL characters, each lasting approximately 5 minutes. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labeled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods, and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database.
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image per person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
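The sketch below illustrates the general idea of feature-level fusion with a cosine-based score when the two modalities have very different feature sizes. The paper's exact "modified cosine similarity" is not spelled out in the abstract, so the per-modality unit normalisation used here is an illustrative assumption; the function and identity names are hypothetical.

```python
import numpy as np

def fuse(speech_vec, face_vec, w_speech=0.5, w_face=0.5):
    """Unit-normalise each modality so differing feature sizes and scales
    contribute comparably, then weight and concatenate."""
    s = speech_vec / (np.linalg.norm(speech_vec) + 1e-12)
    f = face_vec / (np.linalg.norm(face_vec) + 1e-12)
    return np.concatenate([w_speech * s, w_face * f])

def cosine_score(enrolled, probe):
    """Cosine similarity between an enrolled fused template and a probe."""
    denom = np.linalg.norm(enrolled) * np.linalg.norm(probe) + 1e-12
    return float(enrolled @ probe / denom)

def identify(probe, gallery):
    """Return the identity whose enrolled template best matches the probe."""
    return max(gallery, key=lambda name: cosine_score(gallery[name], probe))

# Toy usage: two enrolled identities, then a noisy probe of the first one.
rng = np.random.default_rng(1)
gallery = {name: fuse(rng.normal(size=39), rng.normal(size=512))
           for name in ("anna", "ben")}
probe = gallery["anna"] + 0.1 * rng.normal(size=gallery["anna"].shape)
print(identify(probe, gallery))  # expected: anna
```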
Abstract:
Structural and functional change in the microcirculation in type 1 diabetes mellitus predicts future end-organ damage and macrovascular events. We explored the utility of novel signal processing techniques to detect and track change in ocular hemodynamics in patients with this disease. Twenty-four patients with uncomplicated type 1 diabetes mellitus and 18 age- and sex-matched control subjects were studied. Doppler ultrasound was used to interrogate the carotid and ophthalmic arteries, and digital photography to image the retinal vasculature. Frequency analysis algorithms were applied to quantify velocity waveform structure and retinal photographic data at baseline and following inhalation of 100% oxygen. Frequency data were compared between groups. No significant differences were found in the resistive index between groups at baseline or following inhaled oxygen. Frequency analysis of the Doppler flow velocity waveforms identified significant differences in bands 3-7 between patients and controls in data captured from the ophthalmic artery (p<0.01 for each band). In response to inhaled oxygen, changes in the frequency band amplitudes were significantly greater in control subjects than in patients (p<0.05). Only control subjects demonstrated a positive correlation (R=0.61) between change in retinal vessel diameter and frequency band amplitudes derived from ophthalmic artery waveform data. Multimodal signal processing applied to Doppler flow velocity waveforms and retinal photographic data identified preclinical change in the ocular microcirculation in patients with uncomplicated diabetes mellitus. An impaired autoregulatory response of the retinal microvasculature may contribute to the future development of retinopathy in such patients.
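A minimal sketch of the two waveform measures mentioned above, the resistive index and per-band frequency amplitudes of a Doppler velocity trace, is given below. The abstract does not define its frequency-analysis algorithm, so interpreting the "bands" as harmonics of the cardiac frequency obtained from a simple FFT is an illustrative assumption, and the synthetic waveform is purely for demonstration.

```python
import numpy as np

def resistive_index(velocity):
    """(peak systolic - end diastolic) / peak systolic velocity."""
    v_sys, v_dia = np.max(velocity), np.min(velocity)
    return (v_sys - v_dia) / v_sys

def band_amplitudes(velocity, fs, heart_rate_hz, n_bands=8):
    """Relative amplitude of the first n_bands harmonics of the cardiac
    frequency, taken from the one-sided FFT of the velocity waveform."""
    spectrum = np.abs(np.fft.rfft(velocity)) / len(velocity)
    freqs = np.fft.rfftfreq(len(velocity), d=1.0 / fs)
    # Band k = amplitude at the bin nearest the k-th harmonic of the heart rate.
    return np.array([spectrum[np.argmin(np.abs(freqs - k * heart_rate_hz))]
                     for k in range(1, n_bands + 1)])

# Toy waveform: 10 s of a synthetic pulsatile velocity trace sampled at 100 Hz.
fs, hr = 100, 1.2                               # sampling rate (Hz), heart rate (Hz)
t = np.arange(0, 10, 1.0 / fs)
v = 40 + 25 * np.cos(2 * np.pi * hr * t) + 8 * np.cos(2 * np.pi * 2 * hr * t)
print(round(resistive_index(v), 2))             # RI of the synthetic trace
print(band_amplitudes(v, fs, hr)[:3].round(1))  # first three band amplitudes
```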
Abstract:
This paper presents a multimodal analysis of online self-representations of the Elite Squad of the military police of Rio de Janeiro, the Special Police Operations Battalion (BOPE). The analysis is placed within the wider context of a “new military urbanism”, evidenced in the ongoing “Pacification” of many of the city’s favelas, in which BOPE plays an active interventionist as well as a symbolic role, and which is a kind of solution that clearly fails to address the root causes of violence lying in poverty and social inequality. The paper first provides a sociocultural account of BOPE’s role in Rio’s public security and then looks at some of the mainly visual mediated discourses the Squad employs in constructing a public image of itself as a modern and efficient, yet at the same time “magical”, police force.
Abstract:
Critical Discourse Analysis (CDA) has probably made the most comprehensive attempt to develop a theory of the inter-connectedness of discourse, power and ideology, and is specifically concerned with the role that discourse plays in maintaining and legitimizing inequality in society. While CDA’s general thrust has been towards the analysis of linguistic structures, some critical discourse analysts have begun to focus on multimodal discourses because of the increasingly important role these play in many social and political contexts. Still, a great deal of CDA analysis has remained largely monomodal. The principal aim of this chapter is therefore to address this situation and demonstrate in what ways CDA can be deployed to analyse the ways that ideological discourses can be communicated, naturalised and legitimated beyond the linguistic level. The chapter also offers a rationale for a multimodal approach based on Halliday’s Systemic Functional Linguistics (SFL), by which it is directly informed.