58 resultados para MULTIMODAL ELUTION
Resumo:
This paper introduces a novel interface designed to help blind and visually impaired people to explore and navigate on the Web. In contrast to traditionally used assistive tools, such as screen readers and magnifiers, the new interface employs a combination of both audio and haptic features to provide spatial and navigational information to users. The haptic features are presented via a low-cost force feedback mouse allowing blind people to interact with the Web, in a similar fashion to their sighted counterparts. The audio provides navigational and textual information through the use of non-speech sounds and synthesised speech. Interacting with the multimodal interface offers a novel experience to target users, especially to those with total blindness. A series of experiments have been conducted to ascertain the usability of the interface and compare its performance to that of a traditional screen reader. Results have shown the advantages that the new multimodal interface offers blind and visually impaired people. This includes the enhanced perception of the spatial layout of Web pages, and navigation towards elements on a page. Certain issues regarding the design of the haptic and audio features raised in the evaluation are discussed and presented in terms of recommendations for future work.
Resumo:
In this paper, a novel video-based multimodal biometric verification scheme using the subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition for perceptual human--computer interaction (HCI). In the proposed scheme, human face is tracked and face pose is estimated to weight the detected facelike regions in successive frames, where ill-posed faces and false-positive detections are assigned with lower credit to enhance the accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at low level. The proposed approach is tested on the video database of ten human subjects, and the results show that the proposed scheme can attain better accuracy in comparison with the conventional multimodal fusion using latent semantic analysis as well as the single-modality verifications. The experiment on MATLAB shows the potential of the proposed scheme to attain the real-time performance for perceptual HCI applications.
Resumo:
SEMAINE has created a large audiovisual database as a part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally colored conversation. Data used to build the agents came from interactions between users and an operator simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed an appropriate nonverbal behavior) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High quality recording was provided by five high-resolution, high-framerate cameras, and four microphones, recorded synchronously. Recordings total 150 participants, for a total of 959 conversations with individual SAL characters, lasting approximately 5 minutes each. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labeled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods, and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database. © 2010-2012 IEEE.