8 results for Multimodal Interaction
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition for perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at the low level. The proposed approach is tested on a video database of ten human subjects, and the results show that the proposed scheme attains better accuracy than both conventional multimodal fusion using latent semantic analysis and the single-modality verifications. The MATLAB experiment shows the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
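The fusion step lends itself to a brief illustration. The sketch below is not the authors' implementation: it assumes librosa for MFCC extraction and scikit-learn's SpectralEmbedding (an implementation of Laplacian Eigenmaps) for the nonlinear subspace projection, with illustrative helper names and a simple nearest-neighbour matcher in place of the paper's recognition stage.

```python
# Hypothetical sketch of subspace-based low-level fusion of face and speech
# features via Laplacian Eigenmaps, followed by nearest-neighbour matching.
import numpy as np
import librosa
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(wav_path, n_mfcc=13):
    """Mean MFCC vector for one utterance (audio modality)."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def face_features(face_img):
    """Flatten a (pose-weighted) face crop into a raw pixel feature vector."""
    return np.asarray(face_img, dtype=np.float32).ravel()

def fuse_low_level(audio_vecs, face_vecs, n_components=10):
    """Concatenate per-sample audio and face features, then project the
    stacked matrix into a nonlinear Laplacian Eigenmap subspace."""
    fused = np.hstack([np.vstack(audio_vecs), np.vstack(face_vecs)])
    # SpectralEmbedding has no out-of-sample transform, so train and test
    # samples are embedded jointly in this sketch.
    return SpectralEmbedding(n_components=n_components).fit_transform(fused)

# Usage (indices and labels are placeholders):
# embedded = fuse_low_level(audio_vecs, face_vecs)
# clf = KNeighborsClassifier(n_neighbors=1).fit(embedded[train_idx], labels[train_idx])
# predictions = clf.predict(embedded[test_idx])
```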
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of the speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
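As a rough illustration of feature-level fusion scored with a cosine-style similarity (this is not the paper's modified cosine similarity, and it omits optimal feature selection and multicondition training), the sketch below normalises each modality independently before concatenation so that the much larger facial feature vector does not swamp the speech features. All names are illustrative.

```python
# Minimal sketch: per-modality L2 normalisation, concatenation, and
# cosine scoring of a probe against enrolled identity templates.
import numpy as np

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def bimodal_vector(speech_feat, face_feat):
    """Normalise each modality independently, then concatenate."""
    return np.concatenate([l2_normalize(speech_feat), l2_normalize(face_feat)])

def cosine_score(probe, template):
    """Cosine similarity between a probe vector and an enrolled template."""
    return float(np.dot(l2_normalize(probe), l2_normalize(template)))

def identify(probe, templates):
    """Return the identity whose template scores highest against the probe."""
    return max(templates, key=lambda name: cosine_score(probe, templates[name]))
```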
Abstract:
In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real-time on commercial hardware.
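A minimal sketch of the classification-plus-regression idea follows. It is not the authors' network: it assumes PyTorch, a four-channel RGB-D input, hypothetical bin counts and layer sizes, and uses the classifier's softmax maximum as an approximate confidence for the regressed angles.

```python
# Illustrative head-pose model: a shared CNN trunk with a coarse
# gazing-direction classifier and a fine-grained angle regressor.
import torch
import torch.nn as nn

class HeadPoseNet(nn.Module):
    def __init__(self, n_bins=8, in_channels=4):  # RGB-D -> 4 input channels
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(64 * 16, n_bins)  # coarse direction bins
        self.regressor = nn.Linear(64 * 16, 3)        # yaw, pitch, roll

    def forward(self, x):
        feats = self.features(x)
        logits = self.classifier(feats)
        angles = self.regressor(feats)
        # Peak softmax probability serves as an approximate confidence score.
        confidence = torch.softmax(logits, dim=1).max(dim=1).values
        return logits, angles, confidence

# Usage: logits, angles, conf = HeadPoseNet()(torch.randn(1, 4, 64, 64))
```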
Abstract:
Background: Sociocultural theories state that learning results from people participating in contexts where social interaction is facilitated. There is a need to create such facilitated pedagogical spaces where participants share their ways of knowing and doing. The aim of this exploratory study was to introduce pedagogical space for sociocultural interaction using ‘Identity Text’.
Methods: Identity texts are sociocultural artifacts produced by participants, which can be written, spoken, visual, musical, or multimodal. In 2013, participants in an international medical education fellowship program were asked to create their own Identity Texts to promote discussion about their cultural backgrounds. Thematic analysis was used to examine the pedagogical utility of the intervention.
Results: The Identity Text intervention created two spaces: a ‘reflective space’ helped participants reflect on sensitive topics like institutional environments, roles in interdisciplinary teams, and gender discrimination. A ‘narrative space’ allowed participants to tell powerful stories that provided cultural insights and challenged cultural hegemony; they described the conscious and subconscious transformation in identity that evolved secondary to struggles with local power dynamics and social demands involving the impact of family, peers, and country of origin.
Conclusion: Whilst the impact of providing pedagogical space using Identity Text on cognitive engagement and enhanced learning requires further research, the findings of this study suggest that it is a useful pedagogical strategy to support cross-cultural education.