992 results for multimodal interaction
Abstract:
We are witnessing a fundamental transformation in how the Internet of Things (IoT) affects the experience users have with data-driven devices, smart appliances, and connected products. The experience of a place is commonly defined as the result of a series of user engagements with the surrounding place in order to carry out daily activities (Golledge, 2002). Understanding users' experiences becomes vital to the process of designing a map. In the near future, users will be able to interact directly with any IoT device in their surroundings, yet very little is known about the kinds of interactions and experiences a map might offer (Roth, 2015). The main challenge is to develop an experience design process for devising maps capable of supporting different dimensions of user experience, such as the cognitive, sensory-physical, affective, and social dimensions (Tussyadiah and Zach, 2012). For example, in a future smart city, IoT devices that allow multimodal interaction with a map could help tourists assimilate knowledge about points of interest (cognitive experience), associate sounds and smells with these places (sensory-physical experience), connect emotionally with them (affective experience), and build relationships with other nearby tourists (social experience). This paper describes a conceptual framework for developing a Mapping Experience Design (MXD) process for building maps for the smart connected places of the future. Our MXD process focuses on the cognitive dimension of experience, in which a person perceives a place as a "living entity" that uses, and feeds on, their experiences. We want to help people undergo a meaningful experience of a place by mapping what is communicated during their interactions with the IoT devices situated in it. Our purpose is to understand how maps can support a person's experience in making better decisions in real time.
Abstract:
Emerging technologies have opened up a new dimension of the self, the 'technoself', driven by socio-technical innovation, and have taken an important step forward in pervasive learning. Technology Enhanced Learning (TEL) research has increasingly focused on emerging technologies such as Augmented Reality (AR) for augmented learning, mobile learning, and game-based learning, in order to improve learners' self-motivation and self-engagement in enriched multimodal learning environments. This research takes advantage of hardware and software innovations across platforms and devices, including tablets, phablets, and even game consoles, and of their growing popularity for pervasive learning, together with significant advances in personalization processes that place the student at the center of the learning process. In particular, AR research has matured to the point of facilitating augmented learning, defined as an on-demand learning technique in which the learning environment adapts to the needs and inputs of learners. In this paper, we first study the role of the Technology Acceptance Model (TAM), one of the most influential theories applied in TEL, in explaining how learners come to accept and use a new technology. We then present the design methodology of the technoself approach to pervasive learning and introduce technoself enhanced learning as a novel pedagogical model that improves student engagement by shaping the personal learning focus and setting. Furthermore, we describe the design and development of an AR-based interactive digital interpretation system for augmented learning and discuss its key features. By incorporating mobile devices, game simulation, voice recognition, and multimodal interaction through AR, the learning content can be tailored to learners' needs, stimulating discovery and deeper understanding.
The system demonstrates that AR can provide rich contextual learning environments and content tailored to individuals. Augmented learning via AR can bridge the gap between theoretical and practical learning, and it focuses on how the real and the virtual can be combined to fulfill different learning objectives, requirements, and even environments. Finally, we validate and evaluate the AR-based technoself enhanced learning approach for enhancing student motivation and engagement in the learning process through experimental learning practices. The results show that AR is well aligned with constructive learning strategies, since learners can control their own learning and manipulate virtual objects in the augmented environment to derive understanding and knowledge across a broad range of learning practices, including constructive and analytical activities.
Abstract:
199 p.
Abstract:
Surgical interventions are usually performed in an operating room; however, the medical team's access to information during the intervention is limited. In conversations with medical staff, we observed that they attach great importance to improving direct, real-time access to information and communication through queries during the procedure. This is because the current access procedure is rather slow and there is little interaction with the systems in the operating room. These systems can be integrated in the Cloud, adding new functionality to the existing systems in which medical records are processed. Therefore, such a communication system needs to be built on information and interaction access specifically designed and developed to aid medical specialists. Copyright 2014 ACM.
Abstract:
This article introduces the concept of modal alignment, an interactional phenomenon characteristic of the opening stage (preamble) of videoconference meetings, in which interaction can take place through written chat, image, and voice. To this end, the study draws on Erving Goffman's model of interaction and the methodology of Conversation Analysis (CA). Through a selection of examples taken from a corpus of eighteen interactions on Adobe Connect 7.0, the analysis shows that, in the context analyzed, channel selection is a resource for alignment and for the (re)configuration of the meetings' participation framework. It is further suggested that participants use this resource as a strategy for managing mutual orientation and turn-taking during the preambles.
Abstract:
In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific-speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and the face pose is estimated in order to weight the detected face-like regions in successive frames; ill-posed faces and false-positive detections are assigned lower credit to improve accuracy. In the audio modality, mel-frequency cepstral coefficients (MFCCs) are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at a low level. The proposed approach is tested on a video database of ten human subjects, and the results show that the proposed scheme attains better accuracy than both conventional multimodal fusion using latent semantic analysis and single-modality verification. Experiments in MATLAB show the potential of the proposed scheme to achieve real-time performance for perceptual HCI applications.
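The MFCC front end mentioned above begins by warping frequency onto the mel scale. The sketch below is an illustration under stated assumptions, not the authors' MATLAB implementation. It uses the common HTK-style conversion m = 2595 log10(1 + f/700) and places triangular filter centres uniformly on the mel scale. The function names and the example of 10 filters over 0 to 8000 Hz are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_centers(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Centre frequencies (Hz) of n_filters triangular filters spaced
    uniformly on the mel scale between f_min and f_max.

    n_filters + 2 equally spaced mel points define the filter edges; the
    n_filters interior points are the centres returned here.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]
```

For example, `mel_filter_centers(10, 0.0, 8000.0)` yields centres that are closely spaced at low frequencies and increasingly far apart at high frequencies. That is the perceptual warping MFCCs rely on. The remaining MFCC steps (filterbank energies, logarithm, discrete cosine transform) are omitted here.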
Abstract:
The rigors of establishing innateness and domain specificity pose challenges to adaptationist models of music evolution. In articulating a series of constraints, the authors of the target articles provide strategies for investigating the potential origins of music. We propose additional approaches for exploring theories based on exaptation. We discuss a view of music as a multimodal system of engaging with affect, enabled by capacities for symbolism and a theory of mind.
Abstract:
Users need to be able to address in-air gesture systems, which means finding where to perform gestures and how to direct them towards the intended system. This is necessary for input to be sensed correctly and without unintentionally affecting other systems. This thesis investigates novel interaction techniques that allow users to address gesture systems properly, helping them find where and how to gesture. It also investigates audio, tactile and interactive light displays for multimodal gesture feedback; these can be used by gesture systems with limited output capabilities (such as mobile phones and small household controls), allowing the interaction techniques to be used by a variety of device types. Tactile and interactive light displays are investigated in greater detail, as they are less well understood than audio displays. Experiments 1 and 2 explored tactile feedback for gesture systems, comparing an ultrasound haptic display with wearable tactile displays at different body locations and investigating feedback designs. These experiments found that tactile feedback improves the user experience of gesturing by reassuring users that their movements are being sensed. Experiment 3 investigated interactive light displays for gesture systems, finding this novel display type effective for giving feedback and presenting information. It also found that interactive light feedback is enhanced by audio and tactile feedback. These feedback modalities were then used alongside audio feedback in two interaction techniques for addressing gesture systems: sensor strength feedback and rhythmic gestures. Sensor strength feedback is multimodal feedback that tells users how well they can be sensed, encouraging them to find where to gesture through active exploration. Experiment 4 found that users can do this with 51 mm accuracy, with combinations of audio and interactive light feedback leading to the best performance.
Rhythmic gestures are continuously repeated gesture movements that can be used to direct input. Experiment 5 investigated the usability of this technique, finding that users can match rhythmic gestures well and with ease. Finally, these interaction techniques were combined into a single new interaction for addressing gesture systems. Using this interaction, users could direct their input with rhythmic gestures while using sensor strength feedback to find a good location for addressing the system. Experiment 6 studied the effectiveness and usability of this technique, as well as the design space for combining the two types of feedback. It found that the interaction was successful, with users matching 99.9% of rhythmic gestures with 80 mm accuracy from target points. The findings show that gesture systems could successfully use this interaction technique to allow users to address them. Novel design recommendations for using rhythmic gestures and sensor strength feedback were created, informed by the experimental findings.
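The idea of directing input with rhythmic gestures can be sketched as follows. This is a hypothetical illustration, not the thesis's implementation. It assumes a gesture recognizer already supplies an onset timestamp for each repetition, estimates the user's period as the median inter-onset interval, and picks the system whose assigned rhythm is closest within a relative tolerance (15% here, an arbitrary choice). All names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional


def estimate_period(onsets: List[float]) -> Optional[float]:
    """Median interval (seconds) between consecutive gesture onsets, or None
    if fewer than two onsets have been observed."""
    if len(onsets) < 2:
        return None
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    return median(intervals)


def select_target(onsets: List[float],
                  targets: Dict[str, float],
                  tolerance: float = 0.15) -> Optional[str]:
    """Return the name of the system whose rhythm period (seconds) best matches
    the user's gesture period, if the relative error is within `tolerance`.
    Returns None when no system matches, so no system receives the input."""
    period = estimate_period(onsets)
    if period is None:
        return None
    best_name, best_err = None, tolerance
    for name, target_period in targets.items():
        err = abs(period - target_period) / target_period
        if err <= best_err:
            best_name, best_err = name, err
    return best_name
```

For instance, with a lamp assigned a 0.5 s rhythm and a TV a 1.0 s rhythm, slightly irregular repetitions about 0.5 s apart would address the lamp. Repetitions 2 s apart would address neither system. Using the median rather than the mean keeps one missed or spurious onset from dominating the estimate.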
Abstract:
Discriminating between phases that are practically indistinguishable under the reflected-light optical microscope or the scanning electron microscope (SEM) is one of the classic problems of ore microscopy. To address this problem, the co-located microscopy technique has recently been employed; it combines two microscopy modalities, optical microscopy and scanning electron microscopy. The aim of the technique is to provide a multimodal microscopy image, making it possible to identify phases in mineral samples that would not be distinguishable with a single modality, thus overcoming the individual limitations of the two systems. The registration method available in the literature until now for fusing optical and SEM images is laborious and heavily dependent on operator interaction, since it involves calibrating the system with a standard grid at every image acquisition routine. For this reason, the existing technique is not practical. This work proposes a methodology to automate the registration of optical and SEM images, so as to improve and simplify the use of co-located microscopy. The proposed method can be divided into two procedures: obtaining the transformation, and registering the images using that transformation. Obtaining the transformation first involves pre-processing the image pairs to perform a coarse registration between the images of each pair. Next, homologous points are obtained in the optical and SEM images. Two methods were used for this: the first based on the SIFT algorithm, and the second based on a scan for the maximum value of the correlation coefficient. In the next step, the transformation is computed.
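The second method for finding homologous points, a scan for the maximum correlation coefficient, can be sketched in plain Python. This is a minimal illustration on small nested lists. It assumes the coarse pre-registration has already brought the optical and SEM images close together. The function names are hypothetical, and a practical implementation would operate on arrays and restrict the search window. The Pearson correlation coefficient is a convenient score because it is unchanged by linear rescaling of intensities, which may help when the two modalities render the same region with different brightness and contrast.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation between two equal-length sequences of numbers.
    Returns 0.0 when either sequence is constant (correlation undefined)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def flatten_window(img, top, left, h, w):
    """Row-major values of the h x w window whose top-left corner is (top, left)."""
    return [img[r][c] for r in range(top, top + h) for c in range(left, left + w)]


def best_match(template, search):
    """Slide `template` over every fully overlapping position in `search` and
    return (row, col, rho) for the position with the highest correlation."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    t = flatten_window(template, 0, 0, th, tw)
    best = (0, 0, -2.0)  # start below the minimum possible correlation (-1)
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            rho = correlation_coefficient(t, flatten_window(search, r, c, th, tw))
            if rho > best[2]:
                best = (r, c, rho)
    return best
```

Each returned position, paired with the template's location in the other image, gives one homologous point pair. The exhaustive scan is quadratic in the search-window size, which is one reason to run it only after coarse registration.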
Two distinct approaches were employed: the local weighted mean (LWM) and weighted least squares with orthogonal polynomials (MQPPO). The LWM takes as input so-called pseudo-homologous points, which are forcibly distributed in a regular pattern over the reference image and which reveal, in the image to be registered, the local relative displacements between the images. These pseudo-homologous points can be obtained either with SIFT or with the correlation-coefficient method. The MQPPO, on the other hand, takes a set of points with their natural distribution. The analysis of the resulting image registrations used the correlation value between the obtained images as its metric. The proposed SIFT-LWM and SIFT-Correlation variants yielded slightly better results than the method using the standard grid and LWM. Thus, besides drastically reducing operator intervention, the proposal also produced more accurate results. On the other hand, the method based on the transformation given by weighted least squares with orthogonal polynomials produced worse results than the standard-grid method.
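A local weighted mean maps each point using nearby control points, giving more influence to closer ones. The sketch below is a deliberately simplified, zeroth-order variant, not the method used in this work. It assumes each pseudo-homologous point carries only its measured displacement, whereas a full LWM fits a local polynomial at each control point. Displacements are averaged with the compactly supported cubic weight 1 - 3R^2 + 2R^3, where R = distance / radius. The weight choice, the radius, and all names are assumptions for illustration.

```python
import math


def lwm_weight(dist, radius):
    """Cubic weight 1 - 3R^2 + 2R^3 for R = dist / radius < 1, else 0 (assumed)."""
    r = dist / radius
    if r >= 1.0:
        return 0.0
    return 1.0 - 3.0 * r ** 2 + 2.0 * r ** 3


def local_weighted_displacement(x, y, controls, radius):
    """Weighted-mean displacement at (x, y).

    controls: list of ((cx, cy), (dx, dy)) pairs, i.e. a pseudo-homologous
    point on the reference grid and the displacement measured there.
    Returns None if no control point lies within `radius` of (x, y).
    """
    w_sum = sx = sy = 0.0
    for (cx, cy), (dx, dy) in controls:
        w = lwm_weight(math.hypot(x - cx, y - cy), radius)
        w_sum += w
        sx += w * dx
        sy += w * dy
    if w_sum == 0.0:
        return None
    return (sx / w_sum, sy / w_sum)


def map_point(x, y, controls, radius):
    """Map a reference-image point into the image being registered."""
    d = local_weighted_displacement(x, y, controls, radius)
    return None if d is None else (x + d[0], y + d[1])
```

Because the weight vanishes beyond the radius, a point far from every pseudo-homologous point gets no estimate. This illustrates why forcing the points onto a regular grid over the reference image is useful for this kind of transformation.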
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification in which both the speech and facial modalities may be corrupted and there is no prior knowledge about the corruption. Furthermore, we assume that only a limited amount of training data is available for each modality (e.g., a short training speech segment and a single training facial image per person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features despite limited training data and vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments were carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of the speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and facial identification performance on the AR database are comparable with those reported in the literature. Combining both modalities using the new multimodal fusion method leads to significantly improved accuracy over the unimodal systems, even when both modalities are corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
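The abstract does not spell out the modified cosine similarity, so the following is only an assumed stand-in, not the paper's method. Each modality's feature vector is L2-normalized before concatenation, so that neither the longer nor the larger-valued block dominates. Identities are then ranked by ordinary cosine similarity. The names and toy vectors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(speech_feat, face_feat):
    """Hypothetical feature-level fusion: normalize each modality, then concatenate."""
    return l2_normalize(speech_feat) + l2_normalize(face_feat)


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def identify(probe, gallery):
    """Return the enrolled identity whose fused template is most similar to probe."""
    return max(gallery, key=lambda pid: cosine_similarity(probe, gallery[pid]))
```

Normalizing per block means a short speech segment and a single face image contribute equally to the similarity regardless of their raw feature sizes. Weighting the blocks unequally, for example by an estimate of each modality's reliability, would be a natural extension but is not shown here.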