924 results for Multimodal interfaces
Abstract:
This paper presents a queue-based agent architecture for multimodal interfaces. Using a novel approach to intelligently organise both agents and input data, the system has the potential to outperform current state-of-the-art multimodal systems while allowing greater levels of interaction and flexibility. This assertion is supported by simulation results showing that significant improvements can be obtained over conventional sequential agent-scheduling architectures. In real usage, this translates into faster, more comprehensive systems without the limited application domain that restricts current implementations.
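The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of queue-based agent scheduling, with all names and priorities invented for illustration: multimodal input events go onto a shared priority queue and are dispatched to the agents registered for that modality, instead of polling every agent sequentially on every cycle.

    # Minimal illustrative sketch (not the paper's implementation).
    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class InputEvent:
        priority: int                          # lower value = handled first
        modality: str = field(compare=False)   # e.g. "speech", "gesture"
        payload: object = field(compare=False)

    class QueueScheduler:
        def __init__(self):
            self.queue = []        # heap of pending multimodal input events
            self.agents = {}       # modality -> list of handler callables

        def register(self, modality, agent):
            self.agents.setdefault(modality, []).append(agent)

        def post(self, event):
            heapq.heappush(self.queue, event)

        def run(self):
            while self.queue:
                event = heapq.heappop(self.queue)   # highest-priority event first
                for agent in self.agents.get(event.modality, []):
                    agent(event)                    # only the relevant agents run

    # Usage: the speech event is handled before the earlier-posted gesture event.
    sched = QueueScheduler()
    sched.register("speech", lambda e: print("speech agent:", e.payload))
    sched.register("gesture", lambda e: print("gesture agent:", e.payload))
    sched.post(InputEvent(2, "gesture", "point-left"))
    sched.post(InputEvent(1, "speech", "open file"))
    sched.run()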
Abstract:
The ability to view and interact with 3D models has existed for a long time. However, vision-based 3D modeling has seen only limited success in applications, as it faces many technical challenges. Hand-held mobile devices have changed the way we interact with virtual reality environments. Their high mobility and technical features, such as inertial sensors, cameras and fast processors, are especially attractive for advancing the state of the art in virtual reality systems. Moreover, their ubiquity and fast Internet connections open a path to distributed and collaborative development, a path that has not been fully explored in many domains. VR systems for real-world engineering contexts are still difficult to use, especially when geographically dispersed engineering teams need to collaboratively visualize and review 3D CAD models. Another challenge is rendering these environments at the required interactive rates and with high fidelity. This document presents a mobile virtual reality system for the visualization, navigation and review of large-scale 3D CAD models, developed under the CEDAR (Collaborative Engineering Design and Review) project, with a focus on interaction through different navigation modes. The system uses the mobile device's inertial sensors and camera to allow users to navigate through large-scale models. IT professionals, architects, civil engineers and oil industry experts took part in a qualitative assessment of the CEDAR system, consisting of direct user interaction with the prototypes and audio-recorded interviews about them. The lessons learned are valuable and are presented in this document. Subsequently, a quantitative study of the different navigation modes was carried out to determine which mode is best suited to a given situation.
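As an illustration of the sensor-based navigation the abstract describes, here is a minimal, hypothetical sketch (not CEDAR code) of how gyroscope rates from a mobile device might be integrated into the yaw and pitch of the CAD viewer's camera, so that physically rotating the device rotates the view:

    import math

    class SensorNavigator:
        def __init__(self):
            self.yaw = 0.0    # radians, rotation about the vertical axis
            self.pitch = 0.0  # radians, rotation about the lateral axis

        def update(self, yaw_rate, pitch_rate, dt):
            """Angular rates in rad/s from the device's gyroscope, dt in seconds."""
            self.yaw += yaw_rate * dt
            self.pitch += pitch_rate * dt
            # clamp pitch so the virtual camera cannot flip over
            self.pitch = max(-math.pi / 2, min(math.pi / 2, self.pitch))

        def view_direction(self):
            # unit forward vector for the viewer's camera
            return (math.cos(self.pitch) * math.sin(self.yaw),
                    math.sin(self.pitch),
                    math.cos(self.pitch) * math.cos(self.yaw))

    nav = SensorNavigator()
    nav.update(yaw_rate=0.5, pitch_rate=0.0, dt=0.016)   # one 60 Hz frame
    print(nav.view_direction())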
Abstract:
This paper evaluates the usability of an Intelligent Wheelchair (IW) in both real and simulated environments. The wheelchair is controlled at a high level by a flexible multimodal interface that takes voice commands, facial expressions, head movements and a joystick as its main inputs. A quasi-experimental design was applied, using a deterministic sample and a questionnaire based on the System Usability Scale. The subjects were divided into two independent samples: 46 individuals performed the experiment with the Intelligent Wheelchair in a simulated environment (28 using the different commands in a fixed sequence and 18 free to choose the command), and 12 individuals performed the experiment with a real IW. The main conclusion of the study is that the usability of the Intelligent Wheelchair is higher in the real environment than in the simulated one. However, there was no statistical evidence of differences between the real and simulated wheelchairs in terms of safety and control. In addition, most users considered the multimodal way of driving the wheelchair very practical and satisfactory. It may therefore be concluded that the multimodal interface enables easy and safe control of the IW in both simulated and real environments.
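For reference, the System Usability Scale score used in this evaluation is computed from the standard 10-item questionnaire as follows (a generic sketch of the standard SUS formula, not the authors' analysis code):

    def sus_score(responses):
        """responses: list of 10 answers on a 1-5 Likert scale, item 1 first."""
        if len(responses) != 10:
            raise ValueError("SUS requires exactly 10 responses")
        total = 0
        for i, r in enumerate(responses, start=1):
            total += (r - 1) if i % 2 == 1 else (5 - r)   # odd items: r-1, even items: 5-r
        return total * 2.5                                # scales the 0-40 sum to 0-100

    print(sus_score([5, 1, 4, 2, 5, 1, 4, 2, 5, 1]))      # -> 90.0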
Abstract:
Technology advances in recent years have dramatically changed the way users exploit content and services available on the Internet, enforcing pervasive and mobile computing scenarios and enabling access to networked resources almost anywhere, at any time, and independently of the device in use. In addition, people increasingly expect to customize their experience by exploiting specific device capabilities and limitations, the inherent features of the communication channel in use, and interaction paradigms that differ significantly from the traditional request/response model. This so-called Ubiquitous Internet scenario calls for solutions that address many different challenges, such as device mobility, session management, content adaptation, context awareness and the provisioning of multimodal interfaces. Moreover, new service opportunities demand simple and effective ways to integrate existing resources into new, value-added applications that can also undergo run-time modifications according to ever-changing execution conditions. Although service-oriented architectural models are gaining momentum in taming the increasing complexity of composing and orchestrating distributed, heterogeneous functionalities, existing solutions generally lack a unified approach and only support specific Ubiquitous Internet aspects. They also usually target rather static scenarios and scarcely support the dynamic nature of pervasive access to Internet resources, which can quickly render existing compositions obsolete or inadequate and hence in need of reconfiguration. This thesis proposes a novel middleware approach that comprehensively addresses the facets of the Ubiquitous Internet and assists in establishing innovative application scenarios. We claim that a truly viable ubiquity-support infrastructure must neatly decouple the distributed resources to be integrated and push any kind of content-related logic outside its core layers, keeping only management and coordination responsibilities. Furthermore, we promote an innovative, open and dynamic resource-composition model that makes it easy to describe and enforce complex scenario requirements and to react appropriately to changes in the execution conditions.
Abstract:
Tracking the user's visual attention is a fundamental aspect of novel human-computer interaction paradigms found in Virtual Reality. For example, multimodal interfaces or dialogue-based communication with virtual and real agents greatly benefit from the analysis of the user's visual attention as a vital source of deictic references or turn-taking signals. Current approaches to determining visual attention rely primarily on monocular eye trackers and are therefore restricted to interpreting two-dimensional fixations relative to a defined area of projection. The study presented in this article compares the precision, accuracy and application performance of two binocular eye tracking devices, and compares two algorithms that derive the depth information required for visual attention-based 3D interfaces. This information is further applied to an improved VR selection task in which a binocular eye tracker and an adaptive neural network algorithm are used to disambiguate partly occluded objects.
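The article's two algorithms are not reproduced here; the following sketch only illustrates the underlying geometric idea of estimating fixation depth from binocular gaze, by taking the midpoint of the shortest segment between the two (usually skew) gaze rays:

    import numpy as np

    def fixation_point(p_left, d_left, p_right, d_right):
        """p_*: 3D eye positions; d_*: 3D gaze directions (need not be unit length)."""
        p_l, p_r = np.asarray(p_left, float), np.asarray(p_right, float)
        d_l, d_r = np.asarray(d_left, float), np.asarray(d_right, float)
        w = p_l - p_r
        a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
        d, e = d_l @ w, d_r @ w
        denom = a * c - b * b                  # ~0 when the gaze rays are parallel
        if abs(denom) < 1e-9:
            return None
        s = (b * e - c * d) / denom            # closest-point parameter, left ray
        t = (a * e - b * d) / denom            # closest-point parameter, right ray
        return ((p_l + s * d_l) + (p_r + t * d_r)) / 2   # estimated fixation point

    # Eyes 6.5 cm apart, both converging on a point 1 m straight ahead.
    print(fixation_point([-0.0325, 0, 0], [0.0325, 0, 1],
                         [0.0325, 0, 0], [-0.0325, 0, 1]))   # -> [0. 0. 1.]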
Abstract:
Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.
Abstract:
In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than the individual modalities. However, when combining these modalities the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques on silent speech data, in a dataset with four non-invasive and promising modalities, namely video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (mean-median and arithmetic mean-geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (k-nearest neighbors, support vector machines, and dynamic time warping). The key result of this study is that both unsupervised and supervised FS techniques improve the classification accuracy for both individual and combined modalities. For instance, on the video modality we attain relative performance gains of 36.2% in error rate. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.
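As an illustration of the supervised filters mentioned above, the sketch below ranks features by Fisher's ratio (between-class scatter over within-class scatter) and keeps the top-scoring ones; it shows the general technique, not the paper's exact pipeline:

    import numpy as np

    def fisher_ratio_scores(X, y):
        """X: (n_samples, n_features) array; y: class labels. One score per feature."""
        overall_mean = X.mean(axis=0)
        between = np.zeros(X.shape[1])
        within = np.zeros(X.shape[1])
        for c in np.unique(y):
            Xc = X[y == c]
            between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
            within += len(Xc) * Xc.var(axis=0)
        return between / (within + 1e-12)      # epsilon avoids division by zero

    def select_top_k(X, y, k):
        idx = np.argsort(fisher_ratio_scores(X, y))[::-1][:k]
        return X[:, idx], idx                  # reduced feature matrix + kept indices

    # Toy example: feature 0 separates the two classes best, so it is kept.
    X = np.array([[0.0, 5.0, 1.0], [0.1, 4.0, 2.0], [1.0, 5.5, 1.5], [1.1, 4.5, 1.0]])
    y = np.array([0, 0, 1, 1])
    print(select_top_k(X, y, k=1)[1])          # -> [0]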
Abstract:
This dissertation presents the development of a multimodal signal acquisition and processing platform. The proposed project is set in the context of developing multimodal interfaces for robotic devices aimed at motor rehabilitation, adapting the control of these devices according to the user's intention. The developed interface acquires, synchronizes and processes electroencephalographic (EEG) signals, electromyographic (EMG) signals and signals from inertial sensors (IMUs). Data acquisition is carried out in experiments with healthy subjects performing lower-limb motor tasks. The goal is to analyze movement intention, muscle activation and the effective onset of the performed movements through the EEG, EMG and IMU signals, respectively. To this end, an offline analysis was performed, using processing techniques for the biological signals as well as for the inertial sensor signals; from the latter, the knee joint angles are also measured throughout the movements. An experimental test protocol was proposed for the tasks performed. The results showed that the proposed system was able to acquire, synchronize, process and classify the signals in combination. Analysis of the accuracy of the classifiers used showed that the interface was able to identify movement intention in 76.0 ± 18.2% of the movements. The longest mean movement-anticipation time, 716.0 ± 546.1 milliseconds, was obtained from the analysis of the EEG signal; using only the EMG signal, this value was 88.34 ± 67.28 milliseconds. The results of the biological signal processing stages, the measurement of the joint angles, and the accuracy and movement-anticipation values were consistent with the current related literature.
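One of the processing steps described above, detecting the onset of muscle activation from an EMG channel, commonly follows a rectify-smooth-threshold scheme such as the sketch below (a generic textbook approach, not necessarily the one used in the dissertation):

    import numpy as np

    def emg_onset(emg, fs, baseline_s=0.5, window_s=0.05, k=3.0, hold_s=0.025):
        """emg: 1-D signal; fs: sampling rate in Hz. Returns onset time in seconds or None."""
        rectified = np.abs(emg - np.mean(emg))
        win = max(1, int(window_s * fs))
        envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
        baseline = envelope[: int(baseline_s * fs)]           # assumed resting period
        threshold = baseline.mean() + k * baseline.std()
        hold = int(hold_s * fs)
        above = envelope > threshold
        for i in range(len(above) - hold):
            if above[i : i + hold].all():                     # sustained activation only
                return i / fs
        return None

    # Synthetic test: 1 s of rest noise followed by 1 s of stronger "activation".
    fs = 1000
    rng = np.random.default_rng(0)
    signal = np.concatenate([rng.normal(0, 0.05, fs), rng.normal(0, 0.5, fs)])
    print(emg_onset(signal, fs))                              # ≈ 1.0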
Abstract:
Dissertation submitted to obtain the Master's degree in Informatics Engineering
Abstract:
Dissertation submitted to obtain the Master's degree in Informatics Engineering
Abstract:
TESSA is a toolkit for experimenting with sensory augmentation. It includes hardware and software to facilitate rapid prototyping of interfaces that can enhance one sense using information gathered from another sense. The toolkit contains a range of sensors (e.g. ultrasonics, temperature sensors) and actuators (e.g. tactors or stereo sound), designed modularly so that inputs and outputs can be easily swapped in and out and customized using TESSA's graphical user interface (GUI), with "real time" feedback. The system runs on a Raspberry Pi with a built-in touchscreen, providing a compact and portable form that is amenable to field trials. At CHI Interactivity, the audience will have the opportunity to experience sensory augmentation effects using this system and design their own sensory augmentation interfaces.
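The kind of sensor-to-actuator mapping TESSA is built to prototype can be sketched in a few lines of plain Python (hypothetical names only; this is not TESSA's actual API): an ultrasonic distance reading is remapped to vibration intensity on a tactor.

    def distance_to_intensity(distance_cm, near_cm=10.0, far_cm=200.0):
        """Closer obstacles produce stronger vibration, clamped to [0.0, 1.0]."""
        d = max(near_cm, min(far_cm, distance_cm))
        return (far_cm - d) / (far_cm - near_cm)

    class SensoryAugmentationLoop:
        def __init__(self, read_distance_cm, set_tactor_intensity):
            # both arguments are callables supplied by whatever hardware is plugged in
            self.read_distance_cm = read_distance_cm
            self.set_tactor_intensity = set_tactor_intensity

        def step(self):
            self.set_tactor_intensity(distance_to_intensity(self.read_distance_cm()))

    # Simulated hardware: an obstacle 40 cm away produces a fairly strong vibration.
    loop = SensoryAugmentationLoop(lambda: 40.0,
                                   lambda i: print(f"tactor intensity {i:.2f}"))
    loop.step()   # -> tactor intensity 0.84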
Abstract:
This paper describes a novel architecture for the automatic annotation and processing of semantic sensor data within context-aware applications. Based on well-known state-chart technology, represented using the W3C SCXML language and combined with Semantic Web technologies, our architecture is able to provide enriched, higher-level semantic representations of the user's context. This capability to detect and model relevant user situations allows seamless modeling of the current interaction situation, which can be integrated during the design of multimodal user interfaces (also based on SCXML) so that they can be adequately adapted. The final result of this contribution is therefore a flexible, context-aware SCXML-based architecture suitable both for designing a wide range of multimodal context-aware user interfaces and for implementing the automatic enrichment of sensor data, making it available to the entire Semantic Sensor Web.
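The state-chart idea can be illustrated with a small plain-Python analogue of an SCXML state machine (state and event names are invented for illustration): low-level sensor events drive transitions, and the current state is the higher-level context handed to the multimodal interface for adaptation.

    class ContextStateChart:
        TRANSITIONS = {
            ("at_desk", "accelerometer_walking"): "walking",
            ("walking", "location_entered_meeting_room"): "in_meeting",
            ("in_meeting", "calendar_meeting_ended"): "at_desk",
        }

        def __init__(self, initial="at_desk"):
            self.state = initial

        def on_sensor_event(self, event):
            self.state = self.TRANSITIONS.get((self.state, event), self.state)
            return self.state   # the enriched, higher-level context

    chart = ContextStateChart()
    print(chart.on_sensor_event("accelerometer_walking"))          # -> walking
    print(chart.on_sensor_event("location_entered_meeting_room"))  # -> in_meeting
    # A multimodal UI could, for example, switch to silent, visual-only output
    # while the context state is "in_meeting".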