82 resultados para Gesture based audio user interface
Resumo:
Sabor, Software de Análisis de BOcinas y Reflectores, es una herramienta didáctica la cual es utilizada en los laboratorios de la escuela para realizar prácticas de la asignatura Antenas y Compatibilidad Electromagnética, esta herramienta da a los alumnos una visión gráfica de lo que se enseña en clase de teoría de lo que son los campos en las aperturas de los reflectores. El proyector pretende sustituir al primer Sabor , ya que se queda obsoleto debido al sistema operativo, ya que funciona solo para Windows XP y con ordenadores de 32 bits, y también realizar mejoras y corregir errores de la versión anterior. El proyecto se ha desarrollado en Matlab que es un software matemático con grandes ventajas en cuanto a cálculo, desarrollo gráfico, y a la creación de nuevos algoritmos en su propio lenguaje y además está disponible para las plataformas Unix, Windows, Mac OSX y GNU/Linux. El objetivo del proyecto ha sido implementar, al igual que las versiones anteriores, cinco tipos de reflectores, como son: Parabólico, Offset, Cassegrain y los dos Dobles Offset, Cassegrain y Gregorian, y han sido analizados con un alimentador ideal ,cos-q, y por último los resultados obtenidos se han comparado con las versiones anteriores de Sabor, como son Sabor 3.0 y el primer Sabor. El proyecto consta de partes muy bien diferencias como son : La interpretación correctas de las formulas que se han utilizado para la realización de este proyecto ,dichas formulas han sido las dadas por el proyecto fin de carrera titulado Sabor3.0 de Francisco Egea Castejón. GUIDE, the graphical user interface development environment, con el que se creó: GUI, graphical user interface, que es la parte de Matlab dedicada a crear interfaces de usuario , herramienta utilizada para crear nuestras distintas ventanas dedicadas para la obtención de datos para analizar los distintos reflectores y para mostrar por pantalla los distintos resultados. Programación Orientada a Objetos de Matlab y sus distintas propiedades como son la herencia lo cual es muy útil para ocupar menos memoria ya que con un único método podemos realizar distintos cálculos con los distintos reflectores, objetos, solo cambiando las propiedades de cada objeto Y por último ha sido la realización de validación de los resultados con la ayuda de las versiones anteriores de Sabor, que están detallados en el capítulo 5 y la unión con bocinas del proyecto fin de carrera Análisis de Bocinas en Matlab de Javier Montero. Por otra parte tenemos las mejoras realizadas a las antiguas versiones como son: realización de registros que el usuario puede guardar y cargar con las distintas variables, también se ha realizado un fichero .txt en el que consta la amplitud del campo con su respectiva theta para que el usuario pueda visualizarlo en cualquier plataforma gráfica de datos como por ejemplo exel. ABSTRACT. Sabor, Software de Análisis de BOcinas y Reflectores, is a teaching tool, which is used to do laboratory practice in the subject of Antennas y Compatibilidad Electromagnética, this tool gives students a graphic view of the knowledge that are given in theory class in regard to aperture field of reflectors. This project intend to replace the first Sabor, because it is outdated, due to the operating system, because Sabor works only with Widows XP and computer with 32 bits, and to make improves and correct errors that were detected in the last version of Sabor too. This project has been carried out in Matlab, which is a mathematical software with high-level language for numerical computation, visualization and application development, and furthermore it is available to different platforms such as Unix, Windows ,Mac OSX and GNU/Linux This project has focused on implementing, the same as last versions, five kind of reflectors, such as : Parabolic, Offset, Cassegrain and two offset dual reflector Cassegrain y Gregorian ,and these were analysed with a cos-q ideal feed, and finally the results were checked with the versions of Sabor, as well as Sabor 3.0 and the first Sabor. This project consist of four parts: The correct interpretation of the formulas , which were used to do this project, from the final project Sabor3.0 by Francisco Egea Castejón. GUIDE, the graphical user interface development environment, tool that was used to create : GUI, graphical user interface, part of Matlab dedicated to create user interface. Object Oriented Programming of Matlab and different properties like inheritance, that is very useful for saving memory space because with only one method we can analyse different kind of reflectors, object, only change the properties of the object. At finally, the results were contrasted with the results from the previous versions and the link reflectors with horns from the final project Análisis de Bocinas en Matlab by Javier Montero. On the other hand, we have the improvements such as: registers and .txt file. The registers are used by user to save and load different variables and .txt file is useful because it allows to the user plotting in different platforms for example exel.
Resumo:
Many mobile devices embed nowadays inertial sensors. This enables new forms of human-computer interaction through the use of gestures (movements performed with the mobile device) as a way of communication. This paper presents an accelerometer-based gesture recognition system for mobile devices which is able to recognize a collection of 10 different hand gestures. The system was conceived to be light and to operate in a user -independent manner in real time. The recognition system was implemented in a smart phone and evaluated through a collection of user tests, which showed a recognition accuracy similar to other state-of-the art techniques and a lower computational complexity. The system was also used to build a human -robot interface that enables controlling a wheeled robot with the gestures made with the mobile phone.
Resumo:
The area of Human-Machine Interface is growing fast due to its high importance in all technological systems. The basic idea behind designing human-machine interfaces is to enrich the communication with the technology in a natural and easy way. Gesture interfaces are a good example of transparent interfaces. Such interfaces must identify properly the action the user wants to perform, so the proper gesture recognition is of the highest importance. However, most of the systems based on gesture recognition use complex methods requiring high-resource devices. In this work, we propose to model gestures capturing their temporal properties, which significantly reduce storage requirements, and use clustering techniques, namely self-organizing maps and unsupervised genetic algorithm, for their classification. We further propose to train a certain number of algorithms with different parameters and combine their decision using majority voting in order to decrease the false positive rate. The main advantage of the approach is its simplicity, which enables the implementation using devices with limited resources, and therefore low cost. The testing results demonstrate its high potential.
Resumo:
A more natural, intuitive, user-friendly, and less intrusive Human–Computer interface for controlling an application by executing hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches in every frame of a video sequence potential hand poses using a binary Support Vector Machine classifier and Local Binary Patterns as feature vectors. These detections are employed as input of a tracker to generate a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories, and compute a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor that constitutes one of the most important contributions of the paper, which is able to provide much richer spatio-temporal information than other existing approaches in the state of the art with a manageable computational cost. Excellent results have been obtained outperforming other approaches of the state of the art.
Resumo:
New forms of natural interactions between human operators and UAVs (Unmanned Aerial Vehicle) are demanded by the military industry to achieve a better balance of the UAV control and the burden of the human operator. In this work, a human machine interface (HMI) based on a novel gesture recognition system using depth imagery is proposed for the control of UAVs. Hand gesture recognition based on depth imagery is a promising approach for HMIs because it is more intuitive, natural, and non-intrusive than other alternatives using complex controllers. The proposed system is based on a Support Vector Machine (SVM) classifier that uses spatio-temporal depth descriptors as input features. The designed descriptor is based on a variation of the Local Binary Pattern (LBP) technique to efficiently work with depth video sequences. Other major consideration is the especial hand sign language used for the UAV control. A tradeoff between the use of natural hand signs and the minimization of the inter-sign interference has been established. Promising results have been achieved in a depth based database of hand gestures especially developed for the validation of the proposed system.
Resumo:
We investigated the atomic surface properties of differently prepared silicon and germanium (100) surfaces during metal-organic vapour phase epitaxy/chemical vapour deposition (MOVPE/MOCVD), in particular the impact of the MOVPE ambient, and applied reflectance anisotropy/difference spectroscopy (RAS/RDS) in our MOVPE reactor to in-situ watch and control the preparation on the atomic length scale for subsequent III-V-nucleation. The technological interest in the predominant opto-electronic properties of III-V-compounds drives the research for their heteroepitaxial integration on more abundant and cheaper standard substrates such as Si(100) or Ge(100). In these cases, a general task must be accomplished successfully, i.e. the growth of polar materials on non-polar substrates and, beyond that, very specific variations such as the individual interface formation and the atomic step structure, have to be controlled. Above all, the method of choice to grow industrial relevant high-performance device structures is MOVPE, not normally compatible with surface and interface sensitive characterization tools, which are commonly based on ultrahigh vacuum (UHV) ambients. A dedicated sample transfer system from MOVPE environment to UHV enabled us to benchmark the optical in-situ spectra with results from various surfaces science instruments without considering disruptive contaminants. X-ray photoelectron spectroscopy (XPS) provided direct observation of different terminations such as arsenic and phosphorous and verified oxide removal under various specific process parameters. Absorption lines in Fourier-transform infrared (FTIR) spectra were used to identify specific stretch modes of coupled hydrides and the polarization dependence of the anti-symmetric stretch modes distinguished different dimer orientations. Scanning tunnelling microscopy (STM) studied the atomic arrangement of dimers and steps and tip-induced H-desorption proved the saturation of dangling bonds after preparati- n. In-situ RAS was employed to display details transiently such as the presence of H on the surface at lower temperatures (T <; 800°C) and the absence of Si-H bonds at elevated annealing temperature and also surface terminations. Ge buffer growth by the use of GeH4 enables the preparation of smooth surfaces and leads to a more pronounced amplitude of the features in the spectra which indicates improvements of the surface quality.
Resumo:
Los sensores inerciales (acelerómetros y giróscopos) se han ido introduciendo poco a poco en dispositivos que usamos en nuestra vida diaria gracias a su minituarización. Hoy en día todos los smartphones contienen como mínimo un acelerómetro y un magnetómetro, siendo complementados en losmás modernos por giróscopos y barómetros. Esto, unido a la proliferación de los smartphones ha hecho viable el diseño de sistemas basados en las medidas de sensores que el usuario lleva colocados en alguna parte del cuerpo (que en un futuro estarán contenidos en tejidos inteligentes) o los integrados en su móvil. El papel de estos sensores se ha convertido en fundamental para el desarrollo de aplicaciones contextuales y de inteligencia ambiental. Algunos ejemplos son el control de los ejercicios de rehabilitación o la oferta de información referente al sitio turístico que se está visitando. El trabajo de esta tesis contribuye a explorar las posibilidades que ofrecen los sensores inerciales para el apoyo a la detección de actividad y la mejora de la precisión de servicios de localización para peatones. En lo referente al reconocimiento de la actividad que desarrolla un usuario, se ha explorado el uso de los sensores integrados en los dispositivos móviles de última generación (luz y proximidad, acelerómetro, giróscopo y magnetómetro). Las actividades objetivo son conocidas como ‘atómicas’ (andar a distintas velocidades, estar de pie, correr, estar sentado), esto es, actividades que constituyen unidades de actividades más complejas como pueden ser lavar los platos o ir al trabajo. De este modo, se usan algoritmos de clasificación sencillos que puedan ser integrados en un móvil como el Naïve Bayes, Tablas y Árboles de Decisión. Además, se pretende igualmente detectar la posición en la que el usuario lleva el móvil, no sólo con el objetivo de utilizar esa información para elegir un clasificador entrenado sólo con datos recogidos en la posición correspondiente (estrategia que mejora los resultados de estimación de la actividad), sino también para la generación de un evento que puede producir la ejecución de una acción. Finalmente, el trabajo incluye un análisis de las prestaciones de la clasificación variando el tipo de parámetros y el número de sensores usados y teniendo en cuenta no sólo la precisión de la clasificación sino también la carga computacional. Por otra parte, se ha propuesto un algoritmo basado en la cuenta de pasos utilizando informaiii ción proveniente de un acelerómetro colocado en el pie del usuario. El objetivo final es detectar la actividad que el usuario está haciendo junto con la estimación aproximada de la distancia recorrida. El algoritmo de cuenta pasos se basa en la detección de máximos y mínimos usando ventanas temporales y umbrales sin requerir información específica del usuario. El ámbito de seguimiento de peatones en interiores es interesante por la falta de un estándar de localización en este tipo de entornos. Se ha diseñado un filtro extendido de Kalman centralizado y ligeramente acoplado para fusionar la información medida por un acelerómetro colocado en el pie del usuario con medidas de posición. Se han aplicado también diferentes técnicas de corrección de errores como las de velocidad cero que se basan en la detección de los instantes en los que el pie está apoyado en el suelo. Los resultados han sido obtenidos en entornos interiores usando las posiciones estimadas por un sistema de triangulación basado en la medida de la potencia recibida (RSS) y GPS en exteriores. Finalmente, se han implementado algunas aplicaciones que prueban la utilidad del trabajo desarrollado. En primer lugar se ha considerado una aplicación de monitorización de actividad que proporciona al usuario información sobre el nivel de actividad que realiza durante un período de tiempo. El objetivo final es favorecer el cambio de comportamientos sedentarios, consiguiendo hábitos saludables. Se han desarrollado dos versiones de esta aplicación. En el primer caso se ha integrado el algoritmo de cuenta pasos en una plataforma OSGi móvil adquiriendo los datos de un acelerómetro Bluetooth colocado en el pie. En el segundo caso se ha creado la misma aplicación utilizando las implementaciones de los clasificadores en un dispositivo Android. Por otro lado, se ha planteado el diseño de una aplicación para la creación automática de un diario de viaje a partir de la detección de eventos importantes. Esta aplicación toma como entrada la información procedente de la estimación de actividad y de localización además de información almacenada en bases de datos abiertas (fotos, información sobre sitios) e información sobre sensores reales y virtuales (agenda, cámara, etc.) del móvil. Abstract Inertial sensors (accelerometers and gyroscopes) have been gradually embedded in the devices that people use in their daily lives thanks to their miniaturization. Nowadays all smartphones have at least one embedded magnetometer and accelerometer, containing the most upto- date ones gyroscopes and barometers. This issue, together with the fact that the penetration of smartphones is growing steadily, has made possible the design of systems that rely on the information gathered by wearable sensors (in the future contained in smart textiles) or inertial sensors embedded in a smartphone. The role of these sensors has become key to the development of context-aware and ambient intelligent applications. Some examples are the performance of rehabilitation exercises, the provision of information related to the place that the user is visiting or the interaction with objects by gesture recognition. The work of this thesis contributes to explore to which extent this kind of sensors can be useful to support activity recognition and pedestrian tracking, which have been proven to be essential for these applications. Regarding the recognition of the activity that a user performs, the use of sensors embedded in a smartphone (proximity and light sensors, gyroscopes, magnetometers and accelerometers) has been explored. The activities that are detected belong to the group of the ones known as ‘atomic’ activities (e.g. walking at different paces, running, standing), that is, activities or movements that are part of more complex activities such as doing the dishes or commuting. Simple, wellknown classifiers that can run embedded in a smartphone have been tested, such as Naïve Bayes, Decision Tables and Trees. In addition to this, another aim is to estimate the on-body position in which the user is carrying the mobile phone. The objective is not only to choose a classifier that has been trained with the corresponding data in order to enhance the classification but also to start actions. Finally, the performance of the different classifiers is analysed, taking into consideration different features and number of sensors. The computational and memory load of the classifiers is also measured. On the other hand, an algorithm based on step counting has been proposed. The acceleration information is provided by an accelerometer placed on the foot. The aim is to detect the activity that the user is performing together with the estimation of the distance covered. The step counting strategy is based on detecting minima and its corresponding maxima. Although the counting strategy is not innovative (it includes time windows and amplitude thresholds to prevent under or overestimation) no user-specific information is required. The field of pedestrian tracking is crucial due to the lack of a localization standard for this kind of environments. A loosely-coupled centralized Extended Kalman Filter has been proposed to perform the fusion of inertial and position measurements. Zero velocity updates have been applied whenever the foot is detected to be placed on the ground. The results have been obtained in indoor environments using a triangulation algorithm based on RSS measurements and GPS outdoors. Finally, some applications have been designed to test the usefulness of the work. The first one is called the ‘Activity Monitor’ whose aim is to prevent sedentary behaviours and to modify habits to achieve desired objectives of activity level. Two different versions of the application have been implemented. The first one uses the activity estimation based on the step counting algorithm, which has been integrated in an OSGi mobile framework acquiring the data from a Bluetooth accelerometer placed on the foot of the individual. The second one uses activity classifiers embedded in an Android smartphone. On the other hand, the design of a ‘Travel Logbook’ has been planned. The input of this application is the information provided by the activity and localization modules, external databases (e.g. pictures, points of interest, weather) and mobile embedded and virtual sensors (agenda, camera, etc.). The aim is to detect important events in the journey and gather the information necessary to store it as a journal page.
Resumo:
Detecting user affect automatically during real-time conversation is the main challenge towards our greater aim of infusing social intelligence into a natural-language mixed-initiative High-Fidelity (Hi-Fi) audio control spoken dialog agent. In recent years, studies on affect detection from voice have moved on to using realistic, non-acted data, which is subtler. However, it is more challenging to perceive subtler emotions and this is demonstrated in tasks such as labelling and machine prediction. This paper attempts to address part of this challenge by considering the role of user satisfaction ratings and also conversational/dialog features in discriminating contentment and frustration, two types of emotions that are known to be prevalent within spoken human-computer interaction. However, given the laboratory constraints, users might be positively biased when rating the system, indirectly making the reliability of the satisfaction data questionable. Machine learning experiments were conducted on two datasets, users and annotators, which were then compared in order to assess the reliability of these datasets. Our results indicated that standard classifiers were significantly more successful in discriminating the abovementioned emotions and their intensities (reflected by user satisfaction ratings) from annotator data than from user data. These results corroborated that: first, satisfaction data could be used directly as an alternative target variable to model affect, and that they could be predicted exclusively by dialog features. Second, these were only true when trying to predict the abovementioned emotions using annotator?s data, suggesting that user bias does exist in a laboratory-led evaluation.
Resumo:
Modeling and prediction of the overall elastic–plastic response and local damage mechanisms in heterogeneous materials, in particular particle reinforced composites, is a very complex problem. Microstructural complexities such as the inhomogeneous spatial distribution of particles, irregular morphology of the particles, and anisotropy in particle orientation after secondary processing, such as extrusion, significantly affect deformation behavior. We have studied the effect of particle/matrix interface debonding in SiC particle reinforced Al alloy matrix composites with (a) actual microstructure consisting of angular SiC particles and (b) idealized ellipsoidal SiC particles. Tensile deformation in SiC particle reinforced Al matrix composites was modeled using actual microstructures reconstructed from serial sectioning approach. Interfacial debonding was modeled using user-defined cohesive zone elements. Modeling with the actual microstructure (versus idealized ellipsoids) has a significant influence on: (a) localized stresses and strains in particle and matrix, and (b) far-field strain at which localized debonding takes place. The angular particles exhibited higher degree of load transfer and are more sensitive to interfacial debonding. Larger decreases in stress are observed in the angular particles, because of the flat surfaces, normal to the loading axis, which bear load. Furthermore, simplification of particle morphology may lead to erroneous results.
Resumo:
This paper presents an empirical evidence of user bias within a laboratory-oriented evaluation of a Spoken Dialog System. Specifically, we addressed user bias in their satisfaction judgements. We question the reliability of this data for modeling user emotion, focusing on contentment and frustration in a spoken dialog system. This bias is detected through machine learning experiments that were conducted on two datasets, users and annotators, which were then compared in order to assess the reliability of these datasets. The target used was the satisfaction rating and the predictors were conversational/dialog features. Our results indicated that standard classifiers were significantly more successful in discriminating frustration and contentment and the intensities of these emotions (reflected by user satisfaction ratings) from annotator data than from user data. Indirectly, the results showed that conversational features are reliable predictors of the two abovementioned emotions.
Resumo:
This article proposes an innovative biometric technique based on the idea of authenticating a person on a mobile device by gesture recognition. To accomplish this aim, a user is prompted to be recognized by a gesture he/she performs moving his/her hand while holding a mobile device with an accelerometer embedded. As users are not able to repeat a gesture exactly in the air, an algorithm based on sequence alignment is developed to correct slight differences between repetitions of the same gesture. The robustness of this biometric technique has been studied within 2 different tests analyzing a database of 100 users with real falsifications. Equal Error Rates of 2.01 and 4.82% have been obtained in a zero-effort and an active impostor attack, respectively. A permanence evaluation is also presented from the analysis of the repetition of the gestures of 25 users in 10 sessions over a month. Furthermore, two different gesture databases have been developed: one made up of 100 genuine identifying 3-D hand gestures and 3 impostors trying to falsify each of them and another with 25 volunteers repeating their identifying 3- D hand gesture in 10 sessions over a month. These databases are the most extensive in published studies, to the best of our knowledge.
Resumo:
This work proposes an encapsulation scheme aimed at simplifying the reuse process of hardware cores. This hardware encapsulation approach has been conceived with a twofold objective. First, we look for the improvement of the reuse interface associated with the hardware core description. This is carried out in a first encapsulation level by improving the limited types and configuration options available in the conventional HDLs interface, and also providing information related to the implementation itself. Second, we have devised a more generic interface focused on describing the function avoiding details from a particular implementation, what corresponds to a second encapsulation level. This encapsulation allows the designer to define how to configure and use the design to implement a given functionality. The proposed encapsulation schemes help improving the amount of information that can be supplied with the design, and also allow to automate the process of searching, configuring and implementing diverse alternatives.
Resumo:
The availability of inertial sensors embedded in mobile devices has enabled a new type of interaction based on the movements or “gestures” made by the users when holding the device. In this paper we propose a gesture recognition system for mobile devices based on accelerometer and gyroscope measurements. The system is capable of recognizing a set of predefined gestures in a user-independent way, without the need of a training phase. Furthermore, it was designed to be executed in real-time in resource-constrained devices, and therefore has a low computational complexity. The performance of the system is evaluated offline using a dataset of gestures, and also online, through some user tests with the system running in a smart phone.
Resumo:
Este proyecto tiene como objetivo el desarrollo de una interfaz MIDI, basada en técnicas de procesamiento digital de la imagen, capaz de controlar diversos parámetros de un software de audio mediante información gestual: el movimiento de las manos. La imagen es capturada por una cámara Kinect comercial y los datos obtenidos por ésta son procesados en tiempo real. La finalidad es convertir la posición de varios puntos de control de nuestro cuerpo en información de control musical MIDI. La interfaz ha sido desarrollada en el lenguaje y entorno de programación Processing, el cual está basado en Java, es de libre distribución y de fácil utilización. El software de audio seleccionado es Ableton Live, versión 8.2.2, elegido porque es útil tanto para la composición musical como para la música en directo, y esto último es la principal utilidad que se le pretende dar a la interfaz. El desarrollo del proyecto se divide en dos bloques principales: el primero, diseño gráfico del controlador, y el segundo, la gestión de la información musical. En el primer apartado se justifica el diseño del controlador, formado por botones virtuales: se explica el funcionamiento y, brevemente, la función de cada botón. Este último tema es tratado en profundidad en el Anexo II: Manual de usuario. En el segundo bloque se explica el camino que realiza la información MIDI desde el procesador gestual hasta el sintetizador musical. Este camino empieza en Processing, desde donde se mandan los mensajes que más tarde son interpretados por el secuenciador seleccionado, Ableton Live. Una vez terminada la explicación con detalle del desarrollo del proyecto se exponen las conclusiones del autor acerca del desarrollo del proyecto, donde se encuentran los pros y los contras a tener en cuenta para poder sacar el máximo provecho en el uso del controlador . En este mismo bloque de la memoria se exponen posibles líneas futuras a desarrollar. Se facilita también un presupuesto, desglosado en costes materiales y de personal. ABSTRACT. The aim of this project is the development of a MIDI interface based on image digital processing techniques, able to control several parameters of an audio software using gestural information, the movement of the hands. The image is captured by a commercial Kinect camera and the data obtained by it are processed in real time. The purpose is to convert the position of various points of our body into MIDI musical control information. The interface has been developed in the Processing programming language and environment which is based on Java, freely available and easy to used. The audio software selected is Ableton Live, version 8.2.2, chosen because it is useful for both music composition and live music, and the latter is the interface main intended utility. The project development is divided into two main blocks: the controller graphic design, and the information management. The first section justifies the controller design, consisting of virtual buttons: it is explained the operation and, briefly, the function of each button. This latter topic is covered in detail in Annex II: user manual. In the second section it is explained the way that the MIDI information makes from the gestural processor to the musical synthesizer. It begins in Processing, from where the messages, that are later interpreted by the selected sequencer, Ableton Live, are sent. Once finished the detailed explanation of the project development, the author conclusions are presented, among which are found the pros and cons to take into account in order to take full advantage in the controller use. In this same block are explained the possible future aspects to develop. It is also provided a budget, broken down into material and personal costs.
Resumo:
En esta Tesis se presentan dos líneas de investigación relacionadas y que contribuyen a las áreas de Interacción Hombre-Tecnología (o Máquina; siglas en inglés: HTI o HMI), lingüística computacional y evaluación de la experiencia del usuario. Las dos líneas en cuestión son el diseño y la evaluación centrada en el usuario de sistemas de Interacción Hombre-Máquina avanzados. En la primera parte de la Tesis (Capítulos 2 a 4) se abordan cuestiones fundamentales del diseño de sistemas HMI avanzados. El Capítulo 2 presenta una panorámica del estado del arte de la investigación en el ámbito de los sistemas conversacionales multimodales, con la que se enmarca el trabajo de investigación presentado en el resto de la Tesis. Los Capítulos 3 y 4 se centran en dos grandes aspectos del diseño de sistemas HMI: un gestor del diálogo generalizado para tratar la Interacción Hombre-Máquina multimodal y sensible al contexto, y el uso de agentes animados personificados (ECAs) para mejorar la robustez del diálogo, respectivamente. El Capítulo 3, sobre gestión del diálogo, aborda el tratamiento de la heterogeneidad de la información proveniente de las modalidades comunicativas y de los sensores externos. En este capítulo se propone, en un nivel de abstracción alto, una arquitectura para la gestión del diálogo con influjos heterogéneos de información, apoyándose en el uso de State Chart XML. En el Capítulo 4 se presenta una contribución a la representación interna de intenciones comunicativas, y su traducción a secuencias de gestos a ejecutar por parte de un ECA, diseñados específicamente para mejorar la robustez en situaciones de diálogo críticas que pueden surgir, por ejemplo, cuando se producen errores de entendimiento en la comunicación entre el usuario humano y la máquina. Se propone, en estas páginas, una extensión del Functional Mark-up Language definido en el marco conceptual SAIBA. Esta extensión permite representar actos comunicativos que realizan intenciones del emisor (la máquina) que no se pretende sean captadas conscientemente por el receptor (el usuario humano), pero con las que se pretende influirle a éste e influir el curso del diálogo. Esto se consigue mediante un objeto llamado Base de Intenciones Comunicativas (en inglés, Communication Intention Base, o CIB). La representación en el CIB de intenciones “no claradas” además de las explícitas permite la construcción de actos comunicativos que realizan simultáneamente varias intenciones comunicativas. En el Capítulo 4 también se describe un sistema experimental para el control remoto (simulado) de un asistente domótico, con autenticación de locutor para dar acceso, y con un ECA en el interfaz de cada una de estas tareas. Se incluye una descripción de las secuencias de comportamiento verbal y no verbal de los ECAs, que fueron diseñados específicamente para determinadas situaciones con objeto de mejorar la robustez del diálogo. Los Capítulos 5 a 7 conforman la parte de la Tesis dedicada a la evaluación. El Capítulo 5 repasa antecedentes relevantes en la literatura de tecnologías de la información en general, y de sistemas de interacción hablada en particular. Los principales antecedentes en el ámbito de la evaluación de la interacción sobre los cuales se ha desarrollado el trabajo presentado en esta Tesis son el Technology Acceptance Model (TAM), la herramienta Subjective Assessment of Speech System Interfaces (SASSI), y la Recomendación P.851 de la ITU-T. En el Capítulo 6 se describen un marco y una metodología de evaluación aplicados a la experiencia del usuario con sistemas HMI multimodales. Se desarrolló con este propósito un novedoso marco de evaluación subjetiva de la calidad de la experiencia del usuario y su relación con la aceptación por parte del mismo de la tecnología HMI (el nombre dado en inglés a este marco es Subjective Quality Evaluation Framework). En este marco se articula una estructura de clases de factores subjetivos relacionados con la satisfacción y aceptación por parte del usuario de la tecnología HMI propuesta. Esta estructura, tal y como se propone en la presente tesis, tiene dos dimensiones ortogonales. Primero se identifican tres grandes clases de parámetros relacionados con la aceptación por parte del usuario: “agradabilidad ” (likeability: aquellos que tienen que ver con la experiencia de uso, sin entrar en valoraciones de utilidad), rechazo (los cuales sólo pueden tener una valencia negativa) y percepción de utilidad. En segundo lugar, este conjunto clases se reproduce para distintos “niveles, o focos, percepción del usuario”. Éstos incluyen, como mínimo, un nivel de valoración global del sistema, niveles correspondientes a las tareas a realizar y objetivos a alcanzar, y un nivel de interfaz (en los casos propuestos en esta tesis, el interfaz es un sistema de diálogo con o sin un ECA). En el Capítulo 7 se presenta una evaluación empírica del sistema descrito en el Capítulo 4. El estudio se apoya en los mencionados antecedentes en la literatura, ampliados con parámetros para el estudio específico de los agentes animados (los ECAs), la auto-evaluación de las emociones de los usuarios, así como determinados factores de rechazo (concretamente, la preocupación por la privacidad y la seguridad). También se evalúa el marco de evaluación subjetiva de la calidad propuesto en el capítulo anterior. Los análisis de factores efectuados revelan una estructura de parámetros muy cercana conceptualmente a la división de clases en utilidad-agradabilidad-rechazo propuesta en dicho marco, resultado que da cierta validez empírica al marco. Análisis basados en regresiones lineales revelan estructuras de dependencias e interrelación entre los parámetros subjetivos y objetivos considerados. El efecto central de mediación, descrito en el Technology Acceptance Model, de la utilidad percibida sobre la relación de dependencia entre la intención de uso y la facilidad de uso percibida, se confirma en el estudio presentado en la presente Tesis. Además, se ha encontrado que esta estructura de relaciones se fortalece, en el estudio concreto presentado en estas páginas, si las variables consideradas se generalizan para cubrir más ampliamente las categorías de agradabilidad y utilidad contempladas en el marco de evaluación subjetiva de calidad. Se ha observado, asimismo, que los factores de rechazo aparecen como un componente propio en los análisis de factores, y además se distinguen por su comportamiento: moderan la relación entre la intención de uso (que es el principal indicador de la aceptación del usuario) y su predictor más fuerte, la utilidad percibida. Se presentan también resultados de menor importancia referentes a los efectos de los ECAs sobre los interfaces de los sistemas de diálogo y sobre los parámetros de percepción y las valoraciones de los usuarios que juegan un papel en conformar su aceptación de la tecnología. A pesar de que se observa un rendimiento de la interacción dialogada ligeramente mejor con ECAs, las opiniones subjetivas son muy similares entre los dos grupos experimentales (uno interactuando con un sistema de diálogo con ECA, y el otro sin ECA). Entre las pequeñas diferencias encontradas entre los dos grupos destacan las siguientes: en el grupo experimental sin ECA (es decir, con interfaz sólo de voz) se observó un efecto más directo de los problemas de diálogo (por ejemplo, errores de reconocimiento) sobre la percepción de robustez, mientras que el grupo con ECA tuvo una respuesta emocional más positiva cuando se producían problemas. Los ECAs parecen generar inicialmente expectativas más elevadas en cuanto a las capacidades del sistema, y los usuarios de este grupo se declaran más seguros de sí mismos en su interacción. Por último, se observan algunos indicios de efectos sociales de los ECAs: la “amigabilidad ” percibida los ECAs estaba correlada con un incremento la preocupación por la seguridad. Asimismo, los usuarios del sistema con ECAs tendían más a culparse a sí mismos, en lugar de culpar al sistema, de los problemas de diálogo que pudieran surgir, mientras que se observó una ligera tendencia opuesta en el caso de los usuarios del sistema con interacción sólo de voz. ABSTRACT This Thesis presents two related lines of research work contributing to the general fields of Human-Technology (or Machine) Interaction (HTI, or HMI), computational linguistics, and user experience evaluation. These two lines are the design and user-focused evaluation of advanced Human-Machine (or Technology) Interaction systems. The first part of the Thesis (Chapters 2 to 4) is centred on advanced HMI system design. Chapter 2 provides a background overview of the state of research in multimodal conversational systems. This sets the stage for the research work presented in the rest of the Thesis. Chapers 3 and 4 focus on two major aspects of HMI design in detail: a generalised dialogue manager for context-aware multimodal HMI, and embodied conversational agents (ECAs, or animated agents) to improve dialogue robustness, respectively. Chapter 3, on dialogue management, deals with how to handle information heterogeneity, both from the communication modalities or from external sensors. A highly abstracted architectural contribution based on State Chart XML is proposed. Chapter 4 presents a contribution for the internal representation of communication intentions and their translation into gestural sequences for an ECA, especially designed to improve robustness in critical dialogue situations such as when miscommunication occurs. We propose an extension of the functionality of Functional Mark-up Language, as envisaged in much of the work in the SAIBA framework. Our extension allows the representation of communication acts that carry intentions that are not for the interlocutor to know of, but which are made to influence him or her as well as the flow of the dialogue itself. This is achieved through a design element we have called the Communication Intention Base. Such r pr s ntation of “non- clar ” int ntions allows th construction of communication acts that carry several communication intentions simultaneously. Also in Chapter 4, an experimental system is described which allows (simulated) remote control to a home automation assistant, with biometric (speaker) authentication to grant access, featuring embodied conversation agents for each of the tasks. The discussion includes a description of the behavioural sequences for the ECAs, which were designed for specific dialogue situations with particular attention given to the objective of improving dialogue robustness. Chapters 5 to 7 form the evaluation part of the Thesis. Chapter 5 reviews evaluation approaches in the literature for information technologies, as well as in particular for speech-based interaction systems, that are useful precedents to the contributions of the present Thesis. The main evaluation precedents on which the work in this Thesis has built are the Technology Acceptance Model (TAM), the Subjective Assessment of Speech System Interfaces (SASSI) tool, and ITU-T Recommendation P.851. Chapter 6 presents the author’s work in establishing an valuation framework and methodology applied to the users’ experience with multimodal HMI systems. A novel user-acceptance Subjective Quality Evaluation Framework was developed by the author specifically for this purpose. A class structure arises from two orthogonal sets of dimensions. First we identify three broad classes of parameters related with user acceptance: likeability factors (those that have to do with the experience of using the system), rejection factors (which can only have a negative valence) and perception of usefulness. Secondly, the class structure is further broken down into several “user perception levels”; at the very least: an overall system-assessment level, task and goal-related levels, and an interface level (e.g., a dialogue system with or without an ECA). An empirical evaluation of the system described in Chapter 4 is presented in Chapter 7. The study was based on the abovementioned precedents in the literature, expanded with categories covering the inclusion of an ECA, the users’ s lf-assessed emotions, and particular rejection factors (privacy and security concerns). The Subjective Quality Evaluation Framework proposed in the previous chapter was also scrutinised. Factor analyses revealed an item structure very much related conceptually to the usefulness-likeability-rejection class division introduced above, thus giving it some empirical weight. Regression-based analysis revealed structures of dependencies, paths of interrelations, between the subjective and objective parameters considered. The central mediation effect, in the Technology Acceptance Model, of perceived usefulness on the dependency relationship of intention-to-use with perceived ease of use was confirmed in this study. Furthermore, the pattern of relationships was stronger for variables covering more broadly the likeability and usefulness categories in the Subjective Quality Evaluation Framework. Rejection factors were found to have a distinct presence as components in factor analyses, as well as distinct behaviour: they were found to moderate the relationship between intention-to-use (the main measure of user acceptance) and its strongest predictor, perceived usefulness. Insights of secondary importance are also given regarding the effect of ECAs on the interface of spoken dialogue systems and the dimensions of user perception and judgement attitude that may have a role in determining user acceptance of the technology. Despite observing slightly better performance values in the case of the system with the ECA, subjective opinions regarding both systems were, overall, very similar. Minor differences between two experimental groups (one interacting with an ECA, the other only through speech) include a more direct effect of dialogue problems (e.g., non-understandings) on perceived dialogue robustness for the voice-only interface test group, and a more positive emotional response for the ECA test group. Our findings further suggest that the ECA generates higher initial expectations, and users seem slightly more confident in their interaction with the ECA than do those without it. Finally, mild evidence of social effects of ECAs was also found: the perceived friendliness of the ECA increased security concerns, and ECA users may tend to blame themselves rather than the system when dialogue problems are encountered, while the opposite may be true for voice-only users.