941 resultados para Multi-modal dialogue system


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, a proposal of a multi-modal dialogue system oriented to multilingual question-answering is presented. This system includes the following ways of access: voice, text, avatar, gestures and signs language. The proposal is oriented to the question-answering task as a user interaction mechanism. The proposal here presented is in the first stages of its development phase and the architecture is presented for the first time on the base of the experiences in question-answering and dialogues previously developed. The main objective of this research work is the development of a solid platform that will permit the modular integration of the proposed architecture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a localization system that targets rapid deployment of stationary wireless sensor networks (WSN). The system uses a particle filter to fuse measurements from multiple localization modalities, such as RF ranging, neighbor information or maps, to obtain position estimations with higher accuracy than that of the individual modalities. The system isolates different modalities into separate components which can be included or excluded independently to tailor the system to a specific scenario. We show that position estimations can be improved with our system by combining multiple modalities. We evaluate the performance of the system in both an indoor and outdoor environment using combinations of five different modalities. Using two anchor nodes as reference points and combining all five modalities, we obtain RMS (Root Mean Square) estimation errors of approximately 2.5m in both cases, while using the components individually results in errors within the range of 3.5 and 9 m.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intelligent surveillance systems typically use a single visual spectrum modality for their input. These systems work well in controlled conditions, but often fail when lighting is poor, or environmental effects such as shadows, dust or smoke are present. Thermal spectrum imagery is not as susceptible to environmental effects, however thermal imaging sensors are more sensitive to noise and they are only gray scale, making distinguishing between objects difficult. Several approaches to combining the visual and thermal modalities have been proposed, however they are limited by assuming that both modalities are perfuming equally well. When one modality fails, existing approaches are unable to detect the drop in performance and disregard the under performing modality. In this paper, a novel middle fusion approach for combining visual and thermal spectrum images for object tracking is proposed. Motion and object detection is performed on each modality and the object detection results for each modality are fused base on the current performance of each modality. Modality performance is determined by comparing the number of objects tracked by the system with the number detected by each mode, with a small allowance made for objects entering and exiting the scene. The tracking performance of the proposed fusion scheme is compared with performance of the visual and thermal modes individually, and a baseline middle fusion scheme. Improvement in tracking performance using the proposed fusion approach is demonstrated. The proposed approach is also shown to be able to detect the failure of an individual modality and disregard its results, ensuring performance is not degraded in such situations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To sustain an ongoing rapid growth of video information, there is an emerging demand for a sophisticated content-based video indexing system. However, current video indexing solutions are still immature and lack of any standard. This doctoral consists of a research work based on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple audio-visual modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users will benefit from the integration of high-level semantic and some descriptive mid-level features such as whistle and close-up view of player(s).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In attempting to build intelligent litigation support tools, we have moved beyond first generation, production rule legal expert systems. Our work integrates rule based and case based reasoning with intelligent information retrieval. When using the case based reasoning methodology, or in our case the specialisation of case based retrieval, we need to be aware of how to retrieve relevant experience. Our research, in the legal domain, specifies an approach to the retrieval problem which relies heavily on an extended object oriented/rule based system architecture that is supplemented with causal background information. We use a distributed agent architecture to help support the reasoning process of lawyers. Our approach to integrating rule based reasoning, case based reasoning and case based retrieval is contrasted to the CABARET and PROLEXS architectures which rely on a centralised blackboard architecture. We discuss in detail how our various cooperating agents interact, and provide examples of the system at work. The IKBALS system uses a specialised induction algorithm to induce rules from cases. These rules are then used as indices during the case based retrieval process. Because we aim to build legal support tools which can be modified to suit various domains rather than single purpose legal expert systems, we focus on principles behind developing legal knowledge based systems. The original domain chosen was theAccident Compensation Act 1989 (Victoria, Australia), which relates to the provision of benefits for employees injured at work. For various reasons, which are indicated in the paper, we changed our domain to that ofCredit Act 1984 (Victoria, Australia). This Act regulates the provision of loans by financial institutions. The rule based part of our system which provides advice on the Credit Act has been commercially developed in conjunction with a legal firm. We indicate how this work has lead to the development of a methodology for constructing rule based legal knowledge based systems. We explain the process of integrating this existing commercial rule based system with the case base reasoning and retrieval architecture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For more dexterous and agile legged robot locomotion, alternative actuation has been one of the most long-awaited technologies. The goal of this paper is to investigate the use of newly developed actuator, the so-called Linear Multi-Modal Actuator (LMMA), in the context of legged robot locomotion, and analyze the behavioral performance of it. The LMMA consists of three discrete couplings which enable the system to switch between different mechanical dynamics such as instantaneous switches between series elastic and fully actuated dynamics. To test this actuator for legged locomotion, this paper introduces a one-legged robot platform we developed to implement the actuator, and explains a novel control strategy for hopping, i.e. 'preloaded hopping control'. This control strategy takes advantage of the coupling mechanism of the LMMA to preload the series elasticity during the flight phase to improve the energy efficiency of hopping locomotion. This paper shows a series of experimental results that compare the control strategy with a simple sinusoidal actuation strategy to discuss the benefits and challenges of the proposed approach. © 2013 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Awareness of emerging situations in a dynamic operational environment of a robotic assistive device is an essential capability of such a cognitive system, based on its effective and efficient assessment of the prevailing situation. This allows the system to interact with the environment in a sensible (semi)autonomous / pro-active manner without the need for frequent interventions from a supervisor. In this paper, we report a novel generic Situation Assessment Architecture for robotic systems directly assisting humans as developed in the CORBYS project. This paper presents the overall architecture for situation assessment and its application in proof-of-concept Demonstrators as developed and validated within the CORBYS project. These include a robotic human follower and a mobile gait rehabilitation robotic system. We present an overview of the structure and functionality of the Situation Assessment Architecture for robotic systems with results and observations as collected from initial validation on the two CORBYS Demonstrators.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To sustain an ongoing rapid growth of video information, there is an emerging demand for a sophisticated content-based video indexing system. However, current video indexing solutions are still immature and lack of any standard. This doctoral consists of a research work based on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple audio-visual modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users will benefit from the integration of high-level semantic and some descriptive mid-level features such as whistle and close-up view of player(s).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the coordinated use of video and audio cues to capture and index surveillance events with multimodal labels. The focus of this paper is the development of a joint-sensor calibration technique that uses audio-visual observations to improve the calibration process. One significant feature of this approach is the ability to continuously check and update the calibration status of the sensor suite, making it resilient to independent drift in the individual sensors. We present scenarios in which this system is used to enhance surveillance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

GuideView is a system designed for structured, multi-modal delivery of clinical guidelines. Clinical instructions are presented simultaneously in voice, text, pictures or video or animations. Users navigate using mouse-clicks and voice commands. An evaluation study performed at a medical simulation laboratory found that voice and video instructions were rated highly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work describes preliminary results of a two-modality imaging system aimed at the early detection of breast cancer. The first technique is based on compounding conventional echographic images taken at regular angular intervals around the imaged breast. The other modality obtains tomographic images of propagation velocity using the same circular geometry. For this study, a low-cost prototype has been built. It is based on a pair of opposed 128-element, 3.2 MHz array transducers that are mechanically moved around tissue mimicking phantoms. Compounded images around 360 degrees provide improved resolution, clutter reduction, artifact suppression and reinforce the visualization of internal structures. However, refraction at the skin interface must be corrected for an accurate image compounding process. This is achieved by estimation of the interface geometry followed by computing the internal ray paths. On the other hand, sound velocity tomographic images from time of flight projections have been also obtained. Two reconstruction methods, Filtered Back Projection (FBP) and 2D Ordered Subset Expectation Maximization (2D OSEM), were used as a first attempt towards tomographic reconstruction. These methods yield useable images in short computational times that can be considered as initial estimates in subsequent more complex methods of ultrasound image reconstruction. These images may be effective to differentiate malignant and benign masses and are very promising for breast cancer screening. (C) 2015 The Authors. Published by Elsevier B.V.