45 results for Multimodal


Relevance:

20.00%

Publisher:

Abstract:

A practically viable multi-biometric recognition system should not only be stable, robust and accurate but should also adhere to real-time processing speed and memory constraints. This study proposes a cascaded classifier-based framework for use in biometric recognition systems. The proposed framework utilises a set of weak classifiers to reduce the enrolled users' dataset to a small list of candidate users. This list is then used by a strong classifier set as the final stage of the cascade to formulate the decision. At each stage, the candidate list is generated by a Mahalanobis distance-based match score quality measure. One of the key features of the authors' framework is that each classifier in the ensemble can be designed to use a different modality, thus providing the advantages of a truly multimodal biometric recognition system. In addition, it is one of the first truly multimodal cascaded classifier-based approaches for biometric recognition. The performance of the proposed system is evaluated for both single and multiple modalities to demonstrate the effectiveness of the approach.
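The cascade described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-stage `keep` sizes, the toy templates, and the diagonal-covariance simplification of the Mahalanobis distance are all assumptions made for illustration.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under an ASSUMED diagonal covariance
    (one variance per feature); the paper's covariance model is not given."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def prune_candidates(probe, templates, var, candidates, keep):
    """Rank the current candidates by distance to the probe and keep the closest `keep`."""
    ranked = sorted(candidates, key=lambda user: mahalanobis_diag(probe, templates[user], var))
    return ranked[:keep]

def cascade(probes, stages, candidates):
    """Run the stages in order. Each stage is a hypothetical tuple
    (modality, templates, variances, keep); the last stage plays the
    role of the strong classifier by keeping a single user."""
    for modality, templates, var, keep in stages:
        candidates = prune_candidates(probes[modality], templates, var, candidates, keep)
    return candidates

# Toy enrolment data for two modalities (illustrative values only).
face_templates = {"ann": [0.0, 0.0], "bob": [1.0, 1.0], "cy": [5.0, 5.0]}
finger_templates = {"ann": [2.0], "bob": [0.1], "cy": [0.0]}
probes = {"face": [0.4, 0.4], "finger": [0.0]}

stages = [
    ("face", face_templates, [1.0, 1.0], 2),   # weak stage: shortlist two users
    ("finger", finger_templates, [1.0], 1),    # final stage: pick one user
]
decision = cascade(probes, stages, ["ann", "bob", "cy"])
```

In this toy run the face stage discards `cy`, so the fingerprint stage only compares `ann` and `bob`. That is the point of the cascade: the later, more expensive stage works on a short list rather than on the full enrolled set.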

Relevance:

20.00%

Publisher:

Abstract:

Invited Plenary Speaker

Relevance:

20.00%

Publisher:

Abstract:

In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results in datasets that span the range of high-resolution human-robot interaction (close-up faces plus depth information) data to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real-time on commercial hardware.

Relevance:

10.00%

Publisher:

Abstract:

Goal-directed, coordinated movements in humans emerge from a variety of constraints that range from 'high-level' cognitive strategies based on perception of the task to 'low-level' neuromuscular-skeletal factors such as differential contributions to coordination from flexor and extensor muscles. There has been a tendency in the literature to dichotomize these sources of constraint, favouring one or the other rather than recognizing and understanding their mutual interplay. In this experiment, subjects were required to coordinate rhythmic flexion and extension movements with an auditory metronome, the rate of which was systematically increased. When subjects started in extension on the beat of the metronome, there was a small tendency to switch to flexion at higher rates, but not vice versa. When subjects were asked to contact a physical stop, the location of which was either coincident with or counterphase to the auditory stimulus, two effects occurred. When haptic contact was coincident with sound, coordination was stabilized for both flexion and extension. When haptic contact was counterphase to the metronome, coordination was actually destabilized, with transitions occurring from both extension to flexion on the beat and from flexion to extension on the beat. These results reveal the complementary nature of strategic and neuromuscular factors in sensorimotor coordination. They also suggest the presence of a multimodal neural integration process, parametrizable by rate and context, in which intentional movement, touch and sound are bound into a single, coherent unit.

Relevance:

10.00%

Publisher:

Abstract:

We present a multimodal detection and tracking algorithm for sensors composed of a camera mounted between two microphones. Target localization is performed on color-based change detection in the video modality and on time difference of arrival (TDOA) estimation between the two microphones in the audio modality. The TDOA is computed by multiband generalized cross correlation (GCC) analysis. The estimated directions of arrival are then postprocessed using a Riccati Kalman filter. The visual and audio estimates are finally integrated, at the likelihood level, into a particle filter (PF) that uses a zero-order motion model, and a weighted probabilistic data association (WPDA) scheme. We demonstrate that the Kalman filtering (KF) improves the accuracy of the audio source localization and that the WPDA helps to enhance the tracking performance of sensor fusion in reverberant scenarios. The combination of multiband GCC, KF, and WPDA within the particle filtering framework improves the performance of the algorithm in noisy scenarios. We also show how the proposed audiovisual tracker summarizes the observed scene by generating metadata that can be transmitted to other network nodes instead of transmitting the raw images and can be used for very low bit rate communication. Moreover, the generated metadata can also be used to detect and monitor events of interest.
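The audio step in this abstract, estimating a time difference of arrival between two microphones and turning it into a direction, can be sketched as follows. This sketch swaps in a plain time-domain cross-correlation for the multiband GCC analysis named in the abstract, and it omits the Kalman and particle filtering entirely. The function names, the microphone spacing, and the speed of sound are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, ASSUMED room-temperature value

def estimate_tdoa(left, right, sample_rate, max_lag):
    """Return the delay (seconds) of `right` relative to `left` that
    maximises the plain cross-correlation over lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

def direction_of_arrival(tdoa, mic_spacing):
    """Far-field angle (radians) from broadside for an ASSUMED two-microphone
    array with spacing `mic_spacing` in metres."""
    ratio = SPEED_OF_SOUND * tdoa / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp against noisy estimates
    return math.asin(ratio)

# Synthetic check: an impulse that reaches the right microphone 3 samples later.
left = [0.0] * 32
right = [0.0] * 32
left[10] = 1.0
right[13] = 1.0
tdoa = estimate_tdoa(left, right, sample_rate=1000, max_lag=5)
angle = direction_of_arrival(tdoa, mic_spacing=2.0)
```

A generalized cross-correlation would apply a frequency-domain weighting before locating the peak, which is what makes it more robust in reverberant rooms; the peak-picking logic shown here stays the same.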

Relevance:

10.00%

Publisher:

Abstract:

Haptic information originates from a different human sense (touch); therefore, the quality of service (QoS) required to support haptic traffic is significantly different from that used to support conventional real-time traffic such as voice or video. Each type of network impairment has different (and severe) impacts on the user’s haptic experience. There has been no specific provision of QoS parameters for haptic interaction. Previous research into distributed haptic virtual environments (DHVEs) has concentrated on synchronization of positions (haptic device or virtual objects) and is based on client-server architectures. We present a new peer-to-peer DHVE architecture that further extends this to enable force interactions between two users, whereby force data are sent to the remote peer in addition to positional information. The work presented involves both simulation and practical experimentation where multimodal data is transmitted over a QoS-enabled IP network. Both forms of experiment produce consistent results which show that the use of specific QoS classes for haptic traffic will reduce network delay and jitter, leading to improvements in users’ haptic experiences with these types of applications.

Relevance:

10.00%

Publisher:

Abstract:

For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user’s affect. However, most emotion recognition systems are based on turn-wise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition. No statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary and regression outputs can be produced in real-time for every low-level input frame. We also investigate the benefits of including linguistic features on the signal frame level obtained by a keyword spotter.

Relevance:

10.00%

Publisher:

Abstract:

A method is discussed for measuring the acoustic impedance of tubular objects that gives accurate results for a wide range of frequencies. The apparatus that is employed is similar to that used in many previously developed methods; it consists of a cylindrical measurement duct fitted with several microphones, of which two are active in each measurement session, and a driver at one of its ends. The object under study is fitted at the other end. The impedance of the object is determined from the microphone signals obtained during excitation of the air inside the duct by the driver, and from three coefficients that are pre-determined using four calibration measurements with closed cylindrical tubes. The calibration procedure is based on the simple mathematical relationships between the impedances of the calibration tubes, and does not require knowledge of the propagation constant. Measurements with a cylindrical tube yield an estimate of the attenuation constant for plane waves, which is found to differ from the theoretical prediction by less than 1.4% in the frequency range 1 kHz-20 kHz. Impedance measurements of objects with abrupt changes in diameter are found to be in good agreement with multimodal theory.

Relevance:

10.00%

Publisher:

Abstract:

This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and non-verbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and non-verbal behaviours required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and non-verbal behaviour, since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on non-verbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling etc. We first report on three prototype versions of the SAL scenario, in which the behaviour of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analysing and synthesising the respective behaviours. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behaviour, dialogue management, and synthesis of speaker and listener behaviour of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse, and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.

Relevance:

10.00%

Publisher:

Abstract:

This article reviews and discusses how metaphor as a trope has been regarded as an essential element in rhetorical approaches to reading and to writing. In addition, it considers the extent to which, while metaphor-making is a fundamental cognitive capacity, a metaphorizing habit of mind may be especially pertinent to some aspects of aesthetic activity in English, and how it also has salience in a multimodal environment. The article explores how contemporary practice in the English classroom could accommodate and consolidate the ability to metaphorize.

Relevance:

10.00%

Publisher:

Abstract:

Popular culture has been inundated with stories and images of True Crime for a long time, which is testament to people’s enduring fascination with criminals and their deviant actions. In such stories, which present actual cases of notorious crimes in a style that often resembles fiction, criminals are either reviled as monsters or lauded as cultural icons. More recently, popular autobiographical accounts by criminals themselves have begun to emerge within this True Crime genre. Typically self-celebratory in nature, such representations construct a rather glamorized public image of the author. This article undertakes a multimodal analysis of what has been classed as one typical example of this True Crime sub-genre, Australian Mark Brandon Read’s autobiographical account Chopper: From the Inside. It thereby seeks to demonstrate that the book, while glamorizing and mythologizing its protagonist, simultaneously offers scope for a qualitative understanding of Read’s life of crime and the sensual dynamics of his violent offending. To this end, the analysis focuses on some of the linguistic and pictorial strategies Read employs in constructing a public image of himself that alternates between the dangerous ‘hardman’ and the ‘larrikin’ criminal hero. However, it is also shown that Read’s account reveals a degree of critical self-reflection. In addition to the multimodal analysis, the article also endeavours to explore the link between celebrity and crime, thereby engaging with the nature of popular culture’s fascination with celebrated criminals.