963 results for multimodal terminals


Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and the face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at a low level. The proposed approach is tested on a video database of ten human subjects, and the results show that the proposed scheme can attain better accuracy than both conventional multimodal fusion using latent semantic analysis and the single-modality verifications. An experiment in MATLAB shows the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
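
The mel-frequency cepstral coefficients mentioned in this abstract rest on a mel-scaled filterbank. As a minimal sketch, assuming the widely used 2595 * log10(1 + f / 700) mel formula (the abstract does not say which variant the authors used) and hypothetical function names, the filter edges could be spaced like this:

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz: float, high_hz: float, n_filters: int) -> list:
    """Return the n_filters + 2 edge frequencies (Hz) of triangular
    filters spaced evenly on the mel scale between low_hz and high_hz."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + i * step) for i in range(n_filters + 2)]
```

This covers only the filterbank spacing; a full MFCC pipeline would also need framing, a power spectrum, log compression, and a discrete cosine transform.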

Relevance:

20.00% 20.00%

Publisher:

Abstract:

SEMAINE has created a large audiovisual database as a part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally colored conversation. Data used to build the agents came from interactions between users and an operator simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed an appropriate nonverbal behavior) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High quality recording was provided by five high-resolution, high-framerate cameras, and four microphones, recorded synchronously. Recordings total 150 participants, for a total of 959 conversations with individual SAL characters, lasting approximately 5 minutes each. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labeled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods, and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database. © 2010-2012 IEEE.

Relevance:

20.00% 20.00%

Publisher:

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper addresses the problem of optimally locating intermodal freight terminals in Serbia. To solve this problem and determine the effects of the resulting scenarios, two modeling approaches were combined. The first approach is based on multiple-assignment hub-network design, and the second is based on simulation. The multiple-assignment p-hub network location model was used to determine the optimal location of intermodal terminals. Simulation was used as a tool to estimate intermodal transport flow volumes, due to the unreliability and unavailability of specific statistical data, and as a method for quantitatively analyzing the economic, time, and environmental effects of different scenarios of intermodal terminal development. The results presented here are a summary, with some extension, of the research carried out in the IMOD-X project (Intermodal Solutions for Competitive Transport in Serbia).
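
To illustrate what a multiple-assignment p-hub location model optimizes, the following is a minimal brute-force sketch, not the IMOD-X formulation. It assumes a p-hub median objective, a hypothetical inter-hub discount factor `alpha`, and toy distance and flow matrices; every name in it is illustrative.

```python
from itertools import combinations


def route_cost(dist, i, j, hubs, alpha):
    """Cheapest path i -> hub k -> hub m -> j. Under multiple assignment
    each origin-destination pair picks its own hub pair; alpha discounts
    the inter-hub leg."""
    return min(dist[i][k] + alpha * dist[k][m] + dist[m][j]
               for k in hubs for m in hubs)


def best_hubs(dist, flow, p, alpha=0.6):
    """Enumerate every set of p hubs and return (total_cost, hubs) for
    the cheapest one. Only practical for very small networks."""
    n = len(dist)
    best = None
    for hubs in combinations(range(n), p):
        total = sum(flow[i][j] * route_cost(dist, i, j, hubs, alpha)
                    for i in range(n) for j in range(n) if i != j)
        if best is None or total < best[0]:
            best = (total, hubs)
    return best
```

For example, with four nodes spaced one unit apart on a line and unit flows between all pairs, `best_hubs(dist, flow, 1)` selects one of the two interior nodes. Realistic instances are normally solved with integer programming or heuristics rather than enumeration.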

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of the speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and its facial identification performance on the AR database are comparable with those reported in the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
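
The abstract does not describe how the cosine similarity is modified, so the sketch below shows only the standard cosine similarity as a baseline for comparing a probe feature vector with enrolled templates. The function names and the template dictionary are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def identify(probe, templates):
    """Return the enrolled identity whose template is most similar
    to the probe vector."""
    return max(templates,
               key=lambda name: cosine_similarity(probe, templates[name]))
```

Because cosine similarity ignores vector magnitude, it is a common choice when features from different modalities arrive at very different scales.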

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Structural and functional change in the microcirculation in type 1 diabetes mellitus predicts future end-organ damage and macrovascular events. We explored the utility of novel signal processing techniques to detect and track change in ocular hemodynamics in patients with this disease. Twenty-four patients with uncomplicated type 1 diabetes mellitus and 18 age- and sex-matched control subjects were studied. Doppler ultrasound was used to interrogate the carotid and ophthalmic arteries, and digital photography to image the retinal vasculature. Frequency analysis algorithms were applied to quantify velocity waveform structure and retinal photographic data at baseline and following inhalation of 100% oxygen. Frequency data were compared between groups. No significant differences were found in the resistive index between groups at baseline or following inhaled oxygen. Frequency analysis of the Doppler flow velocity waveforms identified significant differences in bands 3-7 between patients and controls in data captured from the ophthalmic artery (p<0.01 for each band). In response to inhaled oxygen, changes in the frequency band amplitudes were significantly greater in control subjects than in patients (p<0.05). Only control subjects demonstrated a positive correlation (R=0.61) between change in retinal vessel diameter and frequency band amplitudes derived from ophthalmic artery waveform data. The use of multimodal signal processing techniques applied to Doppler flow velocity waveforms and retinal photographic data identified preclinical change in the ocular microcirculation in patients with uncomplicated diabetes mellitus. An impaired autoregulatory response of the retinal microvasculature may contribute to the future development of retinopathy in such patients.
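
For reference, the resistive index named in this abstract is conventionally computed from two points on the Doppler velocity waveform. The sketch below assumes the standard definition, (peak systolic velocity - end-diastolic velocity) / peak systolic velocity; the abstract does not spell out the formula the authors used, and the function name is hypothetical.

```python
def resistive_index(peak_systolic: float, end_diastolic: float) -> float:
    """Resistive index under the standard definition (assumed here):
    (PSV - EDV) / PSV, with both velocities in the same units."""
    if peak_systolic <= 0:
        raise ValueError("peak systolic velocity must be positive")
    return (peak_systolic - end_diastolic) / peak_systolic
```

The index reduces the whole waveform to two samples, which is one reason a band-by-band frequency analysis can reveal differences that the index alone does not.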

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A novel dual-band printed diversity antenna is proposed and studied. The antenna, which consists of two back-to-back monopoles in a symmetric configuration, is printed on a printed circuit board. The effects of some important parameters of the proposed antenna are studied in detail and the design methodology is given. A prototype of the proposed antenna operating in the UMTS (1920-2170 MHz) and 2.4-GHz WLAN (2400-2484 MHz) bands is provided to demonstrate the usability of the methodology for dual-band diversity antennas in mobile terminals. In these two bands, the isolations of the prototype are larger than 13 dB and 16 dB, respectively. The measured radiation patterns of the two monopoles generally cover complementary space regions. The diversity performance is also evaluated by calculating the envelope correlation coefficient, the mean effective gains of the antenna elements, and the diversity gain. It is shown that the proposed antenna can provide spatial and pattern diversity to combat multipath fading.
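
The abstract does not state how the envelope correlation coefficient was calculated. One common approach for a two-port antenna uses the measured S-parameters and assumes lossless elements in a uniform multipath environment; the sketch below follows that formula, with hypothetical function names.

```python
def db_to_magnitude(value_db: float) -> float:
    """Convert an S-parameter level in dB to a linear magnitude."""
    return 10.0 ** (value_db / 20.0)


def envelope_correlation(s11, s12, s21, s22) -> float:
    """Envelope correlation coefficient of a two-port antenna from its
    S-parameters (assumes lossless antennas, uniform multipath)."""
    s11, s12, s21, s22 = (complex(s) for s in (s11, s12, s21, s22))
    numerator = abs(s11.conjugate() * s12 + s21.conjugate() * s22) ** 2
    denominator = ((1 - abs(s11) ** 2 - abs(s21) ** 2)
                   * (1 - abs(s22) ** 2 - abs(s12) ** 2))
    if denominator <= 0:
        raise ValueError("S-parameters are not consistent with a passive antenna")
    return numerator / denominator
```

Under this formula, perfectly isolated ports (`s12 == s21 == 0`) give a coefficient of zero, and better isolation generally lowers the result, which is why isolation figures are often reported alongside the correlation. Pattern-based calculations from the radiated far fields are an alternative when antenna losses are not negligible.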