910 results for Multi-modal information processing


Relevance:

100.00% 100.00%

Publisher:

Abstract:

To detect and annotate the key events of live sports videos, we need to bridge the semantic gaps in audio-visual information. Previous work has successfully extracted semantics from time-stamped web match reports, which are synchronized with the video content. However, web and social media articles without time-stamps have not been fully leveraged, even though they are increasingly used to complement the coverage of major sporting tournaments. This paper aims to address this limitation using a novel multimodal summarization framework based on sentiment analysis and players' popularity. It uses audiovisual content, web articles, blogs, and commentators' speech to automatically annotate and visualize the key events and key players in the coverage of a sports tournament. The experimental results demonstrate that the automatically generated video summaries are aligned with the events identified from the official website match reports.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The problems and methods of adaptive control and multi-agent processing of information in global telecommunication and computer networks (TCN) are discussed. Criteria for the controllability and communication ability (routing ability) of dataflows are described. A multi-agent model for the exchange of divided information resources in global TCN is proposed. The particular features of adaptive and intelligent control of dataflows under uncertain conditions and network collisions are analyzed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fingerprinting is a well-known approach for identifying multimedia data without having the original data present, using instead what amounts to its essence or "DNA". Current approaches make insufficient use of three types of knowledge that could be brought to bear in providing a fingerprinting framework that remains effective and efficient and can accommodate protection of both the whole and its elements at appropriate levels of abstraction to suit various Foci of Interest (FoI) in an image or cross-media artefact. Our proposed framework therefore aims to deliver selective composite fingerprinting that remains responsive to the requirements for protecting the whole or parts of an image which may be of particular interest and especially vulnerable to attempts at rights violation. This is powerfully aided by leveraging both multi-modal information and a rich spectrum of collateral context knowledge, including image-level collaterals as well as the inevitably needed market intelligence knowledge, such as profiling of customers' social network interests, which we can deploy as a crucial component of our Fingerprinting Collateral Knowledge. This knowledge is used in selecting the special FoIs within an image or other media content that have to be selectively and collaterally protected.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fingerprinting is a well-known approach for identifying multimedia data without having the original data present, using instead what amounts to its essence or 'DNA'. Current approaches make insufficient use of various types of knowledge that could be brought to bear in providing a fingerprinting framework that remains effective and efficient and can accommodate protection of both the whole and its elements at appropriate levels of abstraction to suit various Zones of Interest (ZoI) in an image or cross-media artefact. The proposed framework aims to deliver selective composite fingerprinting that is powerfully aided by leveraging both multi-modal information and a rich spectrum of collateral context knowledge, including image-level collaterals and the inevitably needed market intelligence knowledge, such as profiling of customers' social network interests, which we can deploy as a crucial component of our fingerprinting collateral knowledge.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Multi-modal MSHMMs have recently been applied successfully to the task of speech recognition. Temporal lip information has been used for speaker identification previously (T.J. Wark et al., 1998); however, that work was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
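The multi-stream idea can be illustrated with a minimal sketch. It assumes the common MSHMM formulation in which each state's emission log-likelihood is a weighted sum of the per-stream log-likelihoods; the abstract does not state the weighting scheme, so the weight `w_audio`, the function names, and the input layout are all hypothetical. The per-frame, per-state stream log-likelihoods are taken as given (for example, from separately trained audio and lip models).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def mshmm_log_likelihood(log_pi, log_A, audio_ll, visual_ll, w_audio=0.7):
    """Forward algorithm with two-stream emissions (illustrative only).

    log_pi[j]       : log initial probability of state j
    log_A[i][j]     : log transition probability from state i to state j
    audio_ll[t][j]  : audio-stream log-likelihood of frame t in state j
    visual_ll[t][j] : visual-stream log-likelihood of frame t in state j
    w_audio         : assumed stream weight; the visual weight is 1 - w_audio
    """
    w_visual = 1.0 - w_audio
    n_states = len(log_pi)

    def emit(t, j):
        # State-level fusion: weighted sum of the two stream log-likelihoods.
        return w_audio * audio_ll[t][j] + w_visual * visual_ll[t][j]

    alpha = [log_pi[j] + emit(0, j) for j in range(n_states)]
    for t in range(1, len(audio_ll)):
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n_states)]) + emit(t, j)
            for j in range(n_states)
        ]
    return logsumexp(alpha)
```

For identification, one such model would be scored per enrolled speaker and the highest-scoring speaker selected. Because the streams are combined inside each state rather than after decoding, this differs from the output fusion used in the earlier single-stream work.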

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, cognitive load analysis via acoustic- and CAN-Bus-based driver performance metrics is employed to assess two different commercial speech dialog systems (SDS) during in-vehicle use. Several metrics are proposed to measure increases in stress, distraction and cognitive load and we compare these measures with statistical analysis of the speech recognition component of each SDS. It is found that care must be taken when designing an SDS as it may increase cognitive load which can be observed through increased speech response delay (SRD), changes in speech production due to negative emotion towards the SDS, and decreased driving performance on lateral control tasks. From this study, guidelines are presented for designing systems which are to be used in vehicular environments.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Intelligent surveillance systems typically use a single visual spectrum modality for their input. These systems work well in controlled conditions, but often fail when lighting is poor, or when environmental effects such as shadows, dust or smoke are present. Thermal spectrum imagery is not as susceptible to environmental effects; however, thermal imaging sensors are more sensitive to noise and produce only grayscale images, making it difficult to distinguish between objects. Several approaches to combining the visual and thermal modalities have been proposed, but they are limited by the assumption that both modalities are performing equally well. When one modality fails, existing approaches are unable to detect the drop in performance and disregard the underperforming modality. In this paper, a novel middle fusion approach for combining visual and thermal spectrum images for object tracking is proposed. Motion and object detection is performed on each modality, and the object detection results for each modality are fused based on the current performance of each modality. Modality performance is determined by comparing the number of objects tracked by the system with the number detected by each mode, with a small allowance made for objects entering and exiting the scene. The tracking performance of the proposed fusion scheme is compared with the performance of the visual and thermal modes individually, and with a baseline middle fusion scheme. Improvement in tracking performance using the proposed fusion approach is demonstrated. The proposed approach is also shown to be able to detect the failure of an individual modality and disregard its results, ensuring that performance is not degraded in such situations.
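The performance check described above can be sketched roughly as follows. This is not the paper's scoring rule, which the abstract does not give: the inverse-error score, the `allowance` and `max_error` parameters, and the function name are assumptions chosen only to illustrate comparing per-mode detection counts with the tracked-object count and dropping a mode that has failed.

```python
def modality_weights(num_tracked, detected, allowance=1, max_error=3):
    """Return a normalised fusion weight per modality (illustrative only).

    num_tracked : number of objects the tracker currently maintains
    detected    : dict mapping modality name -> number of objects detected
    allowance   : assumed slack for objects entering or leaving the scene
    max_error   : assumed mismatch beyond which a modality counts as failed
    """
    scores = {}
    for mode, count in detected.items():
        # Mismatch between detections and tracked objects, after the allowance.
        error = max(0, abs(count - num_tracked) - allowance)
        # A failed modality gets weight 0 so its detections are disregarded.
        scores[mode] = 0.0 if error > max_error else 1.0 / (1.0 + error)

    total = sum(scores.values())
    if total == 0:
        return {mode: 0.0 for mode in scores}
    return {mode: score / total for mode, score in scores.items()}
```

With four tracked objects, five visual detections and twelve thermal detections, this sketch assigns the full weight to the visual mode and zero to the thermal mode, mirroring the failure handling the abstract describes.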

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Non-rigid image registration is an essential tool for overcoming the inherent local anatomical variations that exist between images acquired from different individuals or atlases. Furthermore, certain applications require this type of registration to operate across images acquired from different imaging modalities. One popular local approach for estimating this registration is a block matching procedure utilising the mutual information criterion. However, previous block matching procedures generate a sparse deformation field containing displacement estimates at uniformly spaced locations. This ignores the evidence that block matching results depend on the amount of local information content. This paper addresses this drawback by proposing the use of a Reversible Jump Markov Chain Monte Carlo statistical procedure to optimally select grid points of interest. Three different methods for propagating the estimated sparse deformation field to the entire image are then compared: a thin-plate spline warp, Gaussian convolution, and a hybrid fluid technique. Results show that non-rigid registration can be improved by using the proposed algorithm to optimally select grid points of interest.
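As an illustration of the mutual information criterion used to compare a pair of blocks, the sketch below estimates it from a joint intensity histogram. The bin count, the 8-bit intensity range, and the function name are assumptions; blocks are passed as flat sequences of intensities, and the paper's own estimator may differ.

```python
import math
from collections import Counter


def mutual_information(block_a, block_b, bins=8, max_value=256):
    """Histogram estimate of mutual information (in nats) between two blocks.

    block_a, block_b : equally sized flat sequences of intensities in [0, max_value)
    bins             : assumed number of intensity bins per image
    """
    if len(block_a) != len(block_b) or len(block_a) == 0:
        raise ValueError("blocks must be non-empty and of equal size")

    width = max_value / bins
    qa = [min(int(v / width), bins - 1) for v in block_a]
    qb = [min(int(v / width), bins - 1) for v in block_b]

    n = len(qa)
    joint = Counter(zip(qa, qb))   # joint histogram counts
    count_a = Counter(qa)          # marginal counts for block_a
    count_b = Counter(qb)          # marginal counts for block_b

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a[a] * count_b[b]).
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi
```

A block matcher would evaluate this score for each candidate displacement of a block and keep the displacement with the highest value. Because the score depends only on the statistical relationship between intensities, it remains high even when the two modalities map the same tissue to very different grey levels.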

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We formulate and interpret several multi-modal registration methods in the context of a unified statistical and information theoretic framework. A unified interpretation clarifies the implicit assumptions of each method yielding a better understanding of their relative strengths and weaknesses. Additionally, we discuss a generative statistical model from which we derive a novel analysis tool, the "auto-information function", as a means of assessing and exploiting the common spatial dependencies inherent in multi-modal imagery. We analytically derive useful properties of the "auto-information" as well as verify them empirically on multi-modal imagery. Among the useful aspects of the "auto-information function" is that it can be computed from imaging modalities independently and it allows one to decompose the search space of registration problems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we propose a variational approach for multimodal image registration based on the diffeomorphic demons algorithm. Diffeomorphic demons has proven to be a robust and efficient method for intensity-based image registration. However, its main drawback is that it cannot deal with multiple modalities. We propose to replace the standard demons similarity metric (image intensity differences) with point-wise mutual information (PMI) in the energy function. By comparing the accuracy of our PMI-based diffeomorphic demons with that of the B-spline-based free-form deformation (FFD) approach on simulated deformations, we show that the proposed algorithm performs significantly better.
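For reference, the standard definition of point-wise mutual information and its relation to mutual information can be written as below. The notation is an assumption: $a$ and $b$ denote intensities in the fixed image $F$ and the moving image $M$, and $p$ denotes the joint and marginal intensity distributions. The abstract does not give the exact form of the energy term, so this shows only the quantity that replaces the intensity difference, not the paper's full formulation.

```latex
\mathrm{PMI}(a, b) = \log \frac{p(a, b)}{p(a)\, p(b)},
\qquad
\mathrm{MI}(F, M) = \sum_{a, b} p(a, b)\, \mathrm{PMI}(a, b)
```

Mutual information is thus the expected value of PMI over the joint intensity distribution, which is what allows a global criterion to be evaluated point by point.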

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The paper describes the educational complex "Multi-agent Technologies for Parallel and Distributed Information Processing in Telecommunication Networks".

Relevance:

100.00% 100.00%

Publisher:

Abstract:

To keep pace with the ongoing rapid growth of video information, there is an emerging demand for sophisticated content-based video indexing systems. However, current video indexing solutions are still immature and lack any standard. This doctoral thesis presents research on an integrated multi-modal approach for sports video indexing and retrieval. By combining specific features extractable from multiple audio-visual modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users will benefit from the integration of high-level semantics with descriptive mid-level features such as whistles and close-up views of player(s).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Traffic congestion has a significant impact on the economy and environment. Encouraging the use of multimodal transport (public transport, bicycle, park’n’ride, etc.) has been identified by traffic operators as a good strategy to tackle congestion and its detrimental environmental impacts. A multi-modal and multi-objective trip planner provides users with various multi-modal options optimised for the objectives that they prefer (cheapest, fastest, safest, etc.) and has the potential to reduce congestion on both a temporal and a spatial scale. The computation of multi-modal and multi-objective trips is a complicated mathematical problem, as it must integrate and utilise a diverse range of large data sets, including both road network information and public transport schedules, while optimising for a number of competing objectives, where fully optimising for one objective, such as travel time, can adversely affect other objectives, such as cost. The relationship between these objectives can also be quite subjective, as their priorities vary from user to user. This paper first outlines the various data requirements and formats needed for the multi-modal multi-objective trip planner to operate, including static information about the physical infrastructure within Brisbane as well as real-time and historical data to predict traffic flow on the road network and the status of public transport. It then presents the graph data structures representing the road and public transport networks within Brisbane that are used in the trip planner to calculate optimal routes. This allows for an investigation into the various shortest path algorithms that have been researched over the last few decades, and provides a foundation for the construction of the Multi-modal Multi-objective Trip Planner through the development of innovative new algorithms that can operate on the large, diverse data sets and competing objectives.
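One simple way to see how competing objectives interact on a graph is to collapse them into a single edge weight using per-user preference weights and run Dijkstra's algorithm. The sketch below does exactly that; it is a baseline illustration and not the planner's algorithm, which the abstract leaves to future development. The graph layout, the attribute names `time` and `cost`, and the function name are hypothetical, and the approach assumes non-negative edge attributes and weights.

```python
import heapq


def weighted_shortest_path(graph, source, target, weights):
    """Dijkstra over a scalarised multi-objective graph (illustrative only).

    graph   : dict mapping node -> list of (neighbour, attributes) pairs,
              where attributes is a dict such as {"time": 10, "cost": 2.5}
    weights : dict mapping objective name -> non-negative user preference weight
    Returns (path, total_scalar_cost), or (None, inf) if target is unreachable.
    """
    def scalar(attrs):
        # Weighted sum of the objectives for one edge.
        return sum(weights[name] * attrs[name] for name in weights)

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, attrs in graph.get(node, []):
            candidate = dist + scalar(attrs)
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))

    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]
```

Changing the weights changes the answer: a user who weights only travel time may be routed by car, while one who weights only cost may be routed via public transport. A weighted sum can only reach some of the Pareto-optimal trips, which is one reason dedicated multi-objective algorithms are of interest for a planner of this kind.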