994 resultados para Interpretação visual
Resumo:
Regions in video streams attracting human interest contribute significantly to human understanding of the video. Being able to predict salient and informative Regions of Interest (ROIs) through a sequence of eye movements is a challenging problem. Applications such as content-aware retargeting of videos to different aspect ratios while preserving informative regions and smart insertion of dialog (closed-caption text) into the video stream can significantly be improved using the predicted ROIs. We propose an interactive human-in-the-loop framework to model eye movements and predict visual saliency into yet-unseen frames. Eye tracking and video content are used to model visual attention in a manner that accounts for important eye-gaze characteristics such as temporal discontinuities due to sudden eye movements, noise, and behavioral artifacts. A novel statistical-and algorithm-based method gaze buffering is proposed for eye-gaze analysis and its fusion with content-based features. Our robust saliency prediction is instantiated for two challenging and exciting applications. The first application alters video aspect ratios on-the-fly using content-aware video retargeting, thus making them suitable for a variety of display sizes. The second application dynamically localizes active speakers and places dialog captions on-the-fly in the video stream. Our method ensures that dialogs are faithful to active speaker locations and do not interfere with salient content in the video stream. Our framework naturally accommodates personalisation of the application to suit biases and preferences of individual users.
Resumo:
Designing a robust algorithm for visual object tracking has been a challenging task since many years. There are trackers in the literature that are reasonably accurate for many tracking scenarios but most of them are computationally expensive. This narrows down their applicability as many tracking applications demand real time response. In this paper, we present a tracker based on random ferns. Tracking is posed as a classification problem and classification is done using ferns. We used ferns as they rely on binary features and are extremely fast at both training and classification as compared to other classification algorithms. Our experiments show that the proposed tracker performs well on some of the most challenging tracking datasets and executes much faster than one of the state-of-the-art trackers, without much difference in tracking accuracy.
Resumo:
We perceive objects as containing a variety of attributes: local features, relations between features, internal details, and global properties. But we know little about how they combine. Here, we report a remarkably simple additive rule that governs how these diverse object attributes combine in vision. The perceived dissimilarity between two objects was accurately explained as a sum of (a) spatially tuned local contour-matching processes modulated by part decomposition; (b) differences in internal details, such as texture; (c) differences in emergent attributes, such as symmetry; and (d) differences in global properties, such as orientation or overall configuration of parts. Our results elucidate an enduring question in object vision by showing that the whole object is not a sum of its parts but a sum of its many attributes.
Resumo:
This paper presents a novel coarse-to-fine global localization approach that is inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by SIFT descriptors are used as natural land-marks. These descriptors are indexed into two databases: an inverted index and a location database. The inverted index is built based on a visual vocabulary learned from the feature descriptors. In the location database, each location is directly represented by a set of scale invariant descriptors. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the inverted index is fast but not accurate enough; whereas localization from the location database using voting algorithm is relatively slow but more accurate. The combination of coarse and fine stages makes fast and reliable localization possible. In addition, if necessary, the localization result can be verified by epipolar geometry between the representative view in database and the view to be localized. Experimental results show that our approach is efficient and reliable. ©2005 IEEE.
Resumo:
In the life of the Law School, focus on the “visual” can operate at three different levels: learning, teaching, and examining (legal concepts). My main interest in this paper is to explore the latter level, “examining”, broadly considered so as to encompass evaluation in general. Furthermore, that interest is pinned down here to the area of constitutional rights and human rights in general, even though the conclusions reached can (and should) likely be extrapolated to other areas of the law... In effect, the first logical step regarding the relevance of the visual approach has to do with using it yourself when you study —assuming that you came to the conclusion that you are a “visual learner”. As you know, VARK theorists propose a quadripartite classification of learners. The acronym VARK stands for Visual, Aural, Read/write, and Kinesthetic sensory modalities that are used for learning information. This model was designed in the late 80s by Neil Fleming and it has received some acceptance and a lot of attention...