1000 results for video indexing


60.00% 60.00%



An automated system for detection of head movements is described. The goal is to label relevant head gestures in video of American Sign Language (ASL) communication. In the system, a 3D head tracker recovers head rotation and translation parameters from monocular video. Relevant head gestures are then detected by analyzing the length and frequency of the motion signal's peaks and valleys. Each parameter is analyzed independently, because a number of relevant head movements in ASL are associated with major changes around one rotational axis. No explicit training of the system is necessary. Currently, the system can detect "head shakes." In experimental evaluation, classification performance is compared against ground-truth labels obtained from ASL linguists. Initial results are promising, as the system matches the linguists' labels in a significant number of cases.
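
The peak-and-valley analysis described above can be illustrated with a minimal sketch. It is not the authors' detector: the function names, the choice of yaw as the analyzed rotation parameter, and the thresholds `min_swing`, `max_gap` and `min_swings` are all assumptions made for illustration. The sketch flags a head shake when a single rotation signal shows several large, closely spaced swings between alternating peaks and valleys.

```python
import math


def find_extrema(signal):
    """Return (index, value, kind) for each strict local peak or valley."""
    extrema = []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur > prev and cur > nxt:
            extrema.append((i, cur, "peak"))
        elif cur < prev and cur < nxt:
            extrema.append((i, cur, "valley"))
    return extrema


def detect_head_shake(yaw, min_swing=5.0, max_gap=15, min_swings=3):
    """Return True if the yaw signal contains a run of alternating swings.

    yaw        -- one rotation angle per frame (degrees, assumed)
    min_swing  -- smallest peak-to-valley difference that counts (assumed)
    max_gap    -- largest frame distance between adjacent extrema (assumed)
    min_swings -- number of consecutive qualifying swings required (assumed)
    """
    extrema = find_extrema(yaw)
    run = 0
    for (i0, v0, k0), (i1, v1, k1) in zip(extrema, extrema[1:]):
        if k0 != k1 and abs(v1 - v0) >= min_swing and (i1 - i0) <= max_gap:
            run += 1
            if run >= min_swings:
                return True
        else:
            run = 0  # the swing was too small or too slow: start over
    return False


if __name__ == "__main__":
    # Synthetic signals: a fast oscillation and a single slow head turn.
    shake = [10.0 * math.sin(2 * math.pi * t / 8) for t in range(32)]
    turn = [float(min(t, 40 - t)) for t in range(41)]
    print(detect_head_shake(shake), detect_head_shake(turn))
```

Because each parameter is examined on its own, the same routine could in principle be run separately on each rotation and translation signal. No training data is involved, only fixed thresholds.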


60.00% 60.00%



Addressing core issues in mobile surveillance, we present an architecture for querying and retrieving distributed, semi-permanent multi-modal data through challenged networks with limited connectivity. The system provides a rich set of queries for spatio-temporal querying in a surveillance context, and uses network availability to provide the best quality of service. It incrementally and adaptively refines the query, using data already retrieved that exists on static platforms and on-demand data that it requests from mobile platforms. We demonstrate the system using a real surveillance system on a mobile 20-bus transport network coupled with static bus depot infrastructure. In addition, we show the robustness of the system in handling different conditions in the underlying infrastructure by running simulations on a real but historical dataset collected offline.


60.00% 60.00%



The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten character recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition.
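
To illustrate why numerical scaling matters for long observation sequences, here is a minimal sketch of the standard scaled forward recursion for a flat (non-hierarchical) HMM. It is not the scaling method of the paper, which targets the general HHMM. The function names and the toy parameters in the usage example are assumptions chosen for illustration.

```python
import math


def naive_forward(pi, A, B, obs):
    """Unscaled forward pass: returns P(obs), which underflows on long inputs."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def scaled_forward_loglik(pi, A, B, obs):
    """Scaled forward pass: returns log P(obs) without underflow.

    At each step the forward vector is renormalized to sum to 1 and the
    log of the normalizer is accumulated; the normalizers multiply to P(obs).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)                   # scaling factor for step t
        alpha = [a / c for a in alpha]   # keep the vector in a safe range
        loglik += math.log(c)
    return loglik


if __name__ == "__main__":
    # Toy two-state model (illustrative values only).
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    long_obs = [0, 1] * 3000
    print(naive_forward(pi, A, B, long_obs))          # underflows to zero
    print(scaled_forward_loglik(pi, A, B, long_obs))  # finite log-likelihood
```

On short sequences both functions agree (after taking the logarithm of the naive result). On long sequences only the scaled version remains usable.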


60.00% 60.00%



To enable high-level semantic indexing of video, we tackle the problem of automatically structuring motion pictures into meaningful story units, namely scenes. In our recent work, drawing guidance from film grammar, we proposed an algorithmic solution for extracting scenes in motion pictures based on a shot neighborhood color coherence measure. In this paper, we extend our work by presenting various refinement mechanisms, inspired by the knowledge of film devices that are brought to bear while crafting scenes, to further improve the results of the scene detection algorithm. We apply the enhanced algorithm to ten motion pictures and demonstrate the resulting improvements in performance.


60.00% 60.00%



Visual tracking is the problem of estimating some variables related to a target given a video sequence depicting the target. Visual tracking is key to the automation of many tasks, such as visual surveillance, autonomous robot or vehicle navigation, and automatic video indexing in multimedia databases. Despite many years of research, long-term tracking of generic targets in real-world scenarios remains unsolved. The main contribution of this thesis is the definition of effective algorithms that can foster a general solution to visual tracking by letting the tracker adapt to changing working conditions. In particular, we propose to adapt two crucial components of visual trackers: the transition model and the appearance model. The less general but widespread case of tracking from a static camera is also considered, and a novel change detection algorithm robust to sudden illumination changes is proposed. Based on this, a principled adaptive framework to model the interaction between Bayesian change detection and recursive Bayesian trackers is introduced. Finally, the problem of automatic tracker initialization is considered. In particular, a novel solution for categorization of 3D data is presented. The category recognition algorithm is based on a novel 3D descriptor that is shown to achieve state-of-the-art performance in several applications of surface matching.


60.00% 60.00%



With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from mature and many open issues remain. This dissertation focuses on three of the essential challenges in developing an MMDBMS, namely the semantic gap, perception subjectivity and data organization, and proposes a systematic and integrated framework with a video database and an image database serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of an MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors, in addition to extracting the low-level media features. The data organization challenge is mainly addressed through media indexing, where various levels of indexing are required to support diverse query requirements. In particular, the focus of this study is to facilitate high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from feedback at both the image level and the object level.


40.00% 40.00%



This paper aims to show that, by using low-level feature extraction together with motion and object identification and tracking methods, features can be extracted and indexed for efficient and effective retrieval of video, such as an awards ceremony video. Video scene/shot analysis and key frame extraction are used as a foundation to identify objects in video and to find spatial relationships within the video. The compounding of low-level features such as colour, texture and abstract object identification leads to higher-level real object identification and tracking, and to scene detection. The main focus is on using a video style that is different from the heavily used sports and news genres. Using different video styles can open the door to methods that encompass all video types, instead of specialized methods for each specific style of video.


40.00% 40.00%



Content-based indexing is fundamental to supporting and sustaining the ongoing growth of broadcast sports video. The main challenge is to design extensible frameworks to detect and index highlight events. This paper presents: 1) a statistics-driven event detection approach that uses a minimum amount of manual knowledge and is based on a universal scope of detection and audio-visual features; 2) a semi-schema-based indexing approach that combines the benefits of schema-based modeling, which ensures that the video indexes are valid at all times without manual checking, with those of schema-less modeling, which allows several passes of instantiation in which additional elements can be declared. To demonstrate the performance of the event detection, a large dataset of sports videos totaling around 15 hours, including soccer, basketball and Australian football, is used.


40.00% 40.00%



This paper addresses the coordinated use of video and audio cues to capture and index surveillance events with multimodal labels. The focus of this paper is the development of a joint-sensor calibration technique that uses audio-visual observations to improve the calibration process. One significant feature of this approach is the ability to continuously check and update the calibration status of the sensor suite, making it resilient to independent drift in the individual sensors. We present scenarios in which this system is used to enhance surveillance.


40.00% 40.00%



With rapid advances in video processing technologies and ever-faster increases in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation for retrieving videos of interest to users. Video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, the high complexity of video content poses the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search for very large video databases. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplied by the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of the video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one-dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one-dimensional values of the ViTris' positions.
Such a transformation enables the B+-tree to achieve its optimal performance by quickly filtering out a large portion of non-similar ViTris. Our extensive experiments on large real video datasets demonstrate the effectiveness of our proposals, which significantly outperform existing methods.
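
The filter-and-refine idea behind the one-dimensional transformation can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm. It indexes plain points rather than ViTris (radius and density are ignored), estimates the first principal axis by power iteration, and uses a sorted list with `bisect` as a stand-in for the B+-tree. All function names are hypothetical. The filter is safe because projecting onto a unit axis never increases the distance between two points, so no true neighbor is discarded.

```python
import bisect
import math


def first_principal_axis(points, iters=100):
    """Estimate the data mean and first principal axis by power iteration."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    axis = [1.0 / math.sqrt(dim)] * dim  # unit-length starting guess
    for _ in range(iters):
        # Apply the (unnormalized) covariance matrix to the current axis.
        proj = [sum(c[d] * axis[d] for d in range(dim)) for c in centered]
        new = [sum(proj[k] * centered[k][d] for k in range(n))
               for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        axis = [x / norm for x in new]
    return mean, axis


def project(p, mean, axis):
    """Map a high-dimensional point to its one-dimensional key."""
    return sum((p[d] - mean[d]) * axis[d] for d in range(len(axis)))


def build_index(points, mean, axis):
    """Sorted (key, id) pairs: a simple stand-in for a B+-tree."""
    return sorted((project(p, mean, axis), i) for i, p in enumerate(points))


def range_query(points, index, mean, axis, query, radius, eps=1e-9):
    """Return (ids within radius of query, number of candidates checked)."""
    q = project(query, mean, axis)
    lo = bisect.bisect_left(index, (q - radius - eps, -1))
    hi = bisect.bisect_right(index, (q + radius + eps, len(points)))
    candidates = [i for _, i in index[lo:hi]]       # cheap 1-D filter
    hits = sorted(i for i in candidates
                  if math.dist(points[i], query) <= radius)  # exact refine
    return hits, len(candidates)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    pts = [[rng.gauss(0, 5 if d == 0 else 1) for d in range(8)]
           for _ in range(200)]
    mean, axis = first_principal_axis(pts)
    index = build_index(pts, mean, axis)
    hits, checked = range_query(pts, index, mean, axis, pts[0], 3.0)
    print(len(hits), "hits after checking", checked, "of", len(pts))
```

Choosing the principal axis spreads the keys as widely as possible, which tends to shrink the candidate set that must be checked with the exact distance.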