994 results for Video-Stream Filtering


Relevance: 90.00%

Abstract:

We investigate the problem of obtaining a dense reconstruction in real-time from a live video stream. In recent years, multi-view stereo (MVS) has received considerable attention and a number of methods have been proposed. However, most methods operate under the assumption of a relatively sparse set of still images as input and unlimited computation time. Video-based MVS has received less attention despite the fact that video sequences offer significant benefits in terms of usability of MVS systems. In this paper we propose a novel video-based MVS algorithm that is suitable for real-time, interactive 3D modeling with a hand-held camera. The key idea is a per-pixel, probabilistic depth estimation scheme that updates posterior depth distributions with every new frame. The current implementation is capable of updating 15 million distributions per second. We evaluate the proposed method against a state-of-the-art real-time MVS method and show improved accuracy. © 2011 Elsevier B.V. All rights reserved.
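
The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, assuming a discretized depth range per pixel; the random stand-in likelihoods mark where a real system would score each hypothesis photometrically against the new frame and camera pose.

    import numpy as np

    def init_depth_posteriors(h, w, n_hyp=64):
        """Uniform prior over n_hyp depth hypotheses for every pixel."""
        return np.full((h, w, n_hyp), 1.0 / n_hyp)

    def update_depth_posteriors(post, likelihood):
        """Recursive Bayesian update: posterior is proportional to prior * likelihood.

        post, likelihood: (h, w, n_hyp) arrays; likelihood[i, j, k] scores
        depth hypothesis k at pixel (i, j) against the newest frame.
        """
        post *= likelihood
        post /= post.sum(axis=2, keepdims=True)  # renormalize per pixel
        return post

    # Usage with random stand-in likelihoods:
    post = init_depth_posteriors(480, 640)
    for _ in range(10):
        fake_likelihood = np.random.rand(480, 640, 64) + 1e-6
        post = update_depth_posteriors(post, fake_likelihood)
    depth_map = post.argmax(axis=2)  # MAP depth index per pixel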

Relevance: 90.00%

Abstract:

This paper describes a face detection system that goes beyond the traditional approaches normally designed for still images. The detector is applied in a video stream context, so the resulting system is designed to exploit a key feature available in video: temporal coherence. The system builds a feature-based model for each detected face and uses that model information to search for the face in the next frame. For video stream processing, the results outperform the Rowley-Kanade and Viola-Jones solutions, providing eye and face data in less time with a notably high correct-detection rate.
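
As an illustration of the temporal-coherence idea (not the paper's actual algorithm), a detector-plus-local-search loop might look like the sketch below; `detect_faces` and `match_model` are hypothetical placeholders for the full still-image detector and the per-face model matcher.

    def track_faces(frames, detect_faces, match_model, margin=20):
        """Run the expensive full detector only when no face model is active;
        otherwise search a small region around each face's last position.

        detect_faces(frame) -> list of (x, y, w, h) boxes.
        match_model(frame, roi, model) -> (box, model) or None.
        """
        models = []          # one feature-based model per tracked face
        for frame in frames:
            next_models = []
            for box, model in models:
                x, y, w, h = box
                roi = (x - margin, y - margin, w + 2 * margin, h + 2 * margin)
                hit = match_model(frame, roi, model)   # cheap local search
                if hit is not None:
                    next_models.append(hit)
            if not next_models:                        # fall back to full detection
                next_models = [(b, None) for b in detect_faces(frame)]
            models = next_models
            yield [b for b, _ in models]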

Relevance: 90.00%

Abstract:

Recognition of multiple moving objects is an important task for extracting user-relevant knowledge to send to the base station in wireless video-based sensor networks. However, video-based sensor nodes have constrained resources and continuously produce huge volumes of video, which makes online segmentation of multiple moving objects from the stream a challenge. Traditional clustering algorithms such as DBSCAN cannot run time-efficiently, and may even fail to run at all within the limited memory of sensor nodes, because the number of pixels is too large. This paper presents a novel algorithm named Inter-Frame Change Directing Online clustering (IFCDO clustering) for segmenting multiple moving objects from a video stream on sensor nodes. IFCDO clustering only needs to group the pixels that differ between frames, so it reduces both space and time complexity while achieving clusters as robust as those of DBSCAN. Experimental results show that IFCDO clustering outperforms DBSCAN in both time and space efficiency. © 2008 IEEE.
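
The abstract gives only the core idea, grouping inter-frame different pixels; a minimal sketch follows, with connected-component labeling standing in for the IFCDO clustering step itself, whose exact procedure the abstract does not give.

    import numpy as np
    from scipy import ndimage

    def ifc_segments(prev_frame, frame, diff_thresh=25, min_area=50):
        """Cluster only the pixels that changed between consecutive frames,
        instead of all pixels, and return per-cluster bounding boxes."""
        changed = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
        labels, n = ndimage.label(changed)         # group adjacent changed pixels
        boxes = []
        for obj in ndimage.find_objects(labels):   # bounding slices per cluster
            ys, xs = obj
            if (ys.stop - ys.start) * (xs.stop - xs.start) >= min_area:
                boxes.append((xs.start, ys.start, xs.stop, ys.stop))
        return boxes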

Relevance: 90.00%

Abstract:

Compared to the traditional wired video sensor networks used to monitor a residential district, Wireless Video-based Sensor Networks (WVSNs) can provide more detailed and precise information while reducing cost. However, state-of-the-art low-cost wireless video-based sensors have very constrained resources, such as low bandwidth, small storage, limited processing capability, and a limited energy supply. Also, due to the special sensing range of video-based sensors, cluster-based routing is not as effective as it is in traditional sensor networks. This paper presents a novel real-time change mining algorithm based on an extracted profile model of moving objects inspired by the frog's eye. Example analysis shows that the extracted profile misses no semantically important images that should be sent to the Base Station for further hazard detection, while reducing redundant video stream data to a volume that today's wireless video sensors can handle. This makes WVSNs viable for surveillance of residential districts.
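
The profile model itself is not described in the abstract; the sketch below captures only the underlying data-reduction idea, forwarding a frame only when enough pixels have changed, and is not the paper's frog-eye algorithm.

    import numpy as np

    def frames_to_send(frames, change_thresh=0.02):
        """Yield a frame for transmission to the base station only when the
        fraction of changed pixels exceeds change_thresh."""
        prev = None
        for frame in frames:
            if prev is None:
                prev, send = frame, True            # always send the first frame
            else:
                changed = np.mean(np.abs(frame.astype(int) - prev.astype(int)) > 30)
                send = changed > change_thresh      # fraction of changed pixels
                if send:
                    prev = frame                    # update the reference frame
            if send:
                yield frame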

Relevance: 80.00%

Abstract:

Rapid prototyping environments can speed up research on visual control algorithms. We have designed and implemented a software framework for fast prototyping of visual control algorithms for Micro Aerial Vehicles (MAVs). We combine a proxy-based network communication architecture with a custom Application Programming Interface. This allows multiple experimental configurations, like drone swarms or distributed processing of a drone's video stream. Currently, the framework supports a low-cost MAV: the Parrot AR.Drone. Real tests have been performed on this platform, and the results show that the extra communication delay introduced by the framework is comparatively low, while new functionality and flexibility are added to the selected drone. The implementation is open source and can be downloaded from www.vision4uav.com/?q=VC4MAV-FW
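
The framework's actual API is documented at the URL above; purely as an illustration of the proxy-based fan-out idea, a toy proxy might look like this (the class and method names below are assumptions, not the framework's interface):

    import queue
    import threading

    class VideoProxy:
        """One drone video source fanned out to several consumers
        (e.g. on-board control and off-board processing)."""

        def __init__(self):
            self.subscribers = []
            self.lock = threading.Lock()

        def subscribe(self, maxsize=2):
            q = queue.Queue(maxsize=maxsize)
            with self.lock:
                self.subscribers.append(q)
            return q

        def publish(self, frame):
            with self.lock:
                for q in self.subscribers:
                    if q.full():
                        q.get_nowait()   # drop the oldest frame, keep latency low
                    q.put_nowait(frame)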

Relevance: 80.00%

Abstract:

We propose a method of representing audience behavior through facial and body motions from a single video stream, and use these features to predict the rating of feature-length movies. This is a very challenging problem because: i) the movie viewing environment is dark and contains views of people at different scales and viewpoints; ii) the duration of feature-length movies is long (80-120 mins), so tracking people uninterrupted for this length of time is still an unsolved problem; and iii) expressions and motions of audience members are subtle, short, and sparse, making labeling of activities unreliable. To circumvent these issues, we use an infrared-illuminated test-bed to obtain a visually uniform input. We then utilize motion-history features, which capture the subtle movements of a person within a pre-defined volume, and form a group representation of the audience as a histogram of pair-wise correlations over a small window of time. Using this group representation, we learn our movie rating classifier from crowd-sourced ratings collected by rottentomatoes.com and show our prediction capability on audiences from 30 movies across 250 subjects (> 50 hrs).
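
A minimal sketch of the two ingredients named in the abstract, a motion-history image and a histogram of pairwise correlations, assuming scalar per-member motion signals over a window; details such as the pre-defined per-person volumes are omitted.

    import numpy as np

    def update_mhi(mhi, prev, frame, tau=30, delta=15):
        """Motion-history image: moving pixels jump to tau, others decay by 1."""
        motion = np.abs(frame.astype(int) - prev.astype(int)) > delta
        mhi = np.maximum(mhi - 1, 0)
        mhi[motion] = tau
        return mhi

    def group_feature(member_signals, bins=10):
        """Histogram of pairwise correlations between audience members'
        per-window motion signals, as a fixed-length group representation."""
        n = len(member_signals)
        corrs = [np.corrcoef(member_signals[i], member_signals[j])[0, 1]
                 for i in range(n) for j in range(i + 1, n)]
        hist, _ = np.histogram(corrs, bins=bins, range=(-1.0, 1.0))
        return hist / max(len(corrs), 1)   # normalize by number of pairs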

Relevance: 80.00%

Abstract:

It is not uncommon to hear a person of interest described by their height, build, and clothing (i.e. type and colour). These semantic descriptions are commonly used by people to describe others, as they are quick to relay and easy to understand. However, such queries are not easily utilised within intelligent surveillance systems, as they are difficult to transform into a representation that can be searched for automatically in large camera networks. In this paper we propose a novel approach that transforms such a semantic query into an avatar that is searchable within a video stream, and demonstrate state-of-the-art performance for locating a subject in video based on a description.

Relevance: 80.00%

Abstract:

It is not uncommon to hear a person of interest described by their height, build, and clothing (i.e. type and colour). These semantic descriptions are commonly used by people to describe others, as they are quick to communicate and easy to understand. However, such queries are not easily utilised within intelligent video surveillance systems, as they are difficult to transform into a representation that can be utilised by computer vision algorithms. In this paper we propose a novel approach that transforms such a semantic query into an avatar, in the form of a channel representation, that is searchable within a video stream. We show how spatial, colour and prior information (person shape) can be incorporated into the channel representation to locate a target using a particle-filter-like approach. We demonstrate state-of-the-art performance for locating a subject in video based on a description, achieving a relative performance improvement of 46.7% over the baseline. We also apply this approach to person re-detection, and show that it can re-detect a person in a video stream without the use of a person detector.
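
As a sketch of the particle-filter-like search, with the channel-representation matcher abstracted into a hypothetical `frame_score` function that rates how well the avatar matches the frame at a given position:

    import numpy as np

    def locate_avatar(frame_score, shape, n_particles=500, n_iters=20, sigma=10.0):
        """Sample candidate positions, weight them by avatar match quality,
        then resample around the best candidates."""
        h, w = shape
        xs = np.random.uniform(0, w, n_particles)
        ys = np.random.uniform(0, h, n_particles)
        for _ in range(n_iters):
            weights = np.array([frame_score(x, y) for x, y in zip(xs, ys)])
            weights = np.maximum(weights, 1e-9)
            weights /= weights.sum()
            idx = np.random.choice(n_particles, n_particles, p=weights)
            # diffuse the resampled particles so the search keeps exploring
            xs = np.clip(xs[idx] + np.random.normal(0, sigma, n_particles), 0, w - 1)
            ys = np.clip(ys[idx] + np.random.normal(0, sigma, n_particles), 0, h - 1)
        best = np.argmax([frame_score(x, y) for x, y in zip(xs, ys)])
        return xs[best], ys[best]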

Relevance: 80.00%

Abstract:

Regions in video streams that attract human interest contribute significantly to human understanding of the video. Predicting salient and informative Regions of Interest (ROIs) through a sequence of eye movements is a challenging problem. Applications such as content-aware retargeting of videos to different aspect ratios while preserving informative regions, and smart insertion of dialog (closed-caption text) into the video stream, can be improved significantly using the predicted ROIs. We propose an interactive human-in-the-loop framework to model eye movements and predict visual saliency in yet-unseen frames. Eye tracking and video content are used to model visual attention in a manner that accounts for important eye-gaze characteristics such as temporal discontinuities due to sudden eye movements, noise, and behavioral artifacts. A novel statistical and algorithmic method, gaze buffering, is proposed for eye-gaze analysis and its fusion with content-based features. Our robust saliency prediction is instantiated for two challenging and exciting applications. The first application alters video aspect ratios on-the-fly using content-aware video retargeting, making videos suitable for a variety of display sizes. The second application dynamically localizes active speakers and places dialog captions on-the-fly in the video stream. Our method ensures that dialogs are faithful to active speaker locations and do not interfere with salient content in the video stream. Our framework naturally accommodates personalisation of the application to suit the biases and preferences of individual users.
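
The paper's gaze-buffering statistics are not given in the abstract; the sketch below illustrates only the general pattern, rejecting discontinuous gaze samples and fusing the surviving ones with a content-based saliency map.

    import numpy as np

    def buffered_gaze_map(gaze_points, shape, max_speed=80.0, sigma=25.0):
        """Drop gaze samples that jump faster than max_speed pixels/frame
        (saccades, noise), then splat the survivors into a smooth map."""
        h, w = shape
        heat = np.zeros(shape)
        prev = None
        for x, y in gaze_points:
            if prev is not None and np.hypot(x - prev[0], y - prev[1]) > max_speed:
                prev = (x, y)
                continue                       # reject the discontinuous sample
            yy, xx = np.mgrid[0:h, 0:w]
            heat += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
            prev = (x, y)
        return heat / heat.max() if heat.max() > 0 else heat

    def fused_saliency(gaze_map, content_map, alpha=0.6):
        """Convex combination of gaze-derived and content-based saliency."""
        return alpha * gaze_map + (1 - alpha) * content_map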

Relevance: 80.00%

Abstract:

In this paper, we propose an H.264/AVC compressed-domain human action recognition system with a projection-based metacognitive learning classifier (PBL-McRBFN). Features are extracted from the quantization parameters and the motion vectors of the compressed video stream over a time window and used as input to the classifier. Since compressed-domain analysis works with noisy, sparse compression parameters, achieving performance comparable to pixel-domain analysis is a huge challenge. On the positive side, the compressed domain allows much faster analysis of videos than pixel-level analysis. The classification results are analyzed for different values of the Group of Pictures (GOP) parameter and different time windows, including full videos. The functional relationship between the features and action labels is established using PBL-McRBFN, which has a cognitive and a meta-cognitive component. The cognitive component is a radial basis function network, while the meta-cognitive component employs self-regulation to achieve better performance in the subject-independent action recognition task. The proposed approach is faster than and shows comparable performance to state-of-the-art pixel-domain counterparts. It employs partial decoding, which avoids the complexity of full decoding and minimizes computational load and memory usage, resulting in reduced hardware utilization and increased classification speed. Results on two benchmark datasets show more than 90% accuracy using PBL-McRBFN. The performance for various GOP parameters and groups of frames is obtained over twenty random trials and compared with other well-known classifiers from the machine learning literature. (C) 2015 Elsevier B.V. All rights reserved.
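
A minimal sketch of the compressed-domain feature idea, with a plain RBF-kernel SVM standing in for PBL-McRBFN, whose implementation the abstract does not give; the feature layout below is an assumption.

    import numpy as np
    from sklearn.svm import SVC

    def compressed_domain_features(motion_vectors, qps, bins=8):
        """Feature vector from a time window of compressed-stream parameters:
        a histogram of motion-vector magnitudes plus mean/std of the
        quantization parameters. motion_vectors: (n, 2); qps: (n,)."""
        mags = np.linalg.norm(motion_vectors, axis=1)
        hist, _ = np.histogram(mags, bins=bins, range=(0, 32), density=True)
        return np.concatenate([hist, [qps.mean(), qps.std()]])

    # Synthetic two-class example; a real pipeline would read the motion
    # vectors and QPs from a partially decoded H.264 stream:
    X = np.stack([compressed_domain_features(np.random.randn(100, 2) * (k + 1),
                                             np.random.randint(20, 40, 100))
                  for k in range(2) for _ in range(30)])
    y = np.repeat([0, 1], 30)
    clf = SVC(kernel="rbf").fit(X, y)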

Relevance: 80.00%

Abstract:

The use of the Internet to distribute video streams is a current trend and brings with it great challenges. The foundation on which the Internet is built, packet switching and the client-server architecture, does not provide the best conditions for this type of service. The P2P (peer-to-peer) architecture has been considered as an infrastructure for distributing video streams over the Internet. The basic idea of P2P-supported video distribution is that the nodes of the overlay network distribute and forward video chunks cooperatively, dividing the tasks among themselves and making their local resources available to the network. In this context, since cooperation is the basis of this architecture, it is important to investigate what happens to the quality of the video distribution service when the infrastructure provided by the P2P network is contaminated by nodes that are unwilling to cooperate. In this work, we first study how much the presence of non-cooperative nodes can degrade the quality of a video stream distribution application on a P2P network. Based on the results obtained, we propose a cooperation incentive mechanism that guarantees good video quality for cooperative nodes and some punishment for non-cooperative nodes. The tests and evaluations were carried out using the PeerSim simulator.
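
As an illustration of a score-based incentive mechanism of the kind described (not the thesis's actual mechanism), a toy scheduler might track upload credit per peer and serve cooperative peers first:

    class IncentiveScheduler:
        """Peers earn credit for uploaded chunks; requests from low-credit
        (non-cooperative) peers are served last or refused."""

        def __init__(self, min_credit=-5):
            self.credit = {}            # peer id -> upload credit balance
            self.min_credit = min_credit

        def record_upload(self, peer):
            self.credit[peer] = self.credit.get(peer, 0) + 1

        def record_download(self, peer):
            self.credit[peer] = self.credit.get(peer, 0) - 1

        def serve_order(self, requesting_peers):
            """Serve cooperative peers first; drop persistent free-riders."""
            eligible = [p for p in requesting_peers
                        if self.credit.get(p, 0) > self.min_credit]
            return sorted(eligible, key=lambda p: -self.credit.get(p, 0))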

Relevance: 80.00%

Abstract:

Vision-based tracking can provide the spatial location of project-related entities such as equipment, workers, and materials on a large-scale, congested construction site. It tracks entities in a video stream by inferring their motion. To initiate the process, it is necessary to determine the pixel areas of the entities that will then be tracked through the consecutive video frames. To fully automate the process, this paper presents an automated way of initializing trackers using the Semantic Texton Forests (STFs) method. STFs simultaneously perform segmentation of the image and classification of the segments based on low-level semantic information and context information. In this paper, the STFs method is tested on wheel loader recognition. In the experiments, wheel loaders are further divided into several parts, such as the wheels and the body, to help learn the context information. The results show 79% accuracy in recognizing the pixel areas of the wheel loader. These results indicate that the STFs method has the potential to automate the initialization step of vision-based tracking.
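
The STF segmentation model itself is beyond a short sketch; assuming a per-pixel label map such as STFs would produce, tracker initialization reduces to taking the bounding box of the target class:

    import numpy as np

    def init_tracker_box(label_map, target_label):
        """Return the bounding box of the target class to hand to a tracker.
        label_map: (h, w) int array of per-pixel semantic labels; the
        segmentation model that produces it is assumed, not implemented."""
        ys, xs = np.nonzero(label_map == target_label)
        if len(xs) == 0:
            return None                           # target not present in frame
        return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

    # Usage with a synthetic label map (1 = "wheel loader" pixels):
    labels = np.zeros((240, 320), dtype=int)
    labels[100:180, 50:200] = 1
    print(init_tracker_box(labels, target_label=1))   # (50, 100, 199, 179)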

Relevance: 80.00%

Abstract:

Gesture spotting is the challenging task of locating the start and end frames of a video stream that correspond to a gesture of interest, while at the same time rejecting non-gesture motion patterns. This paper proposes a new gesture spotting and recognition algorithm that is based on the continuous dynamic programming (CDP) algorithm and runs in real time. To make gesture spotting efficient, a pruning method is proposed that allows the system to evaluate a relatively small number of hypotheses compared to CDP. Pruning is implemented by a set of model-dependent classifiers that are learned from training examples. To make gesture spotting more accurate, a subgesture reasoning process is proposed that models the fact that some gesture models can falsely match parts of other, longer gestures. In our experiments, the proposed method with pruning and subgesture modeling is an order of magnitude faster and 18% more accurate than the original CDP algorithm.
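
As a rough illustration of spotting with pruning (a heavily simplified stand-in for CDP and the learned per-model pruning classifiers), one can slide a DTW match of a scalar gesture template over the stream and abandon hopeless alignments early:

    import numpy as np

    def spot_gesture(stream, template, reject_thresh=5.0, prune_factor=2.0):
        """Report end frames where the normalized DTW cost between the
        template and the trailing window of the stream is low."""
        m = len(template)
        hits = []
        for end in range(m, len(stream) + 1):
            window = stream[end - m:end]
            D = np.full((m + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            abandoned = False
            for i in range(1, m + 1):
                for j in range(1, m + 1):
                    cost = abs(window[i - 1] - template[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
                if D[i, 1:].min() > prune_factor * reject_thresh * m:
                    abandoned = True                # hopeless alignment: prune
                    break
            if not abandoned and D[m, m] / m < reject_thresh:
                hits.append(end - 1)                # gesture ends at this frame
        return hits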

Relevance: 80.00%

Abstract:

This document details the legal agreement that conference participants will need to sign so that the University can video, stream, and store recordings of the sessions.