997 resultados para Foreground Detection DCT
Resumo:
Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.
Resumo:
One of the most widely used techniques in computer vision for foreground detection is to model each background pixel as a Mixture of Gaussians (MoG). While this is effective for a static camera with a fixed or a slowly varying background, it fails to handle any fast, dynamic movement in the background. In this paper, we propose a generalised framework, called region-based MoG (RMoG), that takes into consideration neighbouring pixels while generating the model of the observed scene. The model equations are derived from Expectation Maximisation theory for batch mode, and stochastic approximation is used for online mode updates. We evaluate our region-based approach against ten sequences containing dynamic backgrounds, and show that the region-based approach provides a performance improvement over the traditional single pixel MoG. For feature and region sizes that are equal, the effect of increasing the learning rate is to reduce both true and false positives. Comparison with four state-of-the art approaches shows that RMoG outperforms the others in reducing false positives whilst still maintaining reasonable foreground definition. Lastly, using the ChangeDetection (CDNet 2014) benchmark, we evaluated RMoG against numerous surveillance scenes and found it to amongst the leading performers for dynamic background scenes, whilst providing comparable performance for other commonly occurring surveillance scenes.
Resumo:
Detection of Objects in Video is a highly demanding area of research. The Background Subtraction Algorithms can yield better results in Foreground Object Detection. This work presents a Hybrid CodeBook based Background Subtraction to extract the foreground ROI from the background. Codebooks are used to store compressed information by demanding lesser memory usage and high speedy processing. This Hybrid method which uses Block-Based and Pixel-Based Codebooks provide efficient detection results; the high speed processing capability of block based background subtraction as well as high Precision Rate of pixel based background subtraction are exploited to yield an efficient Background Subtraction System. The Block stage produces a coarse foreground area, which is then refined by the Pixel stage. The system’s performance is evaluated with different block sizes and with different block descriptors like 2D-DCT, FFT etc. The Experimental analysis based on statistical measurements yields precision, recall, similarity and F measure of the hybrid system as 88.74%, 91.09%, 81.66% and 89.90% respectively, and thus proves the efficiency of the novel system.
Resumo:
In the recent years, the computer vision community has shown great interest on depth-based applications thanks to the performance and flexibility of the new generation of RGB-D imagery. In this paper, we present an efficient background subtraction algorithm based on the fusion of multiple region-based classifiers that processes depth and color data provided by RGB-D cameras. Foreground objects are detected by combining a region-based foreground prediction (based on depth data) with different background models (based on a Mixture of Gaussian algorithm) providing color and depth descriptions of the scene at pixel and region level. The information given by these modules is fused in a mixture of experts fashion to improve the foreground detection accuracy. The main contributions of the paper are the region-based models of both background and foreground, built from the depth and color data. The obtained results using different database sequences demonstrate that the proposed approach leads to a higher detection accuracy with respect to existing state-of-the-art techniques.
Resumo:
This paper presents an investigation into event detection in crowded scenes, where the event of interest co-occurs with other activities and only binary labels at the clip level are available. The proposed approach incorporates a fast feature descriptor from the MPEG domain, and a novel multiple instance learning (MIL) algorithm using sparse approximation and random sensing. MPEG motion vectors are used to build particle trajectories that represent the motion of objects in uniform video clips, and the MPEG DCT coefficients are used to compute a foreground map to remove background particles. Trajectories are transformed into the Fourier domain, and the Fourier representations are quantized into visual words using the K-Means algorithm. The proposed MIL algorithm models the scene as a linear combination of independent events, where each event is a distribution of visual words. Experimental results show that the proposed approaches achieve promising results for event detection compared to the state-of-the-art.
Resumo:
Along the recent years, several moving object detection strategies by non-parametric background-foreground modeling have been proposed. To combine both models and to obtain the probability of a pixel to belong to the foreground, these strategies make use of Bayesian classifiers. However, these classifiers do not allow to take advantage of additional prior information at different pixels. So, we propose a novel and efficient alternative Bayesian classifier that is suitable for this kind of strategies and that allows the use of whatever prior information. Additionally, we present an effective method to dynamically estimate prior probability from the result of a particle filter-based tracking strategy.
Resumo:
Surveillance networks are typically monitored by a few people, viewing several monitors displaying the camera feeds. It is then very difficult for a human operator to effectively detect events as they happen. Recently, computer vision research has begun to address ways to automatically process some of this data, to assist human operators. Object tracking, event recognition, crowd analysis and human identification at a distance are being pursued as a means to aid human operators and improve the security of areas such as transport hubs. The task of object tracking is key to the effective use of more advanced technologies. To recognize an event people and objects must be tracked. Tracking also enhances the performance of tasks such as crowd analysis or human identification. Before an object can be tracked, it must be detected. Motion segmentation techniques, widely employed in tracking systems, produce a binary image in which objects can be located. However, these techniques are prone to errors caused by shadows and lighting changes. Detection routines often fail, either due to erroneous motion caused by noise and lighting effects, or due to the detection routines being unable to split occluded regions into their component objects. Particle filters can be used as a self contained tracking system, and make it unnecessary for the task of detection to be carried out separately except for an initial (often manual) detection to initialise the filter. Particle filters use one or more extracted features to evaluate the likelihood of an object existing at a given point each frame. Such systems however do not easily allow for multiple objects to be tracked robustly, and do not explicitly maintain the identity of tracked objects. This dissertation investigates improvements to the performance of object tracking algorithms through improved motion segmentation and the use of a particle filter. A novel hybrid motion segmentation / optical flow algorithm, capable of simultaneously extracting multiple layers of foreground and optical flow in surveillance video frames is proposed. The algorithm is shown to perform well in the presence of adverse lighting conditions, and the optical flow is capable of extracting a moving object. The proposed algorithm is integrated within a tracking system and evaluated using the ETISEO (Evaluation du Traitement et de lInterpretation de Sequences vidEO - Evaluation for video understanding) database, and significant improvement in detection and tracking performance is demonstrated when compared to a baseline system. A Scalable Condensation Filter (SCF), a particle filter designed to work within an existing tracking system, is also developed. The creation and deletion of modes and maintenance of identity is handled by the underlying tracking system; and the tracking system is able to benefit from the improved performance in uncertain conditions arising from occlusion and noise provided by a particle filter. The system is evaluated using the ETISEO database. The dissertation then investigates fusion schemes for multi-spectral tracking systems. Four fusion schemes for combining a thermal and visual colour modality are evaluated using the OTCBVS (Object Tracking and Classification in and Beyond the Visible Spectrum) database. It is shown that a middle fusion scheme yields the best results and demonstrates a significant improvement in performance when compared to a system using either mode individually. Findings from the thesis contribute to improve the performance of semi-automated video processing and therefore improve security in areas under surveillance.
Resumo:
Many surveillance applications (object tracking, abandoned object detection) rely on detecting changes in a scene. Foreground segmentation is an effective way to extract the foreground from the scene, but these techniques cannot discriminate between objects that have temporarily stopped and those that are moving. We propose a series of modifications to an existing foreground segmentation system\cite{Butler2003} so that the foreground is further segmented into two or more layers. This yields an active layer of objects currently in motion and a passive layer of objects that have temporarily ceased motion which can itself be decomposed into multiple static layers. We also propose a variable threshold to cope with variable illumination, a feedback mechanism that allows an external process (i.e. surveillance system) to alter the motion detectors state, and a lighting compensation process and a shadow detector to reduce errors caused by lighting inconsistencies. The technique is demonstrated using outdoor surveillance footage, and is shown to be able to effectively deal with real world lighting conditions and overlapping objects.
Resumo:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
Resumo:
This paper presents an efficient face detection method suitable for real-time surveillance applications. Improved efficiency is achieved by constraining the search window of an AdaBoost face detector to pre-selected regions. Firstly, the proposed method takes a sparse grid of sample pixels from the image to reduce whole image scan time. A fusion of foreground segmentation and skin colour segmentation is then used to select candidate face regions. Finally, a classifier-based face detector is applied only to selected regions to verify the presence of a face (the Viola-Jones detector is used in this paper). The proposed system is evaluated using 640 x 480 pixels test images and compared with other relevant methods. Experimental results show that the proposed method reduces the detection time to 42 ms, where the Viola-Jones detector alone requires 565 ms (on a desktop processor). This improvement makes the face detector suitable for real-time applications. Furthermore, the proposed method requires 50% of the computation time of the best competing method, while reducing the false positive rate by 3.2% and maintaining the same hit rate.
Resumo:
Moving shadow detection and removal from the extracted foreground regions of video frames, aim to limit the risk of misconsideration of moving shadows as a part of moving objects. This operation thus enhances the rate of accuracy in detection and classification of moving objects. With a similar reasoning, the present paper proposes an efficient method for the discrimination of moving object and moving shadow regions in a video sequence, with no human intervention. Also, it requires less computational burden and works effectively under dynamic traffic road conditions on highways (with and without marking lines), street ways (with and without marking lines). Further, we have used scale-invariant feature transform-based features for the classification of moving vehicles (with and without shadow regions), which enhances the effectiveness of the proposed method. The potentiality of the method is tested with various data sets collected from different road traffic scenarios, and its superiority is compared with the existing methods. (C) 2013 Elsevier GmbH. All rights reserved.
Resumo:
Salient object detection has become an important task in many image processing applications. The existing approaches exploit background prior and contrast prior to attain state of the art results. In this paper, instead of using background cues, we estimate the foreground regions in an image using objectness proposals and utilize it to obtain smooth and accurate saliency maps. We propose a novel saliency measure called `foreground connectivity' which determines how tightly a pixel or a region is connected to the estimated foreground. We use the values assigned by this measure as foreground weights and integrate these in an optimization framework to obtain the final saliency maps. We extensively evaluate the proposed approach on two benchmark databases and demonstrate that the results obtained are better than the existing state of the art approaches.
Resumo:
We introduce a new algorithm to automatically identify the time and pixel location of foot contact events in high speed video of sprinters. We use this information to autonomously synchronise and overlay multiple recorded performances to provide feedback to athletes and coaches during their training sessions. The algorithm exploits the variation in speed of different parts of the body during sprinting. We use an array of foreground accumulators to identify short-term static pixels and a temporal analysis of the associated static regions to identify foot contacts. We evaluated the technique using 13 videos of three sprinters. It successfully identifed 55 of the 56 contacts, with a mean localisation error of 1.39±1.05 pixels. Some videos were also seen to produce additional, spurious contacts. We present heuristics to help identify the true contacts. © 2011 Springer-Verlag Berlin Heidelberg.
Resumo:
Vision based tracking can provide the spatial location of construction entities such as equipment, workers, and materials in large scale, congested construction sites. It tracks entities in video streams by inferring their locations based on the entities’ visual features and motion histories. To initiate the process, it is necessary to determine the pixel areas corresponding to the construction entities to be tracked in the following consecutive video frames. In order to fully automate the process, an automated way of initialization is needed. This paper presents the method for construction worker detection which can automatically recognize and localize construction workers in video frames. The method first finds the foreground areas of moving objects using a background subtraction method. Within these foreground areas, construction workers are recognized based on the histogram of oriented gradients (HOG) and histogram of the HSV colors. HOG’s have proved to work effectively for detection of people, and the histogram of HSV colors helps differentiate between pedestrians and construction workers wearing safety vests. Preliminary experiments show that the proposed method has the potential to automate the initialization process of vision based tracking.
Resumo:
目前大多数水印算法采用线性相关的方法检测水印,但是,当原始媒体信号不服从高斯分布,或者水印不是以加嵌入方式嵌入到待保护的媒体对象中时,该方法存在一定的问题.数字水印的不可感知性约束决定了水印检测是一个弱信号检测问题,利用这一特性,首先从图像DCT(discrete cosine transfom)交流变换系数的统计特性出发,应用广义高斯分布来建立其统计分布模型,然后将水印检测问题转化为二元假设检验问题,以非高斯噪声中弱信号检测的基本理论作为乘嵌入水印的理论检测模型,推导出优化的乘嵌入水印检测算法,并对检测算法进行了实验.结果表明,对于未知嵌入强度的乘水印的盲检测,提出的水印检测器具有良好的检测性能.因此,该检测器能在数字媒体数据的版权保护方面得到了实际的应用.