316 resultados para Opencv, Zbar, Computer Vision
Resumo:
Surveillance networks are typically monitored by a few people, viewing several monitors displaying the camera feeds. It is then very difficult for a human operator to effectively detect events as they happen. Recently, computer vision research has begun to address ways to automatically process some of this data, to assist human operators. Object tracking, event recognition, crowd analysis and human identification at a distance are being pursued as a means to aid human operators and improve the security of areas such as transport hubs. The task of object tracking is key to the effective use of more advanced technologies. To recognize an event people and objects must be tracked. Tracking also enhances the performance of tasks such as crowd analysis or human identification. Before an object can be tracked, it must be detected. Motion segmentation techniques, widely employed in tracking systems, produce a binary image in which objects can be located. However, these techniques are prone to errors caused by shadows and lighting changes. Detection routines often fail, either due to erroneous motion caused by noise and lighting effects, or due to the detection routines being unable to split occluded regions into their component objects. Particle filters can be used as a self contained tracking system, and make it unnecessary for the task of detection to be carried out separately except for an initial (often manual) detection to initialise the filter. Particle filters use one or more extracted features to evaluate the likelihood of an object existing at a given point each frame. Such systems however do not easily allow for multiple objects to be tracked robustly, and do not explicitly maintain the identity of tracked objects. This dissertation investigates improvements to the performance of object tracking algorithms through improved motion segmentation and the use of a particle filter. A novel hybrid motion segmentation / optical flow algorithm, capable of simultaneously extracting multiple layers of foreground and optical flow in surveillance video frames is proposed. The algorithm is shown to perform well in the presence of adverse lighting conditions, and the optical flow is capable of extracting a moving object. The proposed algorithm is integrated within a tracking system and evaluated using the ETISEO (Evaluation du Traitement et de lInterpretation de Sequences vidEO - Evaluation for video understanding) database, and significant improvement in detection and tracking performance is demonstrated when compared to a baseline system. A Scalable Condensation Filter (SCF), a particle filter designed to work within an existing tracking system, is also developed. The creation and deletion of modes and maintenance of identity is handled by the underlying tracking system; and the tracking system is able to benefit from the improved performance in uncertain conditions arising from occlusion and noise provided by a particle filter. The system is evaluated using the ETISEO database. The dissertation then investigates fusion schemes for multi-spectral tracking systems. Four fusion schemes for combining a thermal and visual colour modality are evaluated using the OTCBVS (Object Tracking and Classification in and Beyond the Visible Spectrum) database. It is shown that a middle fusion scheme yields the best results and demonstrates a significant improvement in performance when compared to a system using either mode individually. Findings from the thesis contribute to improve the performance of semi-automated video processing and therefore improve security in areas under surveillance.
Resumo:
The relationship between multiple cameras viewing the same scene may be discovered automatically by finding corresponding points in the two views and then solving for the camera geometry. In camera networks with sparsely placed cameras, low resolution cameras or in scenes with few distinguishable features it may be difficult to find a sufficient number of reliable correspondences from which to compute geometry. This paper presents a method for extracting a larger number of correspondences from an initial set of putative correspondences without any knowledge of the scene or camera geometry. The method may be used to increase the number of correspondences and make geometry computations possible in cases where existing methods have produced insufficient correspondences.
Resumo:
In this study, the authors propose a novel video stabilisation algorithm for mobile platforms with moving objects in the scene. The quality of videos obtained from mobile platforms, such as unmanned airborne vehicles, suffers from jitter caused by several factors. In order to remove this undesired jitter, the accurate estimation of global motion is essential. However it is difficult to estimate global motions accurately from mobile platforms due to increased estimation errors and noises. Additionally, large moving objects in the video scenes contribute to the estimation errors. Currently, only very few motion estimation algorithms have been developed for video scenes collected from mobile platforms, and this paper shows that these algorithms fail when there are large moving objects in the scene. In this study, a theoretical proof is provided which demonstrates that the use of delta optical flow can improve the robustness of video stabilisation in the presence of large moving objects in the scene. The authors also propose to use sorted arrays of local motions and the selection of feature points to separate outliers from inliers. The proposed algorithm is tested over six video sequences, collected from one fixed platform, four mobile platforms and one synthetic video, of which three contain large moving objects. Experiments show our proposed algorithm performs well to all these video sequences.
Resumo:
One of the major challenges facing a present day game development company is the removal of bugs from such complex virtual environments. This work presents an approach for measuring the correctness of synthetic scenes generated by a rendering system of a 3D application, such as a computer game. Our approach builds a database of labelled point clouds representing the spatiotemporal colour distribution for the objects present in a sequence of bug-free frames. This is done by converting the position that the pixels take over time into the 3D equivalent points with associated colours. Once the space of labelled points is built, each new image produced from the same game by any rendering system can be analysed by measuring its visual inconsistency in terms of distance from the database. Objects within the scene can be relocated (manually or by the application engine); yet the algorithm is able to perform the image analysis in terms of the 3D structure and colour distribution of samples on the surface of the object. We applied our framework to the publicly available game RacingGame developed for Microsoft(R) Xna(R). Preliminary results show how this approach can be used to detect a variety of visual artifacts generated by the rendering system in a professional quality game engine.
Resumo:
Semi-automatic segmentation of still images has vast and varied practical applications. Recently, an approach "GrabCut" has managed to successfully build upon earlier approaches based on colour and gradient information in order to address the problem of efficient extraction of a foreground object in a complex environment. In this paper, we extend the GrabCut algorithm further by applying an unsupervised algorithm for modelling the Gaussian Mixtures that are used to define the foreground and background in the segmentation algorithm. We show examples where the optimisation of the GrabCut framework leads to further improvements in performance.
Resumo:
Many surveillance applications (object tracking, abandoned object detection) rely on detecting changes in a scene. Foreground segmentation is an effective way to extract the foreground from the scene, but these techniques cannot discriminate between objects that have temporarily stopped and those that are moving. We propose a series of modifications to an existing foreground segmentation system\cite{Butler2003} so that the foreground is further segmented into two or more layers. This yields an active layer of objects currently in motion and a passive layer of objects that have temporarily ceased motion which can itself be decomposed into multiple static layers. We also propose a variable threshold to cope with variable illumination, a feedback mechanism that allows an external process (i.e. surveillance system) to alter the motion detectors state, and a lighting compensation process and a shadow detector to reduce errors caused by lighting inconsistencies. The technique is demonstrated using outdoor surveillance footage, and is shown to be able to effectively deal with real world lighting conditions and overlapping objects.
Resumo:
Abandoned object detection (AOD) systems are required to run in high traffic situations, with high levels of occlusion. Systems rely on background segmentation techniques to locate abandoned objects, by detecting areas of motion that have stopped. This is often achieved by using a medium term motion detection routine to detect long term changes in the background. When AOD systems are integrated into person tracking system, this often results in two separate motion detectors being used to handle the different requirements. We propose a motion detection system that is capable of detecting medium term motion as well as regular motion. Multiple layers of medium term (static) motion can be detected and segmented. We demonstrate the performance of this motion detection system and as part of an abandoned object detection system.
Resumo:
Surveillance and tracking systems typically use a single colour modality for their input. These systems work well in controlled conditions but often fail with low lighting, shadowing, smoke, dust, unstable backgrounds or when the foreground object is of similar colouring to the background. With advances in technology and manufacturing techniques, sensors that allow us to see into the thermal infrared spectrum are becoming more affordable. By using modalities from both the visible and thermal infrared spectra, we are able to obtain more information from a scene and overcome the problems associated with using visible light only for surveillance and tracking. Thermal images are not affected by lighting or shadowing and are not overtly affected by smoke, dust or unstable backgrounds. We propose and evaluate three approaches for fusing visual and thermal images for person tracking. We also propose a modified condensation filter to track and aid in the fusion of the modalities. We compare the proposed fusion schemes with using the visual and thermal domains on their own, and demonstrate that significant improvements can be achieved by using multiple modalities.
Resumo:
Surveillance systems such as object tracking and abandoned object detection systems typically rely on a single modality of colour video for their input. These systems work well in controlled conditions but often fail when low lighting, shadowing, smoke, dust or unstable backgrounds are present, or when the objects of interest are a similar colour to the background. Thermal images are not affected by lighting changes or shadowing, and are not overtly affected by smoke, dust or unstable backgrounds. However, thermal images lack colour information which makes distinguishing between different people or objects of interest within the same scene difficult. ----- By using modalities from both the visible and thermal infrared spectra, we are able to obtain more information from a scene and overcome the problems associated with using either modality individually. We evaluate four approaches for fusing visual and thermal images for use in a person tracking system (two early fusion methods, one mid fusion and one late fusion method), in order to determine the most appropriate method for fusing multiple modalities. We also evaluate two of these approaches for use in abandoned object detection, and propose an abandoned object detection routine that utilises multiple modalities. To aid in the tracking and fusion of the modalities we propose a modified condensation filter that can dynamically change the particle count and features used according to the needs of the system. ----- We compare tracking and abandoned object detection performance for the proposed fusion schemes and the visual and thermal domains on their own. Testing is conducted using the OTCBVS database to evaluate object tracking, and data captured in-house to evaluate the abandoned object detection. Our results show that significant improvement can be achieved, and that a middle fusion scheme is most effective.
Resumo:
This paper presents an object tracking system that utilises a hybrid multi-layer motion segmentation and optical flow algorithm. While many tracking systems seek to combine multiple modalities such as motion and depth or multiple inputs within a fusion system to improve tracking robustness, current systems have avoided the combination of motion and optical flow. This combination allows the use of multiple modes within the object detection stage. Consequently, different categories of objects, within motion or stationary, can be effectively detected utilising either optical flow, static foreground or active foreground information. The proposed system is evaluated using the ETISEO database and evaluation metrics and compared to a baseline system utilising a single mode foreground segmentation technique. Results demonstrate a significant improvement in tracking results can be made through the incorporation of the additional motion information.
Resumo:
Automated crowd counting allows excessive crowding to be detected immediately, without the need for constant human surveillance. Current crowd counting systems are location specific, and for these systems to function properly they must be trained on a large amount of data specific to the target location. As such, configuring multiple systems to use is a tedious and time consuming exercise. We propose a scene invariant crowd counting system which can easily be deployed at a different location to where it was trained. This is achieved using a global scaling factor to relate crowd sizes from one scene to another. We demonstrate that a crowd counting system trained at one viewpoint can achieve a correct classification rate of 90% at a different viewpoint.