885 results for SIFT, Computer Vision, Python, Object Recognition, Feature Detection, Descriptor Computation
Abstract:
This paper presents an easy-to-use methodology and system that helps insurance companies manage the traffic accident reporting process. The main objective is to facilitate and accelerate the creation and finalization of the necessary accident reports in cases that involve no fatalities. The system covers the diverse entities that participate in the process, from the moment an accident occurs until the related final actions are completed. Nowadays, this market is limited to the consulting platforms offered by the insurance companies. Copyright 2014 ACM.
Abstract:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion.
Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they use to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The better of these two new keypoint detectors is applied to vision-based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
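The mapping from a wide-angle image to the sphere mentioned above can be illustrated with a minimal sketch. It assumes an equidistant fisheye model (r = f * theta), which is only one possible central projection; the abstract does not say which camera model the thesis uses, and the function name and parameters (cx, cy, f) are hypothetical.

```python
import math

def pixel_to_sphere(u, v, cx, cy, f):
    """Back-project pixel (u, v) to a unit vector on the sphere.

    Assumes an equidistant fisheye model, r = f * theta (an assumption,
    not necessarily the model used in the thesis).
    """
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)        # radial distance from the principal point, in pixels
    if r == 0.0:
        return (0.0, 0.0, 1.0)    # the principal point maps to the optical axis
    theta = r / f                 # angle from the optical axis
    s = math.sin(theta) / r       # scale that places (dx, dy) on the unit sphere
    return (dx * s, dy * s, math.cos(theta))
```

Under this model a pixel at radius f * pi / 2 from the principal point lies 90 degrees off the optical axis, so a field of view beyond a full hemisphere still maps to a valid point on the sphere.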
Abstract:
The earliest stages of human cortical visual processing can be conceived as extraction of local stimulus features. However, more complex visual functions, such as object recognition, require integration of multiple features. Recently, neural processes underlying feature integration in the visual system have been under intensive study. A specialized mid-level stage preceding the object recognition stage has been proposed to account for the processing of contours, surfaces and shapes as well as configuration. This thesis consists of four experimental, psychophysical studies on human visual feature integration. In two studies, the classification image method, a recently developed psychophysical reverse correlation method, was used. In this method visual noise is added to near-threshold stimuli. By investigating the relationship between random features in the noise and the observer's perceptual decision in each trial, it is possible to estimate what features of the stimuli are critical for the task. The method allows visualizing the critical features that are used in a psychophysical task directly as a spatial correlation map, yielding an effective "behavioral receptive field". Visual context is known to modulate the perception of stimulus features. Some of these interactions are quite complex, and it is not known whether they reflect early or late stages of perceptual processing. The first study investigated the mechanisms of collinear facilitation, where nearby collinear Gabor flankers increase the detectability of a central Gabor. The behavioral receptive field of the mechanism mediating the detection of the central Gabor stimulus was measured by the classification image method. The results show that collinear flankers increase the extent of the behavioral receptive field for the central Gabor, in the direction of the flankers. The increased sensitivity at the ends of the receptive field suggests a low-level explanation for the facilitation.
The second study investigated how visual features are integrated into percepts of surface brightness. A novel variant of the classification image method with a brightness matching task was used. Many theories assume that perceived brightness is based on the analysis of luminance border features. Here, for the first time, this assumption was directly tested. The classification images show that the perceived brightness of both an illusory Craik-O'Brien-Cornsweet stimulus and a real uniform step stimulus depends solely on the border. Moreover, the spatial tuning of the features remains almost constant when the stimulus size is changed, suggesting that brightness perception is based on the output of a single spatial frequency channel. The third and fourth studies investigated global form integration in random-dot Glass patterns. In these patterns, a global form can be immediately perceived if even a small proportion of random dots are paired to dipoles according to a geometrical rule. In the third study the discrimination of orientation structure in highly coherent concentric and Cartesian (straight) Glass patterns was measured. The results showed that the global form was more efficiently discriminated in concentric patterns. The fourth study investigated how form detectability depends on the global regularity of the Glass pattern. The local structure was either Cartesian or curved. It was shown that randomizing the local orientation deteriorated the performance only with the curved pattern. The results give support for the idea that curved and Cartesian patterns are processed in at least partially separate neural systems.
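The reverse correlation idea behind the classification image method can be sketched in a few lines. This is a simplified illustration, not the procedure used in the thesis: it assumes a yes/no task, one-dimensional noise fields, and a simulated observer (hypothetical) who relies only on the centre pixel.

```python
import random

def classification_image(noise_fields, responses):
    """Per-pixel mean noise on 'yes' trials minus mean noise on 'no' trials."""
    n_pix = len(noise_fields[0])
    yes = [n for n, r in zip(noise_fields, responses) if r]
    no = [n for n, r in zip(noise_fields, responses) if not r]

    def mean(rows, i):
        return sum(row[i] for row in rows) / len(rows)

    return [mean(yes, i) - mean(no, i) for i in range(n_pix)]

rng = random.Random(0)
# Hypothetical observer: answers "yes" when the centre pixel of a
# 5-pixel noise field is bright, with some internal noise added.
trials = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(2000)]
answers = [t[2] + rng.gauss(0.0, 0.5) > 0.0 for t in trials]
ci = classification_image(trials, answers)
```

In this simulation the entry of `ci` for the centre pixel is clearly positive while the others stay near zero, which is the sense in which the map reveals the features the observer actually used.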
Abstract:
The human face provides useful information during interaction; therefore, any system integrating Vision-Based Human Computer Interaction requires fast and reliable face and facial feature detection. Different approaches have focused on this ability, but only open source implementations have been extensively used by researchers. A good example is the Viola–Jones object detection framework, which has been used frequently, particularly in the context of facial processing.
Abstract:
Master's dissertation, Informatics Engineering, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2014
Abstract:
This paper presents visual detection and classification of light vehicles and personnel on a mine site. We capitalise on the rapid advances of ConvNet-based object recognition but highlight that a naive black box approach results in a significant number of false positives. In particular, the lack of domain-specific training data and the unique landscape in a mine site cause a high rate of errors. We exploit the abundance of background-only images to train a k-means classifier to complement the ConvNet. Furthermore, localisation of objects of interest and a reduction in computation are enabled through region proposals. Our system is tested on over 10 km of real mine site data and we were able to detect both light vehicles and personnel. We show that the introduction of our background model can reduce the false positive rate by an order of magnitude.
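One way a k-means background model could complement a detector is sketched below. The abstract does not describe how the two are combined, so the rejection rule (drop a detection whose feature vector lies within a fixed radius of a background centroid), the 2-D features, and all names are assumptions made purely for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def is_background(feature, centroids, radius):
    """Flag a detection whose feature lies within `radius` of any background centroid."""
    return any(math.dist(feature, c) <= radius for c in centroids)

# Hypothetical 2-D features extracted from background-only patches (two terrain types).
background = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0), (0.1, 0.1), (4.9, 5.0)]
centroids = kmeans(background, k=2)
# Hypothetical detector outputs, expressed in the same feature space.
detections = [(0.05, 0.05), (2.5, 9.0)]
kept = [d for d in detections if not is_background(d, centroids, radius=0.5)]
```

Here the first detection looks like background terrain and is discarded, while the second is far from both centroids and is kept.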
Abstract:
The latest generation of Deep Convolutional Neural Networks (DCNN) has dramatically advanced challenging computer vision tasks, especially object detection and object classification, achieving state-of-the-art performance in several tasks including text recognition, sign recognition, face recognition and scene understanding. The depth of these supervised networks has enabled learning deeper and hierarchical representations of features. In parallel, unsupervised deep learning such as the Convolutional Deep Belief Network (CDBN) has also achieved state-of-the-art results in many computer vision tasks. However, there is very limited research on jointly exploiting the strengths of these two approaches. In this paper, we investigate the learning capability of both methods. We compare the output of individual layers and show that many learnt filters and outputs of the corresponding level layer are very similar for both approaches. Stacking the DCNN on top of unsupervised layers or replacing layers in the DCNN with the corresponding learnt layers in the CDBN can improve the recognition/classification accuracy and the training computational expense. We demonstrate the validity of the proposal on the ImageNet dataset.
Abstract:
We describe a novel method for human activity segmentation and interpretation in surveillance applications based on Gabor filter-bank features. A complex human activity is modeled as a sequence of elementary human actions such as walking, running, jogging, boxing and hand-waving. Since the human silhouette can be modeled by a set of rectangles, the elementary human actions can be modeled as a sequence of sets of rectangles with different orientations and scales. The activity segmentation is based on Gabor filter-bank features and normalized spectral clustering. The feature trajectories of an action category are learnt from training example videos using dynamic time warping. The combined segmentation and recognition processes are very efficient, as both algorithms share the same framework and the Gabor features computed for the former can be used for the latter. We have also proposed a simple shadow detection technique to extract a good silhouette, which is necessary for good accuracy of an action recognition technique.
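Dynamic time warping, used above to compare feature trajectories, can be sketched as follows. This is the textbook formulation on scalar sequences with an absolute-difference cost; the paper works with Gabor feature trajectories, so the scalar inputs and the cost function here are simplifying assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0.0 because the repeated 2 can be absorbed by warping, which is why the measure tolerates actions performed at different speeds.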
Abstract:
Object recognition requires that templates with canonical views are stored in memory. Such templates must somehow be normalised. In this paper we present a novel method for obtaining 2D translation, rotation and size invariance. Cortical simple, complex and end-stopped cells provide multi-scale maps of lines, edges and keypoints. These maps are combined such that objects are characterised. Dynamic routing in neighbouring neural layers allows feature maps of input objects and stored templates to converge. We illustrate the construction of group templates and the invariance method for object categorisation and recognition in the context of a cortical architecture, which can be applied in computer vision.
Abstract:
The classical computer vision methods can only weakly emulate some of the multi-level parallelisms in signal processing and information sharing that takes place in different parts of the primates’ visual system thus enabling it to accomplish many diverse functions of visual perception. One of the main functions of the primates’ vision is to detect and recognise objects in natural scenes despite all the linear and non-linear variations of the objects and their environment. The superior performance of the primates’ visual system compared to what machine vision systems have been able to achieve to date, motivates scientists and researchers to further explore this area in pursuit of more efficient vision systems inspired by natural models. In this paper building blocks for a hierarchical efficient object recognition model are proposed. Incorporating the attention-based processing would lead to a system that will process the visual data in a non-linear way focusing only on the regions of interest and hence reducing the time to achieve real-time performance. Further, it is suggested to modify the visual cortex model for recognizing objects by adding non-linearities in the ventral path consistent with earlier discoveries as reported by researchers in the neuro-physiology of vision.
Abstract:
The objective of this thesis work is to propose an algorithm to detect faces in a digital image with a complex background. A lot of work has already been done in the area of face detection, but a drawback of some face detection algorithms is their inability to detect faces with closed eyes and an open mouth. Thus facial features form an important basis for detection. The current thesis work focuses on detection of faces based on facial objects. The procedure is composed of three phases: a segmentation phase, a filtering phase and a localization phase. In the segmentation phase, the algorithm uses color segmentation to isolate human skin color based on its chrominance properties. In the filtering phase, Minkowski addition based object removal (morphological operations) is used to remove the non-skin regions. In the last phase, image processing and computer vision methods are used to find facial components in the skin regions. This method is effective at detecting a face region with closed eyes, an open mouth and a half-profile face. The experimental results demonstrated that the detection accuracy is around 85.4% and that detection is faster than with the neural network method and other techniques.
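The chrominance-based skin segmentation step can be sketched as a threshold on the Cb and Cr channels. The conversion uses the standard full-range BT.601 (JPEG) RGB to YCbCr formulas; the threshold ranges are illustrative values often seen in the skin-detection literature, not values taken from the thesis, and the function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance of an 8-bit RGB pixel (full-range BT.601, as used by JPEG)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def is_skin(r, g, b, cb_range=(77.0, 127.0), cr_range=(133.0, 173.0)):
    """True if the pixel's chrominance falls inside the (assumed) skin ranges."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]

def skin_mask(image):
    """Binary mask for an image given as rows of (r, g, b) tuples."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]
```

Because luminance is ignored, the same test applies across a range of lighting levels; the resulting mask would then be cleaned with morphological operations, as the abstract describes.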
Abstract:
The project introduces an application that uses computer vision for hand gesture recognition. A camera records a live video stream, from which a snapshot is taken with the help of an interface. The system is trained on each type of counting hand gesture (one, two, three, four, and five) at least once. After that, a test gesture is given to it and the system tries to recognize it. Research was carried out on a number of algorithms that could best differentiate a hand gesture. It was found that the diagonal sum algorithm gave the highest accuracy rate. In the preprocessing phase, a self-developed algorithm removes the background of each training gesture. After that, the image is converted into a binary image and the sums of all diagonal elements of the picture are taken. These sums help in differentiating and classifying different hand gestures. Previous systems have used data gloves or markers for input. This system has no such constraints: the user can make hand gestures naturally in view of the camera. A completely robust hand gesture recognition system is still under heavy research and development; the implemented system serves as an extensible foundation for future work.
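The diagonal sum feature described above can be sketched as follows. The abstract does not specify which diagonals are summed or how the sums are compared, so this sketch assumes every top-left-to-bottom-right diagonal is summed and that a test gesture is assigned to the nearest training template by L1 distance; both choices are assumptions.

```python
def diagonal_sums(binary):
    """Sum of the pixels on every top-left-to-bottom-right diagonal of a binary image."""
    rows, cols = len(binary), len(binary[0])
    sums = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            # Pixels with the same (j - i) lie on the same diagonal.
            sums[j - i + rows - 1] += binary[i][j]
    return sums

def classify(sample, templates):
    """Label of the template whose diagonal sums are closest in L1 distance (assumed rule)."""
    feats = diagonal_sums(sample)

    def dist(label):
        return sum(abs(x - y) for x, y in zip(feats, diagonal_sums(templates[label])))

    return min(templates, key=dist)
```

A call such as `classify(test_image, {"one": img1, "two": img2})` would return the label of the closest stored gesture, assuming all images share the same size.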
Abstract:
In this paper, we experimentally study the combination of face and facial feature detectors to improve face detection performance. The face detection problem, as suggested by recent face detection challenges, is still not solved. Face detectors traditionally fail in large-scale problems and/or when the face is occluded or different head rotations are present. The combination of face and facial feature detectors is evaluated with a public database. The obtained results show an improvement in the positive detection rate while reducing the false detection rate. Additionally, we prove that the integration of facial feature detectors provides useful information for pose estimation and face alignment.
Abstract:
A more natural, intuitive, user-friendly, and less intrusive Human–Computer interface for controlling an application by executing hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches every frame of a video sequence for potential hand poses using a binary Support Vector Machine classifier and Local Binary Patterns as feature vectors. These detections are used as input to a tracker that generates a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories and computes a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor and one of the most important contributions of the paper; it provides much richer spatio-temporal information than other state-of-the-art approaches at a manageable computational cost. Excellent results have been obtained, outperforming other state-of-the-art approaches.
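The Local Binary Patterns used as feature vectors in the detection stage can be illustrated with the basic 8-neighbour operator. This sketch covers only the plain LBP code and its histogram on a single grey-level image; it does not implement the VS-LBP video descriptor, and the neighbour ordering is a convention chosen for the example.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c): a bit is set where neighbour >= centre."""
    center = img[r][c]
    # Neighbours visited clockwise from the top-left pixel (ordering is a convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over the interior pixels of the image."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A histogram like this, computed over a candidate window, is the kind of fixed-length vector that can be passed to a binary SVM classifier.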
Abstract:
Objective
Pedestrian detection in video surveillance systems has long been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing interest in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined using the proper weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flow to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
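Two of the building blocks named in the method, the linear combination of bottom-up and top-down saliency maps and the frame difference step, can be sketched as below. The weights (0.5 each) and the difference threshold are placeholders, since the paper obtains its weights experimentally and the abstract does not report them; the function names are hypothetical, and optical flow and motion entropy are not covered.

```python
def combine_saliency(bottom_up, top_down, w_bu=0.5, w_td=0.5):
    """Pixel-wise weighted sum of two equally sized saliency maps (weights are placeholders)."""
    return [
        [w_bu * b + w_td * t for b, t in zip(row_b, row_t)]
        for row_b, row_t in zip(bottom_up, top_down)
    ]

def frame_difference(prev, curr, threshold):
    """Binary motion mask: 1 where the absolute grey-level change exceeds `threshold`."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
        for row_p, row_c in zip(prev, curr)
    ]
```

In a fuller pipeline the motion mask would restrict where optical flow is evaluated, and the combined static map would be merged with the motion saliency to form the spatial-temporal model.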
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian detection method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.