274 resultados para SIFT,Computer Vision,Python,Object Recognition,Feature Detection,Descriptor Computation
em Queensland University of Technology - ePrints Archive
Resumo:
Object detection is a fundamental task in many computer vision applications, therefore the importance of evaluating the quality of object detection is well acknowledged in this domain. This process gives insight into the capabilities of methods in handling environmental changes. In this paper, a new method for object detection is introduced that combines the Selective Search and EdgeBoxes. We tested these three methods under environmental variations. Our experiments demonstrate the outperformance of the combination method under illumination and view point variations.
Resumo:
This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak has adopted a novel approach for feature extraction using edge detection and image compression which gives input to the Artificial Neural Network that recognizes the gesture. This technique caters for the blurred images as well. The training and testing is currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu with target accuracy of 90% and above.
Resumo:
Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) due to their biological relevance and computational properties. Two popular Gabor representations used in literature are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences as the Log-Gabor filters mimic the simple cells in the visual cortex while the Gabor energy filters emulate the complex cells, which causes subtle differences in the responses. In this paper, we analyze the difference between these two Gabor representations and quantify these differences on the task of facial action unit (AU) detection. In our experiments conducted on the Cohn-Kanade dataset, we report an average area underneath the ROC curve (A`) of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A` of 96.11%. This result suggests that small spatial differences that the Log-Gabor filters pick up on are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.
Resumo:
In automatic facial expression detection, very accurate registration is desired which can be achieved via a deformable model approach where a dense mesh of 60-70 points on the face is used, such as an active appearance model (AAM). However, for applications where manually labeling frames is prohibitive, AAMs do not work well as they do not generalize well to unseen subjects. As such, a more coarse approach is taken for person-independent facial expression detection, where just a couple of key features (such as face and eyes) are tracked using a Viola-Jones type approach. The tracked image is normally post-processed to encode for shift and illumination invariance using a linear bank of filters. Recently, it was shown that this preprocessing step is of no benefit when close to ideal registration has been obtained. In this paper, we present a system based on the Constrained Local Model (CLM) which is a generic or person-independent face alignment algorithm which gains high accuracy. We show these results against the LBP feature extraction on the CK+ and GEMEP datasets.
Resumo:
The low resolution of images has been one of the major limitations in recognising humans from a distance using their biometric traits, such as face and iris. Superresolution has been employed to improve the resolution and the recognition performance simultaneously, however the majority of techniques employed operate in the pixel domain, such that the biometric feature vectors are extracted from a super-resolved input image. Feature-domain superresolution has been proposed for face and iris, and is shown to further improve recognition performance by capitalising on direct super-resolving the features which are used for recognition. However, current feature-domain superresolution approaches are limited to simple linear features such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which are not the most discriminant features for biometrics. Gabor-based features have been shown to be one of the most discriminant features for biometrics including face and iris. This paper proposes a framework to conduct super-resolution in the non-linear Gabor feature domain to further improve the recognition performance of biometric systems. Experiments have confirmed the validity of the proposed approach, demonstrating superior performance to existing linear approaches for both face and iris biometrics.
Resumo:
This paper presents an investigation into event detection in crowded scenes, where the event of interest co-occurs with other activities and only binary labels at the clip level are available. The proposed approach incorporates a fast feature descriptor from the MPEG domain, and a novel multiple instance learning (MIL) algorithm using sparse approximation and random sensing. MPEG motion vectors are used to build particle trajectories that represent the motion of objects in uniform video clips, and the MPEG DCT coefficients are used to compute a foreground map to remove background particles. Trajectories are transformed into the Fourier domain, and the Fourier representations are quantized into visual words using the K-Means algorithm. The proposed MIL algorithm models the scene as a linear combination of independent events, where each event is a distribution of visual words. Experimental results show that the proposed approaches achieve promising results for event detection compared to the state-of-the-art.
Resumo:
Due to the popularity of security cameras in public places, it is of interest to design an intelligent system that can efficiently detect events automatically. This paper proposes a novel algorithm for multi-person event detection. To ensure greater than real-time performance, features are extracted directly from compressed MPEG video. A novel histogram-based feature descriptor that captures the angles between extracted particle trajectories is proposed, which allows us to capture motion patterns of multi-person events in the video. To alleviate the need for fine-grained annotation, we propose the use of Labelled Latent Dirichlet Allocation, a “weakly supervised” method that allows the use of coarse temporal annotations which are much simpler to obtain. This novel system is able to run at approximately ten times real-time, while preserving state-of-theart detection performance for multi-person events on a 100-hour real-world surveillance dataset (TRECVid SED).
Resumo:
There is an increased interest on the use of Unmanned Aerial Vehicles (UAVs) for wildlife and feral animal monitoring around the world. This paper describes a novel system which uses a predictive dynamic application that places the UAV ahead of a user, with a low cost thermal camera, a small onboard computer that identifies heat signatures of a target animal from a predetermined altitude and transmits that target’s GPS coordinates. A map is generated and various data sets and graphs are displayed using a GUI designed for easy use. The paper describes the hardware and software architecture and the probabilistic model for downward facing camera for the detection of an animal. Behavioral dynamics of target movement for the design of a Kalman filter and Markov model based prediction algorithm are used to place the UAV ahead of the user. Geometrical concepts and Haversine formula are applied to the maximum likelihood case in order to make a prediction regarding a future state of the user, thus delivering a new way point for autonomous navigation. Results show that the system is capable of autonomously locating animals from a predetermined height and generate a map showing the location of the animals ahead of the user.
Resumo:
The use of UAVs for remote sensing tasks; e.g. agriculture, search and rescue is increasing. The ability for UAVs to autonomously find a target and perform on-board decision making, such as descending to a new altitude or landing next to a target is a desired capability. Computer-vision functionality allows the Unmanned Aerial Vehicle (UAV) to follow a designated flight plan, detect an object of interest, and change its planned path. In this paper we describe a low cost and an open source system where all image processing is achieved on-board the UAV using a Raspberry Pi 2 microprocessor interfaced with a camera. The Raspberry Pi and the autopilot are physically connected through serial and communicate via MAVProxy. The Raspberry Pi continuously monitors the flight path in real time through USB camera module. The algorithm checks whether the target is captured or not. If the target is detected, the position of the object in frame is represented in Cartesian coordinates and converted into estimate GPS coordinates. In parallel, the autopilot receives the target location approximate GPS and makes a decision to guide the UAV to a new location. This system also has potential uses in the field of Precision Agriculture, plant pest detection and disease outbreaks which cause detrimental financial damage to crop yields if not detected early on. Results show the algorithm is accurate to detect 99% of object of interest and the UAV is capable of navigation and doing on-board decision making.
Resumo:
Acoustically, vehicles are extremely noisy environments and as a consequence audio-only in-car voice recognition systems perform very poorly. Seeing that the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem. However, implementing such an approach requires a system being able to accurately locate and track the driver’s face and facial features in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using this system, we present our results which show that using the Viola-Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and head pose.
Resumo:
Computer vision is much more than a technique to sense and recover environmental information from an UAV. It should play a main role regarding UAVs’ functionality because of the big amount of information that can be extracted, its possible uses and applications, and its natural connection to human driven tasks, taking into account that vision is our main interface to world understanding. Our current research’s focus lays on the development of techniques that allow UAVs to maneuver in spaces using visual information as their main input source. This task involves the creation of techniques that allow an UAV to maneuver towards features of interest whenever a GPS signal is not reliable or sufficient, e.g. when signal dropouts occur (which usually happens in urban areas, when flying through terrestrial urban canyons or when operating on remote planetary bodies), or when tracking or inspecting visual targets—including moving ones—without knowing their exact UMT coordinates. This paper also investigates visual serving control techniques that use velocity and position of suitable image features to compute the references for flight control. This paper aims to give a global view of the main aspects related to the research field of computer vision for UAVs, clustered in four main active research lines: visual serving and control, stereo-based visual navigation, image processing algorithms for detection and tracking, and visual SLAM. Finally, the results of applying these techniques in several applications are presented and discussed: this study will encompass power line inspection, mobile target tracking, stereo distance estimation, mapping and positioning.
Resumo:
The quick detection of abrupt (unknown) parameter changes in an observed hidden Markov model (HMM) is important in several applications. Motivated by the recent application of relative entropy concepts in the robust sequential change detection problem (and the related model selection problem), this paper proposes a sequential unknown change detection algorithm based on a relative entropy based HMM parameter estimator. Our proposed approach is able to overcome the lack of knowledge of post-change parameters, and is illustrated to have similar performance to the popular cumulative sum (CUSUM) algorithm (which requires knowledge of the post-change parameter values) when examined, on both simulated and real data, in a vision-based aircraft manoeuvre detection problem.
Resumo:
In this paper we use the algorithm SeqSLAM to address the question, how little and what quality of visual information is needed to localize along a familiar route? We conduct a comprehensive investigation of place recognition performance on seven datasets while varying image resolution (primarily 1 to 512 pixel images), pixel bit depth, field of view, motion blur, image compression and matching sequence length. Results confirm that place recognition using single images or short image sequences is poor, but improves to match or exceed current benchmarks as the matching sequence length increases. We then present place recognition results from two experiments where low-quality imagery is directly caused by sensor limitations; in one, place recognition is achieved along an unlit mountain road by using noisy, long-exposure blurred images, and in the other, two single pixel light sensors are used to localize in an indoor environment. We also show failure modes caused by pose variance and sequence aliasing, and discuss ways in which they may be overcome. By showing how place recognition along a route is feasible even with severely degraded image sequences, we hope to provoke a re-examination of how we develop and test future localization and mapping systems.