274 resultados para SIFT,Computer Vision,Python,Object Recognition,Feature Detection,Descriptor Computation
Resumo:
We propose an approach to employ eigen light-fields for face recognition across pose on video. Faces of a subject are collected from video frames and combined based on the pose to obtain a set of probe light-fields. These probe data are then projected to the principal subspace of the eigen light-fields within which the classification takes place. We modify the original light-field projection and found that it is more robust in the proposed system. Evaluation on VidTIMIT dataset has demonstrated that the eigen light-fields method is able to take advantage of multiple observations contained in the video.
Resumo:
The Toolbox, combined with MATLAB ® and a modern workstation computer, is a useful and convenient environment for investigation of machine vision algorithms. For modest image sizes the processing rate can be sufficiently ``real-time'' to allow for closed-loop control. Focus of attention methods such as dynamic windowing (not provided) can be used to increase the processing rate. With input from a firewire or web camera (support provided) and output to a robot (not provided) it would be possible to implement a visual servo system entirely in MATLAB. Provides many functions that are useful in machine vision and vision-based control. Useful for photometry, photogrammetry, colorimetry. It includes over 100 functions spanning operations such as image file reading and writing, acquisition, display, filtering, blob, point and line feature extraction, mathematical morphology, homographies, visual Jacobians, camera calibration and color space conversion.
Resumo:
In a commercial environment, it is advantageous to know how long it takes customers to move between different regions, how long they spend in each region, and where they are likely to go as they move from one location to another. Presently, these measures can only be determined manually, or through the use of hardware tags (i.e. RFID). Soft biometrics are characteristics that can be used to describe, but not uniquely identify an individual. They include traits such as height, weight, gender, hair, skin and clothing colour. Unlike traditional biometrics, soft biometrics can be acquired by surveillance cameras at range without any user cooperation. While these traits cannot provide robust authentication, they can be used to provide identification at long range, and aid in object tracking and detection in disjoint camera networks. In this chapter we propose using colour, height and luggage soft biometrics to determine operational statistics relating to how people move through a space. A novel average soft biometric is used to locate people who look distinct, and these people are then detected at various locations within a disjoint camera network to gradually obtain operational statistics
Resumo:
Thermal-infrared imagery is relatively robust to many of the failure conditions of visual and laser-based SLAM systems, such as fog, dust and smoke. The ability to use thermal-infrared video for localization is therefore highly appealing for many applications. However, operating in thermal-infrared is beyond the capacity of existing SLAM implementations. This paper presents the first known monocular SLAM system designed and tested for hand-held use in the thermal-infrared modality. The implementation includes a flexible feature detection layer able to achieve robust feature tracking in high-noise, low-texture thermal images. A novel approach for structure initialization is also presented. The system is robust to irregular motion and capable of handling the unique mechanical shutter interruptions common to thermal-infrared cameras. The evaluation demonstrates promising performance of the algorithm in several environments.
Resumo:
The increasingly widespread use of large-scale 3D virtual environments has translated into an increasing effort required from designers, developers and testers. While considerable research has been conducted into assisting the design of virtual world content and mechanics, to date, only limited contributions have been made regarding the automatic testing of the underpinning graphics software and hardware. In the work presented in this paper, two novel neural network-based approaches are presented to predict the correct visualization of 3D content. Multilayer perceptrons and self-organizing maps are trained to learn the normal geometric and color appearance of objects from validated frames and then used to detect novel or anomalous renderings in new images. Our approach is general, for the appearance of the object is learned rather than explicitly represented. Experiments were conducted on a game engine to determine the applicability and effectiveness of our algorithms. The results show that the neural network technology can be effectively used to address the problem of automatic and reliable visual testing of 3D virtual environments.
Resumo:
In this paper we present a fast power line detection and localisation algorithm as well as propose a high-level guidance architecture for active vision-based Unmanned Aerial Vehicle (UAV) guidance. The detection stage is based on steerable filters for edge ridge detection, followed by a line fitting algorithm to refine candidate power lines in images. The guidance architecture assumes an UAV with an onboard Gimbal camera. We first control the position of the Gimbal such that the power line is in the field of view of the camera. Then its pose is used to generate the appropriate control commands such that the aircraft moves and flies above the lines. We present initial experimental results for the detection stage which shows that the proposed algorithm outperforms two state-of-the-art line detection algorithms for power line detection from aerial imagery.
Resumo:
The increasing popularity of video consumption from mobile devices requires an effective video coding strategy. To overcome diverse communication networks, video services often need to maintain sustainable quality when the available bandwidth is limited. One of the strategy for a visually-optimised video adaptation is by implementing a region-of-interest (ROI) based scalability, whereby important regions can be encoded at a higher quality while maintaining sufficient quality for the rest of the frame. The result is an improved perceived quality at the same bit rate as normal encoding, which is particularly obvious at the range of lower bit rate. However, because of the difficulties of predicting region-of-interest (ROI) accurately, there is a limited research and development of ROI-based video coding for general videos. In this paper, the phase spectrum quaternion of Fourier Transform (PQFT) method is adopted to determine the ROI. To improve the results of ROI detection, the saliency map from the PQFT is augmented with maps created from high level knowledge of factors that are known to attract human attention. Hence, maps that locate faces and emphasise the centre of the screen are used in combination with the saliency map to determine the ROI. The contribution of this paper lies on the automatic ROI detection technique for coding a low bit rate videos which include the ROI prioritisation technique to give different level of encoding qualities for multiple ROIs, and the evaluation of the proposed automatic ROI detection that is shown to have a close performance to human ROI, based on the eye fixation data.
Resumo:
Modelling video sequences by subspaces has recently shown promise for recognising human actions. Subspaces are able to accommodate the effects of various image variations and can capture the dynamic properties of actions. Subspaces form a non-Euclidean and curved Riemannian manifold known as a Grassmann manifold. Inference on manifold spaces usually is achieved by embedding the manifolds in higher dimensional Euclidean spaces. In this paper, we instead propose to embed the Grassmann manifolds into reproducing kernel Hilbert spaces and then tackle the problem of discriminant analysis on such manifolds. To achieve efficient machinery, we propose graph-based local discriminant analysis that utilises within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, respectively. Experiments on KTH, UCF Sports, and Ballet datasets show that the proposed approach obtains marked improvements in discrimination accuracy in comparison to several state-of-the-art methods, such as the kernel version of affine hull image-set distance, tensor canonical correlation analysis, spatial-temporal words and hierarchy of discriminative space-time neighbourhood features.
Resumo:
This paper is concerned with the unsupervised learning of object representations by fusing visual and motor information. The problem is posed for a mobile robot that develops its representations as it incrementally gathers data. The scenario is problematic as the robot only has limited information at each time step with which it must generate and update its representations. Object representations are refined as multiple instances of sensory data are presented; however, it is uncertain whether two data instances are synonymous with the same object. This process can easily diverge from stability. The premise of the presented work is that a robot's motor information instigates successful generation of visual representations. An understanding of self-motion enables a prediction to be made before performing an action, resulting in a stronger belief of data association. The system is implemented as a data-driven partially observable semi-Markov decision process. Object representations are formed as the process's hidden states and are coordinated with motor commands through state transitions. Experiments show the prediction process is essential in enabling the unsupervised learning method to converge to a solution - improving precision and recall over using sensory data alone.
Resumo:
The future emergence of many types of airborne vehicles and unpiloted aircraft in the national airspace means collision avoidance is of primary concern in an uncooperative airspace environment. The ability to replicate a pilot’s see and avoid capability using cameras coupled with vision based avoidance control is an important part of an overall collision avoidance strategy. But unfortunately without range collision avoidance has no direct way to guarantee a level of safety. Collision scenario flight tests with two aircraft and a monocular camera threat detection and tracking system were used to study the accuracy of image-derived angle measurements. The effect of image-derived angle errors on reactive vision-based avoidance performance was then studied by simulation. The results show that whilst large angle measurement errors can significantly affect minimum ranging characteristics across a variety of initial conditions and closing speeds, the minimum range is always bounded and a collision never occurs.
Resumo:
In the field of face recognition, Sparse Representation (SR) has received considerable attention during the past few years. Most of the relevant literature focuses on holistic descriptors in closed-set identification applications. The underlying assumption in SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such assumption is easily violated in the more challenging face verification scenario, where an algorithm is required to determine if two faces (where one or both have not been seen before) belong to the same person. In this paper, we first discuss why previous attempts with SR might not be applicable to verification problems. We then propose an alternative approach to face verification via SR. Specifically, we propose to use explicit SR encoding on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which are then concatenated to form an overall face descriptor. Due to the deliberate loss spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment & various image deformations. Within the proposed framework, we evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN), and an implicit probabilistic technique based on Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the proposed local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, in both verification and closed-set identification problems. The experiments also show that l1-minimisation based encoding has a considerably higher computational than the other techniques, but leads to higher recognition rates.
Resumo:
Our daily lives become more and more dependent upon smartphones due to their increased capabilities. Smartphones are used in various ways, e.g. for payment systems or assisting the lives of elderly or disabled people. Security threats for these devices become more and more dangerous since there is still a lack of proper security tools for protection. Android emerges as an open smartphone platform which allows modification even on operating system level and where third-party developers first time have the opportunity to develop kernel-based low-level security tools. Android quickly gained its popularity among smartphone developers and even beyond since it bases on Java on top of "open" Linux in comparison to former proprietary platforms which have very restrictive SDKs and corresponding APIs. Symbian OS, holding the greatest market share among all smartphone OSs, was even closing critical APIs to common developers and introduced application certification. This was done since this OS was the main target for smartphone malwares in the past. In fact, more than 290 malwares designed for Symbian OS appeared from July 2004 to July 2008. Android, in turn, promises to be completely open source. Together with the Linux-based smartphone OS OpenMoko, open smartphone platforms may attract malware writers for creating malicious applications endangering the critical smartphone applications and owners privacy. Since signature-based approaches mainly detect known malwares, anomaly-based approaches can be a valuable addition to these systems. They base on mathematical algorithms processing data that describe the state of a certain device. For gaining this data, a monitoring client is needed that has to extract usable information (features) from the monitored system. Our approach follows a dual system for analyzing these features. On the one hand, functionality for on-device light-weight detection is provided. But since most algorithms are resource exhaustive, remote feature analysis is provided on the other hand. Having this dual system enables event-based detection that can react to the current detection need. In our ongoing research we aim to investigates the feasibility of light-weight on-device detection for certain occasions. On other occasions, whenever significant changes are detected on the device, the system can trigger remote detection with heavy-weight algorithms for better detection results. In the absence of the server respectively as a supplementary approach, we also consider a collaborative scenario. Here, mobile devices sharing a common objective are enabled by a collaboration module to share information, such as intrusion detection data and results. This is based on an ad-hoc network mode that can be provided by a WiFi or Bluetooth adapter nearly every smartphone possesses.
Resumo:
Abstract. In recent years, sparse representation based classification(SRC) has received much attention in face recognition with multipletraining samples of each subject. However, it cannot be easily applied toa recognition task with insufficient training samples under uncontrolledenvironments. On the other hand, cohort normalization, as a way of mea-suring the degradation effect under challenging environments in relationto a pool of cohort samples, has been widely used in the area of biometricauthentication. In this paper, for the first time, we introduce cohort nor-malization to SRC-based face recognition with insufficient training sam-ples. Specifically, a user-specific cohort set is selected to normalize theraw residual, which is obtained from comparing the test sample with itssparse representations corresponding to the gallery subject, using poly-nomial regression. Experimental results on AR and FERET databases show that cohort normalization can bring SRC much robustness against various forms of degradation factors for undersampled face recognition.
Resumo:
To recognize faces in video, face appearances have been widely modeled as piece-wise local linear models which linearly approximate the smooth yet non-linear low dimensional face appearance manifolds. The choice of representations of the local models is crucial. Most of the existing methods learn each local model individually meaning that they only anticipate variations within each class. In this work, we propose to represent local models as Gaussian distributions which are learned simultaneously using the heteroscedastic probabilistic linear discriminant analysis (PLDA). Each gallery video is therefore represented as a collection of such distributions. With the PLDA, not only the within-class variations are estimated during the training, the separability between classes is also maximized leading to an improved discrimination. The heteroscedastic PLDA itself is adapted from the standard PLDA to approximate face appearance manifolds more accurately. Instead of assuming a single global within-class covariance, the heteroscedastic PLDA learns different within-class covariances specific to each local model. In the recognition phase, a probe video is matched against gallery samples through the fusion of point-to-model distances. Experiments on the Honda and MoBo datasets have shown the merit of the proposed method which achieves better performance than the state-of-the-art technique.
Resumo:
This paper presents an alternative approach to image segmentation by using the spatial distribution of edge pixels as opposed to pixel intensities. The segmentation is achieved by a multi-layered approach and is intended to find suitable landing areas for an aircraft emergency landing. We combine standard techniques (edge detectors) with novel developed algorithms (line expansion and geometry test) to design an original segmentation algorithm. Our approach removes the dependency on environmental factors that traditionally influence lighting conditions, which in turn have negative impact on pixel-based segmentation techniques. We present test outcomes on realistic visual data collected from an aircraft, reporting on preliminary feedback about the performance of the detection. We demonstrate consistent performances over 97% detection rate.