19 results for SIFT keypoints
at Queensland University of Technology - ePrints Archive
Abstract:
Wide-angle images exhibit significant distortion for which existing scale-space detectors such as the scale-invariant feature transform (SIFT) are inappropriate. The required scale-space images for feature detection are correctly obtained through the convolution of the image, mapped to the sphere, with the spherical Gaussian. A new visual key-point detector, based on this principle, is developed and several computational approaches to the convolution are investigated in both the spatial and frequency domain. In particular, a close approximation is developed that has comparable computation time to conventional SIFT but with improved matching performance. Results are presented for monocular wide-angle outdoor image sequences obtained using fisheye and equiangular catadioptric cameras. We evaluate the overall matching performance (recall versus 1-precision) of these methods compared to conventional SIFT. We also demonstrate the use of the technique for variable frame-rate visual odometry and its application to place recognition.
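The recall versus 1-precision protocol used in this evaluation sweeps a threshold over match distances and plots the fraction of correct matches recovered against the fraction of accepted matches that are wrong. A minimal sketch in Python, assuming ground-truth correspondences are known (a generic protocol, not the authors' exact code):

import numpy as np

def recall_vs_1_precision(distances, is_correct):
    """Recall versus 1-precision curve for descriptor matching.

    distances: (N,) match distances (e.g. nearest-neighbour SIFT distance).
    is_correct: (N,) booleans, True where a match agrees with ground truth.
    """
    order = np.argsort(distances)            # sweep a distance threshold
    correct = np.asarray(is_correct)[order]
    tp = np.cumsum(correct)                  # correct matches accepted so far
    fp = np.cumsum(~correct)                 # false matches accepted so far
    recall = tp / max(correct.sum(), 1)
    one_minus_precision = fp / np.arange(1, len(correct) + 1)
    return one_minus_precision, recall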
Abstract:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. Compared to traditional narrow-field-of-view perspective cameras, wide-angle cameras allow more accurate estimates of camera egomotion. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them well suited to visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images depends on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic, as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central-projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion. Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they use to do this. Extensive experiments using real and synthetically generated wide-angle images validate the two new keypoint detectors and the method of keypoint description. The better of the two new keypoint detectors is applied to vision-based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
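The DLT weighting idea mentioned at the end can be illustrated for homography estimation: each correspondence contributes two rows to a linear system, and scaling those rows by a per-keypoint weight down-weights uncertain detections. A minimal sketch, assuming a homography model and user-supplied weights (the thesis's exact weighting scheme is not reproduced here):

import numpy as np

def weighted_dlt_homography(src, dst, w):
    """Estimate a homography from matched keypoints via the Direct
    Linear Transform, weighting each correspondence's equations.

    src, dst: (N, 2) arrays of matched keypoint coordinates.
    w: (N,) per-keypoint weights (e.g. inverse detection uncertainty).
    """
    rows = []
    for (x, y), (u, v), wi in zip(src, dst, w):
        # Each correspondence contributes two rows of the system A h = 0.
        rows.append(wi * np.array([-x, -y, -1, 0, 0, 0, u * x, u * y, u]))
        rows.append(wi * np.array([0, 0, 0, -x, -y, -1, v * x, v * y, v]))
    A = np.vstack(rows)
    # The homography is the right singular vector with the smallest
    # singular value, reshaped to 3x3.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)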
Abstract:
Robust descriptor matching across varying lighting conditions is important for vision-based robotics. We present a novel strategy for quantifying the lighting variance of descriptors. The strategy recovers low-dimensional mappings using Isomap and measures the lighting variance of each of these mappings. The resultant metric allows different descriptors to be compared given a dataset and a set of keypoints. We demonstrate that the SIFT descriptor typically has lower lighting variance than other descriptors, although the result depends on semantic class and lighting conditions.
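A sketch of how such a metric might be computed with scikit-learn's Isomap, assuming descriptors of the same keypoints are available under several lighting conditions; the variance measure below is an illustrative stand-in for the paper's metric:

import numpy as np
from sklearn.manifold import Isomap

def lighting_variance(descs, n_components=2, n_neighbors=8):
    """descs: (n_conditions, n_keypoints, d) descriptors of the same
    keypoints under different lighting conditions (hypothetical input)."""
    n_cond, n_kp, d = descs.shape
    # Embed all descriptors jointly in a low-dimensional Isomap space.
    emb = Isomap(n_neighbors=n_neighbors, n_components=n_components)
    z = emb.fit_transform(descs.reshape(-1, d)).reshape(n_cond, n_kp, -1)
    # Variance of each keypoint's embedding across lighting conditions,
    # averaged over keypoints: lower means more lighting-invariant.
    return z.var(axis=0).sum(axis=-1).mean()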
Abstract:
This paper describes the real-time global vision system for the robot soccer team the RoboRoos. It has a highly optimised pipeline that includes thresholding, segmenting, colour normalising, object recognition, and perspective and lens correction. It has a fast ‘paint’ colour calibration system that can calibrate in any face of the YUV or HSI cube. It also autonomously selects both an appropriate camera gain and colour gains from robot regions across the field to achieve colour uniformity. Camera geometry calibration is performed automatically from selection of keypoints on the field. The system achieves a position accuracy of better than 15mm over a 4m × 5.5m field, and orientation accuracy to within 1°. It processes 614 × 480 pixels at 60Hz on a 2.0GHz Pentium 4 microprocessor.
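Fast ‘paint’-style calibration and thresholding of this kind are commonly implemented as a colour-cube lookup table: painted training pixels label cells of a subsampled YUV cube, and runtime classification is one indexed read per pixel. A hypothetical sketch of that mechanism (the class ids and 2-bit subsampling are illustrative choices, not the RoboRoos values):

import numpy as np

lut = np.zeros((64, 64, 64), dtype=np.uint8)  # YUV cube at 2-bit reduction

def paint(yuv_pixels, colour_class):
    """Mark calibration pixels (N, 3 uint8 YUV) as a colour class."""
    y, u, v = (yuv_pixels >> 2).T             # 8-bit -> 6-bit indices
    lut[y, u, v] = colour_class

def classify(yuv_image):
    """Threshold a full (H, W, 3) YUV image in one vectorised lookup."""
    idx = yuv_image >> 2
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]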
Abstract:
Spontaneous facial expressions differ from posed ones in appearance, timing and accompanying head movements. Still images cannot provide timing or head movement information directly. However, indirectly, the distances between key points on a face extracted from a still image using active shape models can capture some movement and pose changes. This information is superposed on information about non-rigid facial movement that is also part of the expression. Does geometric information improve the discrimination between spontaneous and posed facial expressions arising from discrete emotions? We investigate the performance of a machine vision system for discrimination between posed and spontaneous versions of six basic emotions that uses SIFT appearance-based features and FAP geometric features. Experimental results on the NVIE database demonstrate that fusion of geometric information leads only to marginal improvement over appearance features. Using fusion features, surprise is the easiest emotion to distinguish (83.4% accuracy), while disgust is the most difficult (76.1%). Our results show that different facial regions are important when discriminating the posed versus spontaneous versions of one emotion than when classifying that emotion against other emotions. The distribution of the selected SIFT features shows that the mouth is more important for sadness and the nose for surprise, while both the nose and mouth are important for disgust, fear, and happiness. Eyebrows, eyes, nose and mouth are all important for anger.
Abstract:
Facial expression recognition (FER) algorithms mainly focus on classification into a small discrete set of emotions or representation of emotions using facial action units (AUs). Dimensional representation of emotions as continuous values in an arousal-valence space is relatively less investigated. It is not fully known whether fusion of geometric and texture features will result in better dimensional representation of spontaneous emotions. Moreover, the performance of many previously proposed approaches to dimensional representation has not been evaluated thoroughly on publicly available databases. To address these limitations, this paper presents an evaluation framework for dimensional representation of spontaneous facial expressions using texture and geometric features. SIFT, Gabor and LBP features are extracted around facial fiducial points and fused with FAP distance features. The CFS algorithm is adopted for discriminative texture feature selection. Experimental results on the publicly accessible NVIE database demonstrate that fusion of texture and geometry does not lead to a much better performance than using texture alone, but does result in a significant performance improvement over geometry alone. LBP features perform the best when fused with geometric features. Distributions of arousal and valence for different emotions obtained via the feature extraction process are compared with those obtained from subjective ground truth values assigned by viewers. Predicted valence is found to have a distribution more similar to ground truth than arousal does in terms of covariance or Bhattacharyya distance, but it shows a greater distance between the means.
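For reference, the Bhattacharyya distance between two Gaussian approximations of the predicted and ground-truth (arousal, valence) distributions can be computed from their means and covariances; a short sketch using the standard closed form:

import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian distributions,
    e.g. predicted vs ground-truth (arousal, valence) clouds."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    term_cov = 0.5 * np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return term_mean + term_cov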
Abstract:
Feature extraction and selection are critical processes in developing facial expression recognition (FER) systems. While many algorithms have been proposed for these processes, direct comparison between texture, geometry and their fusion, as well as between multiple selection algorithms, has not been reported for spontaneous FER. This paper addresses this issue by proposing a unified framework for a comparative study on the widely used texture (LBP, Gabor and SIFT) and geometric (FAP) features, using Adaboost, mRMR and SVM feature selection algorithms. Our experiments on the Feedtum and NVIE databases demonstrate the benefits of fusing geometric and texture features, where SIFT+FAP shows the best performance, while mRMR outperforms Adaboost and SVM. In terms of computational time, LBP and Gabor perform better than SIFT. The optimal combination of SIFT+FAP+mRMR also exhibits state-of-the-art performance.
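A minimal sketch of such a fusion-plus-selection pipeline in scikit-learn, with plain mutual-information ranking standing in for mRMR (scikit-learn has no built-in mRMR) and a linear SVM classifier; all inputs are hypothetical:

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fused_fer_score(X_sift, X_fap, y, k=200):
    """X_sift: (n, d1) texture features; X_fap: (n, d2) geometric FAP
    distances; y: emotion labels. Returns mean cross-validated accuracy."""
    X = np.hstack([X_sift, X_fap])            # feature-level fusion
    clf = make_pipeline(
        StandardScaler(),
        # Stand-in for mRMR: mutual-information relevance ranking
        # without the redundancy term.
        SelectKBest(mutual_info_classif, k=min(k, X.shape[1])),
        SVC(kernel="linear"),
    )
    return cross_val_score(clf, X, y, cv=5).mean()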
Abstract:
Image representations derived from simplified models of the primary visual cortex (V1), such as HOG and SIFT, elicit good performance in a myriad of visual classification tasks including object recognition/detection, pedestrian detection and facial expression classification. A central question in the vision, learning and neuroscience communities is why these architectures perform so well. In this paper, we offer a unique perspective on this question by subsuming the role of V1-inspired features directly within a linear support vector machine (SVM). We demonstrate that a specific class of such features in conjunction with a linear SVM can be reinterpreted as inducing a weighted margin on the Kronecker basis expansion of an image. This new viewpoint on the role of V1-inspired features allows us to answer fundamental questions on the uniqueness and redundancies of these features, and offer substantial improvements in terms of computational and storage efficiency.
Abstract:
Automated crowd counting has become an active field of computer vision research in recent years. Existing approaches are scene-specific, as they are designed to operate in the single camera viewpoint that was used to train the system. Real world camera networks often span multiple viewpoints within a facility, including many regions of overlap. This paper proposes a novel scene invariant crowd counting algorithm that is designed to operate across multiple cameras. The approach uses camera calibration to normalise features between viewpoints and to compensate for regions of overlap. This compensation is performed by constructing an 'overlap map' which provides a measure of how much an object at one location is visible within other viewpoints. An investigation into the suitability of various feature types and regression models for scene invariant crowd counting is also conducted. The features investigated include object size, shape, edges and keypoints. The regression models evaluated include neural networks, K-nearest neighbours, linear regression and Gaussian process regression. Our experiments demonstrate that accurate crowd counting was achieved across seven benchmark datasets, with optimal performance observed when all features were used together with Gaussian process regression. The combination of scene invariance and multi-camera crowd counting is evaluated by training the system on footage obtained from the QUT camera network and testing it on three cameras from the PETS 2009 database. Highly accurate crowd counting was observed with a mean relative error of less than 10%. Our approach enables a pre-trained system to be deployed on a new environment without any additional training, bringing the field one step closer toward a 'plug and play' system.
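A sketch of the regression stage with a Gaussian process, assuming calibrated, viewpoint-normalised features have already been extracted; the kernel choice and the mean-relative-error measure are illustrative, not the paper's exact configuration:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def train_count_regressor(X, y):
    """X: (n_frames, n_features) size/shape/edge/keypoint features,
    normalised by camera calibration; y: ground-truth crowd counts."""
    gpr = GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0),
        normalize_y=True,
    )
    return gpr.fit(X, y)

def mean_relative_error(model, X, y):
    pred = model.predict(X)
    return np.mean(np.abs(pred - y) / np.maximum(y, 1))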
Abstract:
This work aims to contribute to the reliability and integrity of perceptual systems of unmanned ground vehicles (UGV). A method is proposed to evaluate the quality of sensor data prior to its use in a perception system by applying a quality metric to heterogeneous sensor data such as visual and infrared camera images. The concept is illustrated specifically with sensor data that is evaluated prior to the use of the data in a standard SIFT feature extraction and matching technique. The method is then evaluated using various experimental data sets that were collected from a UGV in challenging environmental conditions, represented by the presence of airborne dust and smoke. In the first series of experiments, a motionless vehicle observes a ‘reference’ scene; the method is then extended to the case of a moving vehicle by compensating for its motion. This paper shows that it is possible to anticipate degradation of a perception algorithm by evaluating the input data prior to any actual execution of the algorithm.
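A hypothetical sketch of the gating idea: compute a cheap quality score on the input image and only run SIFT extraction when the score is acceptable. The gradient-magnitude metric and threshold below are illustrative stand-ins, not the paper's actual quality metric:

import cv2
import numpy as np

def gradient_quality(gray):
    """A simple image-quality proxy: mean gradient magnitude. Airborne
    dust or smoke washes out gradients, predicting poor SIFT matching."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    return float(np.mean(np.hypot(gx, gy)))

def extract_if_usable(gray, threshold=8.0):
    if gradient_quality(gray) < threshold:
        return None                        # skip: data predicted unreliable
    sift = cv2.SIFT_create()
    return sift.detectAndCompute(gray, None)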
Abstract:
We present a determination of ΔfH298(HOO) based upon a negative-ion thermodynamic cycle. The photoelectron spectra of HOO⁻ and DOO⁻ were used to measure the molecular electron affinities (EAs). In a separate experiment, a tandem flowing afterglow-selected ion flow tube (FA-SIFT) was used to measure the forward and reverse rate constants for HOO⁻ + HC≡CH ⇌ HOOH + HC≡C⁻ at 298 K, which gave a value for ΔacidH298(HOO-H). The experiments yield the following values: EA(HOO) = 1.078 ± 0.006 eV; T0(X̃ HOO–Ã HOO) = 0.872 ± 0.007 eV; EA(DOO) = 1.077 ± 0.005 eV; T0(X̃ DOO–Ã DOO) = 0.874 ± 0.007 eV; ΔacidG298(HOO-H) = 369.5 ± 0.4 kcal mol⁻¹; and ΔacidH298(HOO-H) = 376.5 ± 0.4 kcal mol⁻¹. The acidity/EA thermochemical cycle yields values for the bond enthalpies DH298(HOO-H) = 87.8 ± 0.5 kcal mol⁻¹ and D0(HOO-H) = 86.6 ± 0.5 kcal mol⁻¹. We recommend the following values for the heats of formation of the hydroperoxyl radical: ΔfH298(HOO) = 3.2 ± 0.5 kcal mol⁻¹ and ΔfH0(HOO) = 3.9 ± 0.5 kcal mol⁻¹; we recommend that these values supersede those listed in the current NIST-JANAF thermochemical tables.
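For reference, the acidity/EA thermochemical cycle quoted above combines the measured acidity and electron affinity with the ionization energy of hydrogen. Using the standard constants IE(H) = 13.598 eV and 1 eV = 23.06 kcal mol⁻¹ (textbook values, not from the abstract):

DH298(HOO-H) = ΔacidH298(HOO-H) + EA(HOO) - IE(H)
             = 376.5 + 24.9 - 313.6
             ≈ 87.8 kcal mol⁻¹

which reproduces the bond enthalpy reported above.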
Abstract:
Methyl, methyl-d₃, and ethyl hydroperoxide anions (CH₃OO⁻, CD₃OO⁻, and CH₃CH₂OO⁻) have been prepared by deprotonation of their respective hydroperoxides in a stream of helium buffer gas. Photodetachment with 364 nm (3.408 eV) radiation was used to measure the adiabatic electron affinities: EA[CH₃OO, X̃ ²A″] = 1.161 ± 0.005 eV, EA[CD₃OO, X̃ ²A″] = 1.154 ± 0.004 eV, and EA[CH₃CH₂OO, X̃ ²A″] = 1.186 ± 0.004 eV. The photoelectron spectra yield values for the term energies: ΔE(X̃ ²A″–Ã ²A′)[CH₃OO] = 0.914 ± 0.005 eV, ΔE(X̃ ²A″–Ã ²A′)[CD₃OO] = 0.913 ± 0.004 eV, and ΔE(X̃ ²A″–Ã ²A′)[CH₃CH₂OO] = 0.938 ± 0.004 eV. A localized RO-O stretching mode was observed near 1100 cm⁻¹ for the ground state of all three radicals, and low-frequency R-O-O bending modes are also reported. Proton-transfer kinetics of the hydroperoxides have been measured in a tandem flowing afterglow-selected ion flow tube (FA-SIFT) to determine the gas-phase acidity of the parent hydroperoxides: ΔacidG298(CH₃OOH) = 367.6 ± 0.7 kcal mol⁻¹, ΔacidG298(CD₃OOH) = 367.9 ± 0.9 kcal mol⁻¹, and ΔacidG298(CH₃CH₂OOH) = 363.9 ± 2.0 kcal mol⁻¹. From these acidities we have derived the enthalpies of deprotonation: ΔacidH298(CH₃OOH) = 374.6 ± 1.0 kcal mol⁻¹, ΔacidH298(CD₃OOH) = 374.9 ± 1.1 kcal mol⁻¹, and ΔacidH298(CH₃CH₂OOH) = 371.0 ± 2.2 kcal mol⁻¹. Use of the negative-ion acidity/EA cycle provides the ROO-H bond enthalpies: DH298(CH₃OO-H) = 87.8 ± 1.0 kcal mol⁻¹, DH298(CD₃OO-H) = 87.9 ± 1.1 kcal mol⁻¹, and DH298(CH₃CH₂OO-H) = 84.8 ± 2.2 kcal mol⁻¹. We review the thermochemistry of the peroxyl radicals CH₃OO and CH₃CH₂OO. Using experimental bond enthalpies, DH298(ROO-H), and CBS/APNO ab initio electronic structure calculations for the energies of the corresponding hydroperoxides, we derive the heats of formation of the peroxyl radicals. The "electron affinity/acidity/CBS" cycle yields ΔfH298[CH₃OO] = 4.8 ± 1.2 kcal mol⁻¹ and ΔfH298[CH₃CH₂OO] = -6.8 ± 2.3 kcal mol⁻¹.
Abstract:
The collision-induced dissociation (CID) mass spectra of the [M-H]⁻ anions of methyl, ethyl, and tert-butyl hydroperoxides have been measured over a range of collision energies in a flowing afterglow-selected ion flow tube (FA-SIFT) mass spectrometer. Activation of the CH₃OO⁻ anion is found to give predominantly HO⁻ fragment anions, whilst CH₃CH₂OO⁻ and (CH₃)₃COO⁻ produce HOO⁻ as the major ionic fragment. These results, and other minor fragmentation pathways, can be rationalized in terms of unimolecular rearrangement of the activated anions with subsequent decomposition. The rearrangement reactions occur via initial abstraction of a proton from the α-carbon in the case of CH₃OO⁻ or the β-carbon for CH₃CH₂OO⁻ and (CH₃)₃COO⁻. Electronic structure calculations suggest that for the CH₃CH₂OO⁻ anion, which can theoretically undergo both α- and β-proton abstraction, the latter pathway, resulting in HOO⁻ + CH₂=CH₂, is energetically preferred.
Abstract:
Sparse optical flow algorithms, such as the Lucas-Kanade approach, provide more robustness to noise than dense optical flow algorithms and are the preferred approach in many scenarios. Sparse optical flow algorithms estimate the displacement for a selected number of pixels in the image. These pixels can be chosen randomly. However, pixels in regions with more variance between the neighbours will produce more reliable displacement estimates. The selected pixel locations should therefore be chosen wisely. In this study, the suitability of Harris corners, Shi-Tomasi's “Good features to track”, SIFT and SURF interest point extractors, Canny edges, and random pixel selection for the purpose of frame-by-frame tracking using a pyramidal Lucas-Kanade algorithm is investigated. The evaluation considers the important factors of processing time, feature count, and feature trackability in indoor and outdoor scenarios using ground vehicles and unmanned aerial vehicles, and for the purpose of visual odometry estimation.
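A minimal sketch of the evaluated setup with OpenCV: seed points from one of the compared extractors, then track them with the pyramidal Lucas-Kanade algorithm. Parameter values are illustrative defaults, not the study's settings:

import cv2
import numpy as np

def track_features(prev_gray, next_gray, detector="shi-tomasi"):
    """Frame-to-frame sparse tracking with pyramidal Lucas-Kanade.
    Only the Shi-Tomasi and SIFT seeding branches are sketched here."""
    if detector == "shi-tomasi":
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                      qualityLevel=0.01, minDistance=7)
    else:  # SIFT keypoints reused as LK seeds
        kps = cv2.SIFT_create().detect(prev_gray, None)
        pts = np.float32([kp.pt for kp in kps]).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None,
        winSize=(21, 21), maxLevel=3)      # 3-level image pyramid
    good = status.ravel() == 1
    return pts[good], nxt[good]            # matched point pairs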
Abstract:
Existing crowd counting algorithms rely on holistic, local or histogram-based features to capture crowd properties. Regression is then employed to estimate the crowd size. Insufficient testing across multiple datasets has made it difficult to compare and contrast different methodologies. This paper presents an evaluation across multiple datasets to compare holistic, local and histogram-based methods, and to compare various image features and regression models. A K-fold cross-validation protocol is followed to evaluate the performance across five public datasets: UCSD, PETS 2009, Fudan, Mall and Grand Central. Image features are categorised into five types: size, shape, edges, keypoints and textures. The regression models evaluated are Gaussian process regression (GPR), linear regression, K-nearest neighbours (KNN) and neural networks (NN). The results demonstrate that local features outperform equivalent holistic and histogram-based features; that optimal performance is observed using all image features except textures; and that GPR outperforms linear, KNN and NN regression.
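A sketch of the cross-validated comparison of the four regression models using scikit-learn; the fold count and model settings are illustrative defaults, not the paper's protocol:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

models = {
    "GPR": GaussianProcessRegressor(normalize_y=True),
    "linear": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "NN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),
}

def compare_regressors(X, y, k=5):
    """X: (n_frames, n_features) crowd features; y: counts.
    Returns mean absolute error per model under K-fold cross-validation."""
    return {name: -cross_val_score(m, X, y, cv=k,
                                   scoring="neg_mean_absolute_error").mean()
            for name, m in models.items()}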