608 resultados para invariance
Resumo:
Gaining invariance to camera and illumination variations has been a well investigated topic in Active Appearance Model (AAM) fitting literature. The major problem lies in the inability of the appearance parameters of the AAM to generalize to unseen conditions. An attractive approach for gaining invariance is to fit an AAM to a multiple filter response (e.g. Gabor) representation of the input image. Naively applying this concept with a traditional AAM is computationally prohibitive, especially as the number of filter responses increase. In this paper, we present a computationally efficient AAM fitting algorithm based on the Lucas-Kanade (LK) algorithm posed in the Fourier domain that affords invariance to both expression and illumination. We refer to this as a Fourier AAM (FAAM), and show that this method gives substantial improvement in person specific AAM fitting performance over traditional AAM fitting methods.
Resumo:
A new algorithm for extracting features from images for object recognition is described. The algorithm uses higher order spectra to provide desirable invariance properties, to provide noise immunity, and to incorporate nonlinearity into the feature extraction procedure thereby allowing the use of simple classifiers. An image can be reduced to a set of 1D functions via the Radon transform, or alternatively, the Fourier transform of each 1D projection can be obtained from a radial slice of the 2D Fourier transform of the image according to the Fourier slice theorem. A triple product of Fourier coefficients, referred to as the deterministic bispectrum, is computed for each 1D function and is integrated along radial lines in bifrequency space. Phases of the integrated bispectra are shown to be translation- and scale-invariant. Rotation invariance is achieved by a regrouping of these invariants at a constant radius followed by a second stage of invariant extraction. Rotation invariance is thus converted to translation invariance in the second step. Results using synthetic and actual images show that isolated, compact clusters are formed in feature space. These clusters are linearly separable, indicating that the nonlinearity required in the mapping from the input space to the classification space is incorporated well into the feature extraction stage. The use of higher order spectra results in good noise immunity, as verified with synthetic and real images. Classification of images using the higher order spectra-based algorithm compares favorably to classification using the method of moment invariants
Resumo:
A new approach to recognition of images using invariant features based on higher-order spectra is presented. Higher-order spectra are translation invariant because translation produces linear phase shifts which cancel. Scale and amplification invariance are satisfied by the phase of the integral of a higher-order spectrum along a radial line in higher-order frequency space because the contour of integration maps onto itself and both the real and imaginary parts are affected equally by the transformation. Rotation invariance is introduced by deriving invariants from the Radon transform of the image and using the cyclic-shift invariance property of the discrete Fourier transform magnitude. Results on synthetic and actual images show isolated, compact clusters in feature space and high classification accuracies
Resumo:
This paper describes a scene invariant crowd counting algorithm that uses local features to monitor crowd size. Unlike previous algorithms that require each camera to be trained separately, the proposed method uses camera calibration to scale between viewpoints, allowing a system to be trained and tested on different scenes. A pre-trained system could therefore be used as a turn-key solution for crowd counting across a wide range of environments. The use of local features allows the proposed algorithm to calculate local occupancy statistics, and Gaussian process regression is used to scale to conditions which are unseen in the training data, also providing confidence intervals for the crowd size estimate. A new crowd counting database is introduced to the computer vision community to enable a wider evaluation over multiple scenes, and the proposed algorithm is tested on seven datasets to demonstrate scene invariance and high accuracy. To the authors' knowledge this is the first system of its kind due to its ability to scale between different scenes and viewpoints.
Resumo:
Gait recognition approaches continue to struggle with challenges including view-invariance, low-resolution data, robustness to unconstrained environments, and fluctuating gait patterns due to subjects carrying goods or wearing different clothes. Although computationally expensive, model based techniques offer promise over appearance based techniques for these challenges as they gather gait features and interpret gait dynamics in skeleton form. In this paper, we propose a fast 3D ellipsoidal-based gait recognition algorithm using a 3D voxel model derived from multi-view silhouette images. This approach directly solves the limitations of view dependency and self-occlusion in existing ellipse fitting model-based approaches. Voxel models are segmented into four components (left and right legs, above and below the knee), and ellipsoids are fitted to each region using eigenvalue decomposition. Features derived from the ellipsoid parameters are modeled using a Fourier representation to retain the temporal dynamic pattern for classification. We demonstrate the proposed approach using the CMU MoBo database and show that an improvement of 15-20% can be achieved over a 2D ellipse fitting baseline.
Resumo:
In public places, crowd size may be an indicator of congestion, delay, instability, or of abnormal events, such as a fight, riot or emergency. Crowd related information can also provide important business intelligence such as the distribution of people throughout spaces, throughput rates, and local densities. A major drawback of many crowd counting approaches is their reliance on large numbers of holistic features, training data requirements of hundreds or thousands of frames per camera, and that each camera must be trained separately. This makes deployment in large multi-camera environments such as shopping centres very costly and difficult. In this chapter, we present a novel scene-invariant crowd counting algorithm that uses local features to monitor crowd size. The use of local features allows the proposed algorithm to calculate local occupancy statistics, scale to conditions which are unseen in the training data, and be trained on significantly less data. Scene invariance is achieved through the use of camera calibration, allowing the system to be trained on one or more viewpoints and then deployed on any number of new cameras for testing without further training. A pre-trained system could then be used as a ‘turn-key’ solution for crowd counting across a wide range of environments, eliminating many of the costly barriers to deployment which currently exist.
Resumo:
The rank transform is one non-parametric transform which has been applied to the stereo matching problem The advantages of this transform include its invariance to radio metric distortion and its amenability to hardware implementation. This paper describes the derivation of the rank constraint for matching using the rank transform Previous work has shown that this constraint was capable of resolving ambiguous matches thereby improving match reliability A new matching algorithm incorporating this constraint was also proposed. This paper extends on this previous work by proposing a matching algorithm which uses a dimensional match surface in which the match score is computed for every possible template and match window combination. The principal advantage of this algorithm is that the use of the match surface enforces the left�right consistency and uniqueness constraints thus improving the algorithms ability to remove invalid matches Experimental results for a number of test stereo pairs show that the new algorithm is capable of identifying and removing a large number of in incorrect matches particularly in the case of occlusions
Resumo:
The rank transform is a non-parametric technique which has been recently proposed for the stereo matching problem. The motivation behind its application to the matching problem is its invariance to certain types of image distortion and noise, as well as its amenability to real-time implementation. This paper derives an analytic expression for the process of matching using the rank transform, and then goes on to derive one constraint which must be satisfied for a correct match. This has been dubbed the rank order constraint or simply the rank constraint. Experimental work has shown that this constraint is capable of resolving ambiguous matches, thereby improving matching reliability. This constraint was incorporated into a new algorithm for matching using the rank transform. This modified algorithm resulted in an increased proportion of correct matches, for all test imagery used.
Resumo:
A fundamental problem faced by stereo matching algorithms is the matching or correspondence problem. A wide range of algorithms have been proposed for the correspondence problem. For all matching algorithms, it would be useful to be able to compute a measure of the probability of correctness, or reliability of a match. This paper focuses in particular on one class for matching algorithms, which are based on the rank transform. The interest in these algorithms for stereo matching stems from their invariance to radiometric distortion, and their amenability to fast hardware implementation. This work differs from previous work in that it derives, from first principles, an expression for the probability of a correct match. This method was based on an enumeration of all possible symbols for matching. The theoretical results for disparity error prediction, obtained using this method, were found to agree well with experimental results. However, disadvantages of the technique developed in this chapter are that it is not easily applicable to real images, and also that it is too computationally expensive for practical window sizes. Nevertheless, the exercise provides an interesting and novel analysis of match reliability.
Resumo:
In this paper we propose a framework for both gradient descent image and object alignment in the Fourier domain. Our method centers upon the classical Lucas & Kanade (LK) algorithm where we represent the source and template/model in the complex 2D Fourier domain rather than in the spatial 2D domain. We refer to our approach as the Fourier LK (FLK) algorithm. The FLK formulation is advantageous when one pre-processes the source image and template/model with a bank of filters (e.g. oriented edges, Gabor, etc.) as: (i) it can handle substantial illumination variations, (ii) the inefficient pre-processing filter bank step can be subsumed within the FLK algorithm as a sparse diagonal weighting matrix, (iii) unlike traditional LK the computational cost is invariant to the number of filters and as a result far more efficient, and (iv) this approach can be extended to the inverse compositional form of the LK algorithm where nearly all steps (including Fourier transform and filter bank pre-processing) can be pre-computed leading to an extremely efficient and robust approach to gradient descent image matching. Further, these computational savings translate to non-rigid object alignment tasks that are considered extensions of the LK algorithm such as those found in Active Appearance Models (AAMs).
Resumo:
Automated crowd counting has become an active field of computer vision research in recent years. Existing approaches are scene-specific, as they are designed to operate in the single camera viewpoint that was used to train the system. Real world camera networks often span multiple viewpoints within a facility, including many regions of overlap. This paper proposes a novel scene invariant crowd counting algorithm that is designed to operate across multiple cameras. The approach uses camera calibration to normalise features between viewpoints and to compensate for regions of overlap. This compensation is performed by constructing an 'overlap map' which provides a measure of how much an object at one location is visible within other viewpoints. An investigation into the suitability of various feature types and regression models for scene invariant crowd counting is also conducted. The features investigated include object size, shape, edges and keypoints. The regression models evaluated include neural networks, K-nearest neighbours, linear and Gaussian process regresion. Our experiments demonstrate that accurate crowd counting was achieved across seven benchmark datasets, with optimal performance observed when all features were used and when Gaussian process regression was used. The combination of scene invariance and multi camera crowd counting is evaluated by training the system on footage obtained from the QUT camera network and testing it on three cameras from the PETS 2009 database. Highly accurate crowd counting was observed with a mean relative error of less than 10%. Our approach enables a pre-trained system to be deployed on a new environment without any additional training, bringing the field one step closer toward a 'plug and play' system.
Resumo:
Consistency and invariance in movements are traditionally viewed as essential features of skill acquisition and elite sports performance. This emphasis on the stabilization of action has resulted in important processes of adaptation in movement coordination during performance being overlooked in investigations of elite sport performance. Here we investigate whether differences exist between the movement kinematics displayed by five, elite springboard divers (age 17 ± 2.4 years) in the preparation phases of baulked and completed take-offs. The two-dimensional kinematic characteristics of the reverse somersault take-off phases (approach and hurdle) were recorded during normal training sessions and used for intra-individual analysis. All participants displayed observable differences in movement patterns at key events during the approach phase; however, the presence of similar global topological characteristics suggested that, overall, participants did not perform distinctly different movement patterns during completed and baulked dives. These findings provide a powerful rationale for coaches to consider assessing functional variability or adaptability of motor behaviour as a key criterion of successful performance in sports such as diving.
Resumo:
Polarisation diversity is a technique to improve the quality of mobile communications, but its reliability is suboptimal because it depends on the mobile channel and the antenna orientations at both ends of the mobile link. A method to optimise the reliability is established by minimising the dependency on antenna orientations. While the mobile base station can have fixed antenna orientation, the mobile terminal is typically a handheld device with random orientations. This means orientation invariance needs to be established at the receiver in the downlink, and at the transmitter in the uplink. This research presents separate solutions for both cases, and is based on the transmission of an elliptically polarised signal synthesised from the channel statistics. Complete receiver orientation invariance is achieved in the downlink. Effects of the transmitter orientation are minimised in the uplink.
Resumo:
This paper presents Sequence Matching Across Route Traversals (SMART); a generally applicable sequence-based place recognition algorithm. SMART provides invariance to changes in illumination and vehicle speed while also providing moderate pose invariance and robustness to environmental aliasing. We evaluate SMART on vehicles travelling at highly variable speeds in two challenging environments; firstly, on an all-terrain vehicle in an off-road, forest track and secondly, using a passenger car traversing an urban environment across day and night. We provide comparative results to the current state-of-the-art SeqSLAM algorithm and investigate the effects of altering SMART’s image matching parameters. Additionally, we conduct an extensive study of the relationship between image sequence length and SMART’s matching performance. Our results show viable place recognition performance in both environments with short 10-metre sequences, and up to 96% recall at 100% precision across extreme day-night cycles when longer image sequences are used.
Resumo:
Gaining invariance to camera and illumination variations has been a well investigated topic in Active Appearance Model (AAM) fitting literature. The major problem lies in the inability of the appearance parameters of the AAM to generalize to unseen conditions. An attractive approach for gaining invariance is to fit an AAM to a multiple filter response (e.g. Gabor) representation of the input image. Naively applying this concept with a traditional AAM is computationally prohibitive, especially as the number of filter responses increase. In this paper, we present a computationally efficient AAM fitting algorithm based on the Lucas-Kanade (LK) algorithm posed in the Fourier domain that affords invariance to both expression and illumination. We refer to this as a Fourier AAM (FAAM), and show that this method gives substantial improvement in person specific AAM fitting performance over traditional AAM fitting methods.