9 results for Scale invariant

in Deakin Research Online - Australia


Relevance:

60.00%

Publisher:

Abstract:

Recognizing human actions efficiently and effectively in videos captured by modern cameras remains a challenge in real applications. Traditional methods, which rely on professional analysts, are facing a bottleneck because of their shortcomings. To address this, methods based on computer vision techniques, requiring little or no human intervention, have been proposed to analyse human actions in videos automatically. This paper presents a method combining the three-dimensional Scale Invariant Feature Transform (SIFT) detector and the Latent Dirichlet Allocation (LDA) model for human motion analysis. To represent videos effectively and robustly, we extract the 3D SIFT descriptor around each interest point, which is sampled densely from 3D space-time video volumes. After obtaining the representation of each video frame, the LDA model is adopted to discover the underlying structure, namely the categorization of human actions in the collection of videos. Publicly available standard datasets are used to test our method. The concluding part discusses the research challenges and future directions.
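A topic model such as LDA operates on discrete word counts, so continuous descriptors are commonly quantized against a codebook of "visual words" first. The abstract does not say how this step is done, so the following is only a minimal sketch under that assumption: the two-dimensional descriptors, the three-entry `codebook`, and the function names are all hypothetical and chosen for illustration.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def word_histogram(descriptors, codebook):
    """Count how many descriptors fall on each visual word."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_word(desc, codebook)] += 1
    return counts


# Hypothetical toy data: real 3D SIFT descriptors have far more dimensions.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [0.8, -0.1], [0.1, 1.2]]

histogram = word_histogram(descriptors, codebook)
print(histogram)
```

The resulting count vector plays the role of a "document" that a topic model could then be fitted to.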

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we present a novel person detection system for public transport buses that tackles the problem of changing illumination conditions. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modeling mechanism with a human shape model into a weighted Bayesian framework to detect passengers on board buses. SIFT background modeling extracts local stable features on the pre-annotated background seat areas and tracks these features over time to build a global statistical background model for each seat. Since SIFT features are partially invariant to lighting, this background model can be used robustly to detect the seat occupancy status even under severe lighting changes. The human shape model further confirms the existence of a passenger when a seat is occupied. The result is a robust passenger monitoring system that is resilient to illumination changes. We evaluate the performance of our proposed system on a number of challenging video datasets obtained from bus cameras, and the experimental results show that it is superior to state-of-the-art people detection systems.

Relevance:

60.00%

Publisher:

Abstract:

The paper presents the Visual Mouse (VM), a novel and simple system for interaction with displays via hand gestures. Our method includes detecting bare hands using the fast SIFT (Scale-Invariant Feature Transform) algorithm, which avoids the long training time of the AdaBoost algorithm; tracking hands based on the CAMShift algorithm; recognizing hand gestures against cluttered backgrounds via Principal Component Analysis (PCA) without extracting a clear-cut hand contour; and defining simple and robustly interpretable vocabularies of hand gestures, which are subsequently used to control a computer mouse. The system provides a fast and simple interaction experience without the need for more expensive hardware and software.

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we compare the effectiveness of widely used approaches for representation of facial features in face images. Feature extraction is performed on face images for representation of four facial attributes, namely gender, age, race, and expression, by using discrete wavelet transform (DWT), Gabor wavelet, scale-invariant feature transform, local binary pattern (LBP), and Eigenfaces. After feature extraction and dimension reduction, demographic and expression classification is performed to identify the most discriminating techniques for representation of facial features. Extensive experiments are performed using publicly available face databases, namely Yale, Face95 Essex, and Cohn-Kanade (CK+) databases. Experimental results show that DWT, LBP, and Gabor wavelet methods are robust to variations of illumination, facial expression, and geometric transformations. Experimental results also show that race and expression are more difficult to predict than gender and age.

Relevance:

30.00%

Publisher:

Abstract:

A multiresolution technique based on a multiwavelet scale-space representation for stereo correspondence estimation is presented. The technique uses the well-known coarse-to-fine strategy, involving the calculation of stereo correspondences at the coarsest resolution level with subsequent refinement up to the finest level. Vector coefficients of the multiwavelet transform modulus are used as corresponding features, where modulus maxima define the shift-invariant high-level features (multiscale edges) with phase pointing to the normal of the feature surface. The technique addresses the estimation of optimal corresponding points and the corresponding 2D disparity maps. Illumination variation that can exist between the perspective views of the same scene is controlled using scale normalization at each decomposition level, dividing the detail-space coefficients by the approximation space. Ambiguity is addressed explicitly, and occlusion implicitly, by a geometric topological refinement procedure. Geometric refinement is based on a symbolic tagging procedure introduced to keep only the most consistent matches in consideration. Symbolic tagging is performed based on probability of occurrence and multiple thresholds. The whole procedure is constrained by the uniqueness and continuity of the corresponding stereo features. A comparison of the proposed algorithm with eight well-known existing algorithms from the literature is presented to validate the claims of promising performance.
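The scale normalization described above can be illustrated with a toy example. The sketch below is not the authors' method: it substitutes a single-level scalar Haar decomposition of a 1D signal for the multiwavelet transform, and the names `haar_level`, `normalized_detail`, and the small `eps` guard are assumptions made for illustration. It shows only why dividing detail coefficients by approximation coefficients cancels a multiplicative brightness change between two views.

```python
def haar_level(signal):
    """One level of an unnormalized Haar decomposition of an even-length
    1D signal: pairwise averages (approximation) and half-differences (detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2.0 for a, b in pairs]
    detail = [(a - b) / 2.0 for a, b in pairs]
    return approx, detail


def normalized_detail(signal, eps=1e-9):
    """Divide each detail coefficient by the matching approximation
    coefficient; eps (an assumed guard) avoids division by zero."""
    approx, detail = haar_level(signal)
    return [d / (a + eps) for a, d in zip(approx, detail)]


# Hypothetical scanline, and the same scanline seen with a 1.7x brightness gain.
left = [10.0, 14.0, 20.0, 12.0, 8.0, 8.0, 30.0, 18.0]
right = [1.7 * v for v in left]

raw_left = haar_level(left)[1]
raw_right = haar_level(right)[1]
norm_left = normalized_detail(left)
norm_right = normalized_detail(right)

print(raw_left, raw_right)      # raw detail coefficients differ by the gain
print(norm_left, norm_right)    # normalized coefficients agree closely
```

Because both the detail and the approximation coefficients scale linearly with the gain, their ratio is unaffected by it; an additive brightness offset would not cancel in the same way.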

Relevance:

30.00%

Publisher:

Abstract:

The problem of dimensional defects in aluminum die-castings is widespread throughout the foundry industry, and their detection is of paramount importance in maintaining product quality. Due to the unpredictable factory environment and the highly reflective nature of the metal, it is extremely hard to estimate the true dimensions of these metallic parts autonomously. Some existing vision systems are capable of estimating depth to high accuracy, but they are heavily hardware dependent, involving the use of light and laser pattern projectors integrated into vision systems or laser scanners. However, due to the reflective nature of these metallic parts and variable factory environments, such vision systems tend to perform poorly. Moreover, hardware dependency makes these systems cumbersome and costly. In this work, we propose a novel robust 3D reconstruction algorithm capable of reconstructing dimensionally accurate 3D depth models of aluminum die-castings. The developed system is very simple and cost effective, as it consists of only a pair of stereo cameras and a diffused fluorescent light. The proposed vision system is capable of estimating surface depths to within 0.5 mm. In addition, the system is invariant to illumination variations as well as to the orientation and location of the objects in the input image space, making it highly robust. Due to its hardware simplicity and robustness, it can be implemented in different factory environments without a significant change in the setup. The proposed system is a major part of a quality inspection system for the automotive manufacturing industry.

Relevance:

30.00%

Publisher:

Abstract:

A multi-resolution technique for matching a stereo pair of images based on a translation-invariant discrete multi-wavelet transform is presented. The technique uses the well-known coarse-to-fine strategy, involving the calculation of matching points at the coarsest level with subsequent refinement up to the finest level. Vector coefficients of the wavelet transform modulus are used as matching features, where modulus maxima define the shift-invariant high-level features (multiscale edges) with phase pointing to the normal of the feature surface. The technique addresses the estimation of optimal corresponding points and the corresponding 2D disparity maps. Illumination variation that can exist between the perspective views of the same scene is controlled using scale normalization at each decomposition level, dividing the detail-space coefficients by the approximation space, and then using normalized correlation. Ambiguity is addressed explicitly, and occlusion implicitly, by a geometric topological refinement procedure and symbolic tagging.
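As a rough illustration of matching by normalized correlation, the sketch below searches a single 1D scanline for the disparity with the highest score. It is a sketch under stated assumptions, not the authors' procedure: it works on raw intensities rather than wavelet coefficients, it uses the zero-mean variant of normalized correlation (the abstract does not specify which variant), and the names `zncc`, `best_disparity`, `half`, and `max_disp` are hypothetical.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def best_disparity(left, right, x, half=2, max_disp=4):
    """Find the shift d for which the right window centred at x - d best
    matches the left window centred at x."""
    ref = left[x - half : x + half + 1]
    best_d, best_score = 0, -2.0
    for d in range(max_disp + 1):
        lo = x - d - half
        if lo < 0:
            break  # candidate window would leave the scanline
        score = zncc(ref, right[lo : lo + 2 * half + 1])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Hypothetical scanlines: the left view is the right view shifted by 3 pixels,
# with a brightness gain of 1.5 and an offset of 10.
right = [5.0, 7.0, 6.0, 9.0, 20.0, 35.0, 18.0, 8.0, 6.0, 7.0, 5.0, 6.0, 7.0, 5.0]
left = [10.0] * 3 + [1.5 * v + 10.0 for v in right[:-3]]

disparity, score = best_disparity(left, right, x=8)
print(disparity, score)
```

Subtracting the window means and dividing by the window norms is what makes the score insensitive to the gain and offset applied to the left view in this toy setup.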

Relevance:

30.00%

Publisher:

Abstract:

Over the course of the last decade, infrared (IR) and particularly thermal IR imaging based face recognition has emerged as a promising complement to conventional, visible spectrum based approaches which continue to struggle when applied in practice. While inherently insensitive to visible spectrum illumination changes, IR data introduces specific challenges of its own, most notably sensitivity to factors which affect facial heat emission patterns, e.g. emotional state, ambient temperature, and alcohol intake. In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in high frequency detail which is an important cue for fitting any deformable model. In this paper we describe a novel method which addresses these major challenges. Specifically, when comparing two thermal IR images of faces, we mutually normalize their poses and facial expressions by using an active appearance model (AAM) to generate synthetic images of the two faces with a neutral facial expression and in the same view (the average of the two input views). This is achieved by piecewise affine warping which follows AAM fitting. A major contribution of our work is the use of an AAM ensemble in which each AAM is specialized to a particular range of poses and a particular region of the thermal IR face space. Combined with the contributions from our previous work which addressed the problem of reliable AAM fitting in the thermal IR spectrum, and the development of a person-specific representation robust to transient changes in the pattern of facial temperature emissions, the proposed ensemble framework accurately matches faces across the full range of yaw from frontal to profile, even in the presence of scale variation (e.g. due to the varying distance of a subject from the camera). 
The effectiveness of the proposed approach is demonstrated on the largest public database of thermal IR images of faces and a newly acquired data set of thermal IR motion videos. Our approach achieved perfect recognition performance on both data sets, significantly outperforming current state-of-the-art methods even when they are trained with multiple images spanning a range of head views.