920 resultados para Computer vision teaching


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Robust facial expression recognition (FER) under occluded face conditions is challenging. It requires robust algorithms of feature extraction and investigations into the effects of different types of occlusion on the recognition performance to gain insight. Previous FER studies in this area have been limited. They have spanned recovery strategies for loss of local texture information and testing limited to only a few types of occlusion and predominantly a matched train-test strategy. This paper proposes a robust approach that employs a Monte Carlo algorithm to extract a set of Gabor based part-face templates from gallery images and converts these templates into template match distance features. The resulting feature vectors are robust to occlusion because occluded parts are covered by some but not all of the random templates. The method is evaluated using facial images with occluded regions around the eyes and the mouth, randomly placed occlusion patches of different sizes, and near-realistic occlusion of eyes with clear and solid glasses. Both matched and mis-matched train and test strategies are adopted to analyze the effects of such occlusion. Overall recognition performance and the performance for each facial expression are investigated. Experimental results on the Cohn-Kanade and JAFFE databases demonstrate the high robustness and fast processing speed of our approach, and provide useful insight into the effects of occlusion on FER. The results on the parameter sensitivity demonstrate a certain level of robustness of the approach to changes in the orientation and scale of Gabor filters, the size of templates, and occlusions ratios. Performance comparisons with previous approaches show that the proposed method is more robust to occlusion with lower reductions in accuracy from occlusion of eyes or mouth.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices and then interpreting such matrices as points on Riemannian manifolds can lead to increased classification performance. Taking into account manifold geometry is typically done via (1) embedding the manifolds in tangent spaces, or (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS). While embedding into tangent spaces allows the use of existing Euclidean-based learning algorithms, manifold shape is only approximated which can cause loss of discriminatory information. The RKHS approach retains more of the manifold structure, but may require non-trivial effort to kernelise Euclidean-based learning algorithms. In contrast to the above approaches, in this paper we offer a novel solution that allows SPD matrices to be used with unmodified Euclidean-based learning algorithms, with the true manifold shape well-preserved. Specifically, we propose to project SPD matrices using a set of random projection hyperplanes over RKHS into a random projection space, which leads to representing each matrix as a vector of projection coefficients. Experiments on face recognition, person re-identification and texture classification show that the proposed approach outperforms several recent methods, such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a novel approach to video summarisation that makes use of a Bag-of-visual-Textures (BoT) approach. Two systems are proposed, one based solely on the BoT approach and another which exploits both colour information and BoT features. On 50 short-term videos from the Open Video Project we show that our BoT and fusion systems both achieve state-of-the-art performance, obtaining an average F-measure of 0.83 and 0.86 respectively, a relative improvement of 9% and 13% when compared to the previous state-of-the-art. When applied to a new underwater surveillance dataset containing 33 long-term videos, the proposed system reduces the amount of footage by a factor of 27, with only minor degradation in the information content. This order of magnitude reduction in video data represents significant savings in terms of time and potential labour cost when manually reviewing such footage.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching with various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (eg. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (eg. frontal face being compared to non-frontal face). To address the above problem, we propose a novel approach to enhance nearest points based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing the clusters to have resemblance to the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single model approaches and other recent techniques, such as Sparse Approximated Nearest Points, Mutual Subspace Method and Manifold Discriminant Analysis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method to identify the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings, such as being subjective as well as time and labour intensive. Computer Aided Diagnostic (CAD) systems have been developed to address these problems, which automatically classify a HEp-2 cell image into one of its known patterns (eg. speckled, homogeneous). Most of the existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which is comprised of regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations of generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping, which enables us to devise a closed-form solution for updating a Grassmann dictionary, atom by atom. Furthermore, to handle non-linearity in data, we propose a kernelised version of the dictionary learning algorithm. Experiments on several classification tasks (face recognition, action recognition, dynamic texture classification) show that the proposed approach achieves considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelised Affine Hull Method and graph-embedding Grassmann discriminant analysis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters to represent different characteristics of an object, due to different undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to have resemblance to the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis investigates the fusion of 3D visual information with 2D image cues to provide 3D semantic maps of large-scale environments in which a robot traverses for robotic applications. A major theme of this thesis was to exploit the availability of 3D information acquired from robot sensors to improve upon 2D object classification alone. The proposed methods have been evaluated on several indoor and outdoor datasets collected from mobile robotic platforms including a quadcopter and ground vehicle covering several kilometres of urban roads.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Multi-touch interfaces across a wide range of hardware platforms are becoming pervasive. This is due to the adoption of smart phones and tablets in both the consumer and corporate market place. This paper proposes a human-machine interface to interact with unmanned aerial systems based on the philosophy of multi-touch hardware-independent high-level interaction with multiple systems simultaneously. Our approach incorporates emerging development methods for multi-touch interfaces on mobile platforms. A framework is defined for supporting multiple protocols. An open source solution is presented that demonstrates: architecture supporting different communications hardware; an extensible approach for supporting multiple protocols; and the ability to monitor and interact with multiple UAVs from multiple clients simultaneously. Validation tests were conducted to assess the performance, scalability and impact on packet latency under different client configurations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we present an approach for image-based surface classification using multi-class Support Vector Machine (SVM). Classifying surfaces in aerial images is an important step towards an increased aircraft autonomy in emergency landing situations. We design a one-vs-all SVM classifier and conduct experiments on five data sets. Results demonstrate consistent overall performance figures over 88% and approximately 8% more accurate to those published on multi-class SVM on the KTH TIPS data set. We also show per-class performance values by using normalised confusion matrices. Our approach is designed to be executed online using a minimum set of feature attributes representing a feasible and ready-to-deploy system for onboard execution.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Aggressive behavior at the steering wheel has been indicated as a contributing factor in a majority of crashes and anger has been compared to alcohol impairment in terms of probability to cause a crash. It has been shown that being in a state of anger or excitement while driving can decrease the drivers’ performances. . This paper reports the evaluation of 6 novel design alternatives of In-Vehicle Information Systems (IVIS) aimed at mitigating driver aggression. Each application presented was designed to tackle the following contributing factors to driver aggression: competitiveness, anonymity, territoriality, stress as well as social and emotional isolation. The 6 applications were simulated using computer vision algorithm to automatically overlay the real traffic conditions with ‘Head-Up Display’ visualizations. Two applications emerged over the others from participant’s evaluation: shared music combined the known calming effect of music with the sense of sympathy and intimacy caused by hearing other drivers’ music. The Shared Snapshot application provided an immediate gratification and was evaluated as a potential prevention of roadside quarrels. The paper presents Theoretical foundation, participant’s evaluations, implications and limitations of the study.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Texture information in the iris image is not uniform in discriminatory information content for biometric identity verification. The bits in an iris code obtained from the image differ in their consistency from one sample to another for the same identity. In this work, errors in bit strings are systematically analysed in order to investigate the effect of light-induced and drug-induced pupil dilation and constriction on the consistency of iris texture information. The statistics of bit errors are computed for client and impostor distributions as functions of radius and angle. Under normal conditions, a V-shaped radial trend of decreasing bit errors towards the central region of the iris is obtained for client matching, and it is observed that the distribution of errors as a function of angle is uniform. When iris images are affected by pupil dilation or constriction the radial distribution of bit errors is altered. A decreasing trend from the pupil outwards is observed for constriction, whereas a more uniform trend is observed for dilation. The main increase in bit errors occurs closer to the pupil in both cases.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In elite sports, nearly all performances are captured on video. Despite the massive amounts of video that has been captured in this domain over the last 10-15 years, most of it remains in an 'unstructured' or 'raw' form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels which is time consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and in-addition to the split-times (i.e., the time it takes for each lap), stroke rate and stroke-lengths are manually annotated. In this paper, we propose a vision-based system which effectively 'digitizes' a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras which are located at different positions and angles, we show our hierarchical-based approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate the swimmer location and stroke rates.