339 resultados para visual pattern recognition network


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the field of face recognition, Sparse Representation (SR) has received considerable attention during the past few years. Most of the relevant literature focuses on holistic descriptors in closed-set identification applications. The underlying assumption in SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such assumption is easily violated in the more challenging face verification scenario, where an algorithm is required to determine if two faces (where one or both have not been seen before) belong to the same person. In this paper, we first discuss why previous attempts with SR might not be applicable to verification problems. We then propose an alternative approach to face verification via SR. Specifically, we propose to use explicit SR encoding on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which are then concatenated to form an overall face descriptor. Due to the deliberate loss spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment & various image deformations. Within the proposed framework, we evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN), and an implicit probabilistic technique based on Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the proposed local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, in both verification and closed-set identification problems. The experiments also show that l1-minimisation based encoding has a considerably higher computational than the other techniques, but leads to higher recognition rates.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the a mission should be aborted due to mechanical or other failure. On-board cameras provide information that can be used in the determination of potential landing sites, which are continually updated and ranked to prevent injury and minimize damage. Pulse Coupled Neural Networks have been used for the detection of features in images that assist in the classification of vegetation and can be used to minimize damage to the aerial vehicle. However, a significant drawback in the use of PCNNs is that they are computationally expensive and have been more suited to off-line applications on conventional computing architectures. As heterogeneous computing architectures are becoming more common, an OpenCL implementation of a PCNN feature generator is presented and its performance is compared across OpenCL kernels designed for CPU, GPU and FPGA platforms. This comparison examines the compute times required for network convergence under a variety of images obtained during unmanned aerial vehicle trials to determine the plausibility for real-time feature detection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research reported here addresses the problem of detecting and tracking independently moving objects from a moving observer in real time, using corners as object tokens. Local image-plane constraints are employed to solve the correspondence problem removing the need for a 3D motion model. The approach relaxes the restrictive static-world assumption conventionally made, and is therefore capable of tracking independently moving and deformable objects. The technique is novel in that feature detection and tracking is restricted to areas likely to contain meaningful image structure. Feature instantiation regions are defined from a combination of odometry informatin and a limited knowledge of the operating scenario. The algorithms developed have been tested on real image sequences taken from typical driving scenarios. Preliminary experiments on a parallel (transputer) architecture indication that real-time operation is achievable.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition can be improved by using visual information in the form of lip movements of the speaker in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data of the same database for training their models. In this paper, we present a new approach to make use of one modality of an external dataset in addition to a given audio-visual dataset. By so doing, it is possible to create more powerful models from other extensive audio-only databases and adapt them on our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMM) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and also internal audio models by 5.5% and 46% relative respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by the environmental noise.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a novel crop detection system applied to the challenging task of field sweet pepper (capsicum) detection. The field-grown sweet pepper crop presents several challenges for robotic systems such as the high degree of occlusion and the fact that the crop can have a similar colour to the background (green on green). To overcome these issues, we propose a two-stage system that performs per-pixel segmentation followed by region detection. The output of the segmentation is used to search for highly probable regions and declares these to be sweet pepper. We propose the novel use of the local binary pattern (LBP) to perform crop segmentation. This feature improves the accuracy of crop segmentation from an AUC of 0.10, for previously proposed features, to 0.56. Using the LBP feature as the basis for our two-stage algorithm, we are able to detect 69.2% of field grown sweet peppers in three sites. This is an impressive result given that the average detection accuracy of people viewing the same colour imagery is 66.8%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Faces are complex patterns that often differ in only subtle ways. Face recognition algorithms have difficulty in coping with differences in lighting, cameras, pose, expression, etc. We propose a novel approach for facial recognition based on a new feature extraction method called fractal image-set encoding. This feature extraction method is a specialized fractal image coding technique that makes fractal codes more suitable for object and face recognition. A fractal code of a gray-scale image can be divided in two parts – geometrical parameters and luminance parameters. We show that fractal codes for an image are not unique and that we can change the set of fractal parameters without significant change in the quality of the reconstructed image. Fractal image-set coding keeps geometrical parameters the same for all images in the database. Differences between images are captured in the non-geometrical or luminance parameters – which are faster to compute. Results on a subset of the XM2VTS database are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hybrid face recognition, using image (2D) and structural (3D) information, has explored the fusion of Nearest Neighbour classifiers. This paper examines the effectiveness of feature modelling for each individual modality, 2D and 3D. Furthermore, it is demonstrated that the fusion of feature modelling techniques for the 2D and 3D modalities yields performance improvements over the individual classifiers. By fusing the feature modelling classifiers for each modality with equal weights the average Equal Error Rate improves from 12.60% for the 2D classifier and 12.10% for the 3D classifier to 7.38% for the Hybrid 2D+3D clasiffier.