254 resultados para Face recognition from video
Resumo:
We propose a novel multiview fusion scheme for recognizing human identity based on gait biometric data. The gait biometric data is acquired from video surveillance datasets from multiple cameras. Experiments on publicly available CASIA dataset show the potential of proposed scheme based on fusion towards development and implementation of automatic identity recognition systems.
Resumo:
Faces are complex patterns that often differ in only subtle ways. Face recognition algorithms have difficulty in coping with differences in lighting, cameras, pose, expression, etc. We propose a novel approach for facial recognition based on a new feature extraction method called fractal image-set encoding. This feature extraction method is a specialized fractal image coding technique that makes fractal codes more suitable for object and face recognition. A fractal code of a gray-scale image can be divided in two parts – geometrical parameters and luminance parameters. We show that fractal codes for an image are not unique and that we can change the set of fractal parameters without significant change in the quality of the reconstructed image. Fractal image-set coding keeps geometrical parameters the same for all images in the database. Differences between images are captured in the non-geometrical or luminance parameters – which are faster to compute. Results on a subset of the XM2VTS database are presented.
Resumo:
This paper describes a novel framework for facial expression recognition from still images by selecting, optimizing and fusing ‘salient’ Gabor feature layers to recognize six universal facial expressions using the K nearest neighbor classifier. The recognition comparisons with all layer approach using JAFFE and Cohn-Kanade (CK) databases confirm that using ‘salient’ Gabor feature layers with optimized sizes can achieve better recognition performance and dramatically reduce computational time. Moreover, comparisons with the state of the art performances demonstrate the effectiveness of our approach.
Resumo:
A method of improving the security of biometric templates which satisfies desirable properties such as (a) irreversibility of the template, (b) revocability and assignment of a new template to the same biometric input, (c) matching in the secure transformed domain is presented. It makes use of an iterative procedure based on the bispectrum that serves as an irreversible transformation for biometric features because signal phase is discarded each iteration. Unlike the usual hash function, this transformation preserves closeness in the transformed domain for similar biometric inputs. A number of such templates can be generated from the same input. These properties are illustrated using synthetic data and applied to images from the FRGC 3D database with Gabor features. Verification can be successfully performed using these secure templates with an EER of 5.85%
Resumo:
This thesis introduces improved techniques towards automatically estimating the pose of humans from video. It examines a complete workflow to estimating pose, from the segmentation of the raw video stream to extract silhouettes, to using the silhouettes in order to determine the relative orientation of parts of the human body. The proposed segmentation algorithms have improved performance and reduced complexity, while the pose estimation shows superior accuracy during difficult cases of self occlusion.
Resumo:
Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping, which enables us to devise a closed-form solution for updating a Grassmann dictionary, atom by atom. Furthermore, to handle non-linearity in data, we propose a kernelised version of the dictionary learning algorithm. Experiments on several classification tasks (face recognition, action recognition, dynamic texture classification) show that the proposed approach achieves considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelised Affine Hull Method and graph-embedding Grassmann discriminant analysis.
Resumo:
Novel computer vision techniques have been developed to automatically detect unusual events in crowded scenes from video feeds of surveillance cameras. The research is useful in the design of the next generation intelligent video surveillance systems. Two major contributions are the construction of a novel machine learning model for multiple instance learning through compressive sensing, and the design of novel feature descriptors in the compressed video domain.
Resumo:
The latest generation of Deep Convolutional Neural Networks (DCNN) have dramatically advanced challenging computer vision tasks, especially in object detection and object classification, achieving state-of-the-art performance in several computer vision tasks including text recognition, sign recognition, face recognition and scene understanding. The depth of these supervised networks has enabled learning deeper and hierarchical representation of features. In parallel, unsupervised deep learning such as Convolutional Deep Belief Network (CDBN) has also achieved state-of-the-art in many computer vision tasks. However, there is very limited research on jointly exploiting the strength of these two approaches. In this paper, we investigate the learning capability of both methods. We compare the output of individual layers and show that many learnt filters and outputs of the corresponding level layer are almost similar for both approaches. Stacking the DCNN on top of unsupervised layers or replacing layers in the DCNN with the corresponding learnt layers in the CDBN can improve the recognition/classification accuracy and training computational expense. We demonstrate the validity of the proposal on ImageNet dataset.
Resumo:
Automatic speech recognition from multiple distant micro- phones poses significant challenges because of noise and reverberations. The quality of speech acquisition may vary between microphones because of movements of speakers and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in the real reverb conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared to the case when the channel is selected randomly using circular microphone arrays.
Resumo:
Age estimation from facial images is increasingly receiving attention to solve age-based access control, age-adaptive targeted marketing, amongst other applications. Since even humans can be induced in error due to the complex biological processes involved, finding a robust method remains a research challenge today. In this paper, we propose a new framework for the integration of Active Appearance Models (AAM), Local Binary Patterns (LBP), Gabor wavelets (GW) and Local Phase Quantization (LPQ) in order to obtain a highly discriminative feature representation which is able to model shape, appearance, wrinkles and skin spots. In addition, this paper proposes a novel flexible hierarchical age estimation approach consisting of a multi-class Support Vector Machine (SVM) to classify a subject into an age group followed by a Support Vector Regression (SVR) to estimate a specific age. The errors that may happen in the classification step, caused by the hard boundaries between age classes, are compensated in the specific age estimation by a flexible overlapping of the age ranges. The performance of the proposed approach was evaluated on FG-NET Aging and MORPH Album 2 datasets and a mean absolute error (MAE) of 4.50 and 5.86 years was achieved respectively. The robustness of the proposed approach was also evaluated on a merge of both datasets and a MAE of 5.20 years was achieved. Furthermore, we have also compared the age estimation made by humans with the proposed approach and it has shown that the machine outperforms humans. The proposed approach is competitive with current state-of-the-art and it provides an additional robustness to blur, lighting and expression variance brought about by the local phase features.
Resumo:
State-of-the-art image-set matching techniques typically implicitly model each image-set with a Gaussian distribution. Here, we propose to go beyond these representations and model image-sets as probability distribution functions (PDFs) using kernel density estimators. To compare and match image-sets, we exploit Csiszar´ f-divergences, which bear strong connections to the geodesic distance defined on the space of PDFs, i.e., the statistical manifold. Furthermore, we introduce valid positive definite kernels on the statistical manifold, which let us make use of more powerful classification schemes to match image-sets. Finally, we introduce a supervised dimensionality reduction technique that learns a latent space where f-divergences reflect the class labels of the data. Our experiments on diverse problems, such as video-based face recognition and dynamic texture classification, evidence the benefits of our approach over the state-of-the-art image-set matching methods.
Resumo:
Deep convolutional neural networks (DCNNs) have been employed in many computer vision tasks with great success due to their robustness in feature learning. One of the advantages of DCNNs is their representation robustness to object locations, which is useful for object recognition tasks. However, this also discards spatial information, which is useful when dealing with topological information of the image (e.g. scene labeling, face recognition). In this paper, we propose a deeper and wider network architecture to tackle the scene labeling task. The depth is achieved by incorporating predictions from multiple early layers of the DCNN. The width is achieved by combining multiple outputs of the network. We then further refine the parsing task by adopting graphical models (GMs) as a post-processing step to incorporate spatial and contextual information into the network. The new strategy for a deeper, wider convolutional network coupled with graphical models has shown promising results on the PASCAL-Context dataset.
Resumo:
Shared Services (SS) involves the convergence and streamlining of an organisation’s functions to ensure timely service delivery as effectively and efficiently as possible. As a management structure designed to promote value generation, cost savings and improved service delivery by leveraging on economies of scale, the idea of SS is driven by cost reduction and improvements in quality levels of service and efficiency. Current conventional wisdom is that the potential for SS is increasing due to the increasing costs of changing systems and business requirements for organisations and in implementing and running information systems. In addition, due to commoditisation of large information systems such as enterprise systems, many common, supporting functions across organisations are becoming more similar than not, leading to an increasing overlap in processes and fuelling the notion that it is possible for organisations to derive benefits from collaborating and sharing their common services through an inter-organisational shared services (IOSS) arrangement. While there is some research on traditional SS, very little research has been done on IOSS. In particular, it is unclear what are the potential drivers and inhibitors of IOSS. As the concepts of IOSS and SS are closely related to that of Outsourcing, and their distinction is sometimes blurred, this research has the first objective of seeking a clear conceptual understanding of the differences between SS and Outsourcing (in motivators, arrangements, benefits, disadvantages, etc) and based on this conceptual understanding, the second objective of this research is to develop a decision model (Shared Services Potential model) which would aid organisations in deciding which arrangement would be more appropriate for them to adopt in pursuit of process improvements for their operations. As the context of the study is on universities in higher education sharing administrative services common to or across them and with the assumption that such services were homogenous in nature, this thesis also reports on a case study. The case study involved face to face interviews from representatives of an Australian university to explore the potential for IOSS. Our key findings suggest that it is possible for universities to share services common across them as most of them were currently using the same systems although independently.