25 results for FACE RECOGNITION
Abstract:
Ear recognition, as a biometric, has several advantages. In particular, ears can be measured remotely and are also relatively static in size and structure for each individual. Unfortunately, at present, good recognition rates require controlled conditions. For commercial use, these systems need to be much more robust. In particular, ears have to be recognized from different angles (poses), under different lighting conditions, and with different cameras. It must also be possible to distinguish ears from background clutter and identify them when partly occluded by hair, hats, or other objects. The purpose of this paper is to suggest how progress toward such robustness might be achieved through a technique that improves ear registration. The approach focuses on 2-D images, treating the ear as a planar surface that is registered to a gallery using a homography transform calculated from scale-invariant feature transform (SIFT) feature matches. The feature matches reduce the gallery size and enable a precise ranking using a simple 2-D distance algorithm. Analysis on a range of data sets demonstrates the technique to be robust to background clutter, viewing angles up to ±13 degrees, and up to 18% occlusion. In addition, recognition remains accurate with masked ear images as small as 20 × 35 pixels.
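The registration idea in this abstract can be illustrated with a minimal sketch. It assumes the 3×3 homography has already been estimated (the paper derives it from SIFT matches, which is not reproduced here), and it uses the mean Euclidean distance between warped probe points and their gallery matches as a stand-in for the "simple 2-D distance algorithm", whose exact form the abstract does not give. The function names and the example matrix are hypothetical.

```python
from math import hypot


def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to 2-D.
    return (xh / w, yh / w)


def mean_registration_distance(H, probe_points, gallery_points):
    """Mean Euclidean distance between warped probe points and gallery matches.

    Assumption: this is one plausible 2-D distance, not the paper's own.
    """
    if len(probe_points) != len(gallery_points) or not probe_points:
        raise ValueError("need equal-length, non-empty point lists")
    total = 0.0
    for p, g in zip(probe_points, gallery_points):
        wx, wy = apply_homography(H, p)
        total += hypot(wx - g[0], wy - g[1])
    return total / len(probe_points)


# Hypothetical homography: scale by 2, then translate by (1, -1).
H = [[2.0, 0.0, 1.0],
     [0.0, 2.0, -1.0],
     [0.0, 0.0, 1.0]]
probe = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
gallery = [(1.0, -1.0), (3.0, 1.0), (5.0, 0.0)]
score = mean_registration_distance(H, probe, gallery)
```

A lower score means the probe registers more closely to that gallery entry, so gallery entries could be ranked by ascending score.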
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and AR face recognition database with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database, and facial identification performance on the AR database, are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with the bimodal systems based on multicondition model training or missing-feature decoding alone.
Abstract:
CCTV systems are broadly deployed in the present world. Despite this, the impact on anti-social and criminal behaviour has been minimal. Subject reacquisition is a fundamental task to ensure in-time reaction for intelligent surveillance. However, traditional reacquisition based on face recognition is not scalable; hence, in this paper, we use reasoning techniques that exploit time-of-flight information between zones of interest, such as airport security corridors, to reduce the computational effort. Also, to improve the accuracy of reacquisition, we introduce the idea of revision as a method of post-processing. We demonstrate the significance and usefulness of our framework with an experiment that shows much less computational effort and better accuracy.
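One way to picture how time-of-flight information can cut the face-matching workload is the sketch below. It is not the authors' reasoning framework: it assumes a hypothetical table of minimum and maximum travel times between zones and keeps only earlier sightings whose elapsed time fits those bounds, so that face recognition would run on the surviving candidates alone. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    subject_id: str
    zone: str
    time_s: float  # time the subject was seen leaving the zone, in seconds


# Hypothetical (min, max) travel times in seconds between ordered zone pairs.
TRAVEL_BOUNDS = {("A", "B"): (20.0, 90.0)}


def plausible_candidates(earlier, zone, time_s, bounds=TRAVEL_BOUNDS):
    """Return earlier sightings whose elapsed time fits the bounds to `zone`."""
    kept = []
    for s in earlier:
        limits = bounds.get((s.zone, zone))
        if limits is None:
            continue  # no known route between the two zones
        elapsed = time_s - s.time_s
        if limits[0] <= elapsed <= limits[1]:
            kept.append(s)
    return kept


earlier = [
    Sighting("p1", "A", 0.0),   # 100 s ago: too slow for the A-to-B corridor
    Sighting("p2", "A", 40.0),  # 60 s ago: within bounds
    Sighting("p3", "A", 95.0),  # 5 s ago: too fast to have arrived
]
candidates = plausible_candidates(earlier, "B", 100.0)
candidate_ids = [s.subject_id for s in candidates]
```

In this example only one of the three earlier sightings survives the filter, so the more expensive face comparison would be attempted against a single candidate.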
Abstract:
The authors are concerned with the development of computer systems that are capable of using information from faces and voices to recognise people's emotions in real-life situations. The paper addresses the nature of the challenges that lie ahead, and provides an assessment of the progress that has been made in the areas of signal processing and analysis techniques (with regard to speech and face), and the psychological and linguistic analyses of emotion. Ongoing developmental work by the authors in each of these areas is described.
Abstract:
In this paper, a novel video-based multimodal biometric verification scheme using the subspace-based low-level feature fusion of face and speech is developed for specific speaker recognition for perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and face pose is estimated to weight the detected facelike regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance the accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at low level. The proposed approach is tested on a video database of ten human subjects, and the results show that the proposed scheme can attain better accuracy in comparison with the conventional multimodal fusion using latent semantic analysis as well as the single-modality verifications. The experiment in MATLAB shows the potential of the proposed scheme to attain real-time performance for perceptual HCI applications.
Abstract:
In this paper we demonstrate a simple and novel illumination model that can be used for illumination invariant facial recognition. This model requires no prior knowledge of the illumination conditions and can be used when there is only a single training image per person. The proposed illumination model separates the effects of illumination over a small area of the face into two components: an additive component modelling the mean illumination and a multiplicative component modelling the variance within the facial area. Illumination invariant facial recognition is performed in a piecewise manner, by splitting the face image into blocks, then normalizing the illumination within each block based on the new lighting model. The assumptions underlying this novel lighting model have been verified on the YaleB face database. We show that magnitude 2D Fourier features can be used as robust facial descriptors within the new lighting model. Using only a single training image per person, our new method achieves high (in most cases 100%) identification accuracy on the YaleB, extended YaleB and CMU-PIE face databases.
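The piecewise normalization described in this abstract can be sketched as follows. The sketch assumes the simplest reading of the model: within each block, lighting acts as a positive gain plus an offset, so subtracting the block mean and dividing by the block standard deviation cancels both. The abstract does not give the exact normalization, the block size, or the Fourier feature step, so the functions, the `eps` guard, and the example values are assumptions.

```python
from math import sqrt


def normalize_block(block, eps=1e-8):
    """Subtract the block mean (additive term) and divide by the block
    standard deviation (multiplicative term). `eps` avoids division by zero."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = sqrt(var)
    return [[(v - mean) / (std + eps) for v in row] for row in block]


def normalize_image(image, block_size):
    """Normalize each non-overlapping block_size x block_size block on its own."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r0 in range(0, height, block_size):
        for c0 in range(0, width, block_size):
            block = [row[c0:c0 + block_size] for row in image[r0:r0 + block_size]]
            normed = normalize_block(block)
            for dr, row in enumerate(normed):
                for dc, v in enumerate(row):
                    out[r0 + dr][c0 + dc] = v
    return out


# Two 2x2 blocks side by side. The right block is the left one under a
# hypothetical lighting change: pixel -> 3 * pixel + 10.
left = [[1.0, 2.0], [3.0, 4.0]]
image = [row + [3.0 * v + 10.0 for v in row] for row in left]
result = normalize_image(image, 2)
```

After normalization the two blocks agree up to the tiny `eps` term, which is the sense in which a per-block normalization of this kind is insensitive to a local gain and offset.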
Abstract:
The contradiction between acknowledgement of cultural differences and their accommodation in public has been a constant theme in studies of diverse societies. This review essay discusses five volumes that grapple with questions of Romani inclusion and the problems Roma face across Europe. The volumes under review point to problems faced by Romani communities and analyse the various legal, political and social challenges that the situation of the Roma poses to institutions of contemporary societies. The essay reviews the challenging nature of the status of Roma as we move away from a one-sided towards a more reciprocal engagement of the state with society in general, and with multiply excluded groups in particular. The essay finds that the role Roma play in these relationships is either over- or underestimated by the literature, largely as a result of limited opportunities to acknowledge and, in effect, accommodate Roma, who are rarely understood as actors in their own right.
Abstract:
This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.
Abstract:
The subject of identity continues to attract widespread interest and debate in the social sciences. The nature of who we are, our potential to be different, and our similarity with others, underpins many present-day social issues. This paper contributes to this debate by examining critically the work of Axel Honneth on optimal identity-formation. Although broadly supporting Honneth’s chief construct of inter-personal recognition, a gap in his thinking is highlighted and addressed through proffering a fourth dimension to his tripartite model. This additional dimension requires demonstrations of recognition that instil hope in the face of hardship and empower positive transformations in identity. The implications of this reworked model for social work are then considered in terms of a range of approaches that can be utilised to build flourishing identities characterised by self-esteem, self-confidence, self-respect and self-belief.