800 results for Face recognition from video
Abstract:
A new neural network architecture is introduced for the recognition of pattern classes after supervised and unsupervised learning. Applications include spatio-temporal image understanding and prediction and 3-D object recognition from a series of ambiguous 2-D views. The architecture, called ART-EMAP, achieves a synthesis of adaptive resonance theory (ART) and spatial and temporal evidence integration for dynamic predictive mapping (EMAP). ART-EMAP extends the capabilities of fuzzy ARTMAP in four incremental stages. Stage 1 introduces distributed pattern representation at a view category field. Stage 2 adds a decision criterion to the mapping between view and object categories, delaying identification of ambiguous objects when faced with a low confidence prediction. Stage 3 augments the system with a field where evidence accumulates in medium-term memory (MTM). Stage 4 adds an unsupervised learning process to fine-tune performance after the limited initial period of supervised network training. Each ART-EMAP stage is illustrated with a benchmark simulation example, using both noisy and noise-free data. A concluding set of simulations demonstrates ART-EMAP performance on a difficult 3-D object recognition problem.
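The interplay of the decision criterion (Stage 2) and evidence accumulation (Stage 3) can be illustrated with a toy sketch. This is not the ART-EMAP algorithm itself: the function name `accumulate_evidence`, the `threshold` value, and the per-view score dictionaries are all assumptions made for illustration. The sketch simply sums per-view scores and defers a decision until one object label holds a large enough share of the accumulated evidence.

```python
def accumulate_evidence(view_predictions, threshold=0.7):
    """Sum per-view scores until one label dominates the evidence.

    view_predictions: list of dicts mapping object label -> score in [0, 1]
                      (hypothetical stand-in for a distributed prediction).
    threshold: hypothetical decision criterion on the normalized evidence.
    Returns (label, views_used), or (None, views_used) if never confident.
    """
    evidence = {}
    for views_used, prediction in enumerate(view_predictions, start=1):
        # Accumulate this view's scores into the running evidence.
        for label, score in prediction.items():
            evidence[label] = evidence.get(label, 0.0) + score
        total = sum(evidence.values())
        best = max(evidence, key=evidence.get)
        # Commit only when the leading label is confident enough.
        if total > 0 and evidence[best] / total >= threshold:
            return best, views_used
    return None, len(view_predictions)


# Three ambiguous-to-clear 2-D views of the same (made-up) object.
views = [
    {"cube": 0.5, "pyramid": 0.5},
    {"cube": 0.8, "pyramid": 0.2},
    {"cube": 0.9, "pyramid": 0.1},
]
print(accumulate_evidence(views))
```

With these made-up scores the first two views are too ambiguous to pass the criterion, and the decision is only made once the third view has been integrated.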
Abstract:
For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user’s affect. However, most emotion recognition systems are based on turnwise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition. No statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary and regression outputs can be produced in real-time for every low-level input frame. We also investigate the benefits of including linguistic features on the signal frame level obtained by a keyword spotter.
Abstract:
The objective of this multicentre study was to undertake a systematic comparison of face-to-face consultations and teleconsultations performed using low-cost videoconferencing equipment. One hundred and twenty-six patients were enrolled by their general practitioners across three sites. Each patient underwent a teleconsultation with a distant dermatologist followed by a traditional face-to-face consultation with a dermatologist. The main outcome measures were diagnostic concordance rates, management plans and patient and doctor satisfaction. One hundred and fifty-five diagnoses were identified by the face-to-face consultations from the sample of 126 patients. Identical diagnoses were recorded from both types of consultation in 59% of cases. Teledermatology consultations missed a secondary diagnosis in 6% of cases and were unable to make a useful diagnosis in 11% of cases. Wrong diagnoses were made by the teledermatologist in 4% of cases. Dermatologists were able to make a definitive diagnosis by face-to-face consultations in significantly more cases than by teleconsultations (P = 0.001). Where both types of consultation resulted in a single diagnosis there was a high level of agreement (kappa = 0.96, lower 95% confidence limit 0.91-1.00). Overall follow-up rates from both types of consultation were almost identical. Fifty per cent of patients seen could have been managed using a single videoconferenced teleconsultation without any requirement for further specialist intervention. Patients reported high levels of satisfaction with the teleconsultations. General practitioners reported that 75% of the teleconsultations were of educational benefit. This study illustrates the potential of telemedicine to diagnose and manage dermatology cases referred from primary care. 
Once the problem of image quality has been addressed, further studies will be required to investigate the cost-effectiveness of a teledermatology service and the potential consequences for the provision of dermatological services in the U.K.
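The agreement figure quoted above is a kappa statistic, which corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa for two raters follows. The diagnosis labels are invented for illustration and are not data from the study, and the abstract does not say how its confidence limit was computed, so none is attempted here.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases where both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses for four patients (not from the study).
face_to_face = ["eczema", "eczema", "psoriasis", "psoriasis"]
teleconsult = ["eczema", "eczema", "psoriasis", "eczema"]
print(cohens_kappa(face_to_face, teleconsult))  # prints 0.5
```

In this toy case the raters agree on 3 of 4 patients (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5.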
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and AR face recognition database with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and its facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with the bimodal systems based on multicondition model training or missing-feature decoding alone.
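The abstract does not describe how the cosine similarity is modified, so the sketch below uses the standard cosine similarity only, as a baseline illustration of feature-level fusion. The per-modality normalization, the helper names (`fuse`, `identify`) and the tiny feature vectors are assumptions for illustration, not the paper's representation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(speech_features, face_features):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return l2_normalize(speech_features) + l2_normalize(face_features)


def cosine_similarity(u, v):
    """Standard (unmodified) cosine similarity between two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify(probe, gallery):
    """Return the enrolled identity whose fused vector best matches the probe."""
    return max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))


# Hypothetical enrolled identities and a noisy probe.
gallery = {
    "alice": fuse([1.0, 0.0], [1.0, 0.0, 0.0]),
    "bob": fuse([0.0, 1.0], [0.0, 1.0, 0.0]),
}
probe = fuse([0.9, 0.1], [0.8, 0.2, 0.0])
print(identify(probe, gallery))
```

Normalizing each modality before concatenation is one simple way to cope with differing feature sizes; the paper's own representation may handle this differently.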
Abstract:
Empirical studies concerning face recognition suggest that faces may be stored in memory by a few canonical representations. Models of visual perception are based on image representations in cortical area V1 and beyond, which contain many cell layers for feature extraction. Simple, complex and end-stopped cells tuned to different spatial frequencies (scales) and/or orientations provide input for line, edge and keypoint detection. This yields a rich, multi-scale object representation that can be stored in memory in order to identify objects. The multi-scale, keypoint-based saliency maps for Focus-of-Attention can be explored to obtain face detection and normalization, after which face recognition can be achieved using the line/edge representation. In this paper, we focus only on face normalization, showing that multi-scale keypoints can be used to construct canonical representations of faces in memory.
Abstract:
In this paper we present an improved model for line and edge detection in cortical area V1. This model is based on responses of simple and complex cells, and it is multi-scale with no free parameters. We illustrate the use of the multi-scale line/edge representation in different processes: visual reconstruction or brightness perception, automatic scale selection and object segregation. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only and final categorization on coarse plus fine scales. We also present a multi-scale object and face recognition model. Processing schemes are discussed in the framework of a complete cortical architecture. The fact that brightness perception and object recognition may be based on the same symbolic image representation is an indication that the entire (visual) cortex is involved in consciousness.
Abstract:
Empirical studies concerning face recognition suggest that faces may be stored in memory by a few canonical representations. Models of visual perception are based on image representations in cortical area V1 and beyond, which contain many cell layers for feature extraction. Simple, complex and end-stopped cells provide input for line, edge and keypoint detection. Detected events provide a rich, multi-scale object representation, and this representation can be stored in memory in order to identify objects. In this paper, the above context is applied to face recognition. The multi-scale line/edge representation is explored in conjunction with keypoint-based saliency maps for Focus-of-Attention. Recognition rates of up to 96% were achieved by combining frontal and 3/4 views, and recognition was quite robust against partial occlusions.
Abstract:
Investigations into the evolutionary origins of human cognition have shown that individuals’ memory for others is influenced by the latter’s behaviour in social contracts. Such research is primarily based on hypothetical or more abstract forms of social contracts, whereas an application of this knowledge to everyday health behaviours can be of great value. To address this, the current study investigated whether participants who were asked to imagine themselves in a hypothetical hazardous health scenario showed differential response sensitivity (d’) and latency (RT) to faces of hospital staff tagged with contrasting hand hygiene before touching patients: clean hands, dirty hands, or unknown hand-washing behaviour (control). The test used a two-alternative forced-choice (2AFC: “old/new”) face recognition paradigm. The findings showed that d’ to dirty and clean hands was similar, but higher than for controls. Moreover, d’ was not affected by the occupation of hospital staff (nurses vs porters). The absence of memory gains towards clean or dirty hands points to the need for new strategies to remind patients to observe (and remember) the hand hygiene of others when exposed to hazardous health environments.
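For readers unfamiliar with the sensitivity measure d’, a minimal sketch of how it is commonly computed from old/new responses is given here. The formula d’ = z(hit rate) − z(false-alarm rate) is the standard yes/no form; whether the study used exactly this form, and the log-linear correction applied below, are assumptions. The counts are invented.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) is applied so that
    rates of exactly 0 or 1 do not produce infinite z-scores. This
    correction is an assumption, not something stated in the abstract.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 20 old faces and 20 new faces per condition.
print(d_prime(hits=18, misses=2, false_alarms=4, correct_rejections=16))
print(d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10))
```

A participant who responds "old" equally often to old and new faces has d’ = 0, while larger positive values indicate better discrimination between studied and unstudied faces.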
Abstract:
Thesis (Master's)--University of Washington, 2014
Abstract:
We live increasingly in an era of growing technological advances in many fields. What was considered practically impossible a few years ago has, in many cases, already become reality. We all use technologies such as the Internet, smartphones and GPS devices as a matter of course. This proliferation of technology has allowed both ordinary citizens and organizations to use it in ever more creative and simple ways. Moreover, new businesses and startups emerge every day, which shows the dynamism that this growth has brought to the industry. This dissertation focuses on two rapidly growing areas, Facial Recognition and Business Intelligence (BI), and on their combination, with the aim of creating a new module for an existing product. Since these are two distinct areas, each is studied first. Business Intelligence is aimed at organizations and deals with collecting information about a given company's business, followed by its analysis. The main purpose of Business Intelligence is to support the decision-making process of analysts and managers in these organizations. Facial Recognition, in turn, is more present in society. Having first appeared in science fiction, the technology has evolved over the years and is implemented by more and more companies, even reaching the end consumer, for example in smartphones. Its applications are therefore quite diverse, from security solutions to simple entertainment. Both areas are studied on the basis of a survey of publications by authors in each field. 
From usage scenarios to more specific aspects of each area, this knowledge is conveyed to the reader to allow a better understanding of the development of the solution. Once these two areas have been studied, the problem is placed in the context of the company's field of activity and the possible approaches are discussed. The whole analysis and design process is also described, as well as the more technical side of developing the implemented solution. Finally, some examples of results obtained after implementing the solution are presented.
Abstract:
The traditional role of justice is to arbitrate where the good will of people is not enough, if even present, to settle a dispute between the concerned parties. It is a procedural approach that assumes a fractured relationship between those involved. Recognition, at first glance, would not seem to mirror these aspects of justice. Yet recognition is very much a subject of justice these days. The aim of this paper is to question the applicability of justice to the practice of recognition. The methodological orientation of this paper is a Kantian-style critique of the institution of justice, highlighting the limits of its reach and the dangers of overextension. The critique unfolds in the following three steps: 1) There is an immediate appeal to justice as a practice of recognition through its commitment to universality. This allure is shown to be deceptive in providing no prescription for the actual practice of this universality. 2) The interventionist character of justice is designed to address divided relationships. If recognition is only given expression through this channel, then we can only assume division as our starting ground. 3) The outcome of justice in respect to recognition is identification. This identification is left vulnerable to misrecognition itself, creating a cycle of injustice that demands recognition anew. It seems to be well accepted that recognition is essential to justice, but less clear how to do justice to recognition. This paper is an effort in clarification.
Abstract:
The contemporary zeitgeist on face recognition suggests that the recognition process relies essentially on processing the distances between the internal features of the face. It is surprising, however, that this hypothesis has never been evaluated directly in the literature. To do so, 515 photographs of faces were annotated in order to assess the information conveyed by such distances. The results suggest that previous studies that manipulated these distances presented 4 times more information than the inter-feature distances found in the real world. Moreover, human observers appear to have difficulty using inter-feature distances from real faces to recognize their fellow humans at several viewing distances (maximum percentage correct of 65%). Furthermore, observer performance is almost perfectly restored when the inter-feature distance information is unusable but observers can use the other sources of information in real faces. We conclude that facial cues other than inter-feature distances, such as the shape of the features and the properties of the skin, convey the information used by the visual system to recognize faces.
Abstract:
The aim of the experiment described in this thesis is to unconsciously instill in subjects a visual strategy that leads them to use only a specific part of the visual information available in the human face to recognize its gender. Normally, the gender of a face is recognized from certain regions, such as the mouth and the eyes (Dupuis-Roy, Fortin, Fiset and Gosselin, 2009). The task performed by the subjects allowed implicit perceptual learning through operant conditioning. Subjects were told that they would be awarded a number of points according to their performance on the task. At the end of training, subjects reinforced for using the left eye used the left eye more than the right eye, and those reinforced for using the right eye used the right eye more. We discuss potential clinical applications of this conditioning procedure.
Abstract:
The objective of this research is to create an online platform for examining individual differences in visual information processing strategies across different face categorization tasks. The purpose of such a platform is to collect data from participants who are geographically dispersed and whose face recognition abilities vary. Indeed, many studies have shown that face recognition abilities vary widely, ranging from developmental prosopagnosia (Susilo & Duchaine, 2013), a face recognition disorder in the absence of brain lesion, to super-recognizers, individuals whose face recognition abilities are above average (Russell, Duchaine & Nakayama, 2009). Between these two extremes, face recognition abilities in the normal population vary. To demonstrate the feasibility of creating such a platform for individuals of widely varying abilities, we adapted a celebrity face identity recognition task using the Bubbles method (Gosselin & Schyns, 2001) and recruited 14 control subjects and one subject with developmental prosopagnosia. We were able to highlight the importance of the eyes and the mouth in face identification among "normal" subjects. The best participants, by contrast, seem to rely mainly on the left side of the face (the left eye and the left side of the mouth).
Abstract:
Artifacts made by humans, such as items of furniture and houses, exhibit an enormous amount of variability in shape. In this paper, we concentrate on models of the shapes of objects that are made up of fixed collections of sub-parts whose dimensions and spatial arrangement exhibit variation. Our goals are: to learn these models from data and to use them for recognition. Our emphasis is on learning and recognition from three-dimensional data, to test the basic shape-modeling methodology. In this paper we also demonstrate how to use models learned in three dimensions for recognition of two-dimensional sketches of objects.