93 results for Computer vision teaching
Abstract:
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification, with face-based classification being the most popular. In some real-world scenarios, the face may be partially occluded; in these circumstances, classification based on individual parts of the face, known as local features, must be adopted. We investigate gender classification using lip movements. We show for the first time that important gender-specific information can be obtained from the way in which a person moves their lips during speech. Furthermore, our study indicates that lip dynamics during speech provide greater gender-discriminative information than lip appearance alone. We also show that lip dynamics and appearance contain complementary gender information, such that a model which captures both traits gives the highest overall classification result. We use Discrete Cosine Transform based features and Gaussian Mixture Modelling to model lip appearance and dynamics, and employ the XM2VTS database for our experiments. Our experiments show that a model which captures lip dynamics along with appearance can improve gender classification rates by 16-21% compared to models of lip appearance alone.
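For readers who want to see the shape of such a pipeline, the sketch below pairs 2-D DCT features (appearance) with their frame-to-frame deltas (dynamics) and scores them under per-class Gaussian mixtures. Frame layout, coefficient counts and mixture sizes are illustrative assumptions, not the authors' configuration.

```python
# Sketch: DCT features + GMMs for lip-based gender classification.
# Assumptions: each sequence is a list of grayscale mouth-region images
# (2-D numpy arrays); all sizes below are illustrative, not the paper's.
import numpy as np
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def dct_features(frame, n_coeffs=30):
    """2-D DCT of a mouth ROI, keeping low-order coefficients (appearance)."""
    c = dct(dct(frame.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    return c[:6, :5].ravel()[:n_coeffs]          # low spatial frequencies

def sequence_features(frames):
    """Static (appearance) plus delta (dynamics) features per frame."""
    static = np.array([dct_features(f) for f in frames])
    deltas = np.gradient(static, axis=0)         # frame-to-frame dynamics
    return np.hstack([static, deltas])

def train(sequences_by_gender, n_components=8):
    """One GMM per gender, trained on pooled per-frame feature vectors."""
    models = {}
    for label, seqs in sequences_by_gender.items():   # e.g. {'m': [...], 'f': [...]}
        X = np.vstack([sequence_features(s) for s in seqs])
        models[label] = GaussianMixture(n_components).fit(X)
    return models

def classify(models, frames):
    """Pick the gender model with the highest average log-likelihood."""
    X = sequence_features(frames)
    return max(models, key=lambda k: models[k].score(X))
```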
Abstract:
This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine learning algorithm known as bag of words gives a first estimate of action classification from video sequences by performing an image feature analysis. Those results are afterwards passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimate yielded by the machine learning algorithm. This second stage draws on the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where poor recognition rates from the machine learning techniques are significantly improved by the second stage, in which common-sense knowledge and reasoning capabilities are leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline.
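A minimal sketch of the first (bag-of-words) stage follows, assuming local spatio-temporal descriptors have already been extracted per video; the vocabulary size and classifier choice are illustrative, and the common-sense second stage is beyond a short example.

```python
# Sketch: bag-of-words first stage for action recognition.
# Assumptions: `train_descs`/`test descriptors` are lists of (n_i, d)
# arrays of local descriptors per video; `labels` are action classes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_vocabulary(train_descs, k=200):
    """Cluster pooled local descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_descs))

def bow_histogram(vocab, descs):
    """Quantise a video's descriptors into a normalised word histogram."""
    words = vocab.predict(descs)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

def train_bow_classifier(train_descs, labels, k=200):
    vocab = build_vocabulary(train_descs, k)
    X = np.array([bow_histogram(vocab, d) for d in train_descs])
    clf = SVC(kernel='rbf', probability=True).fit(X, labels)
    return vocab, clf

# The per-class probabilities from clf.predict_proba would be the
# "first estimate" handed to the common-sense reasoning stage.
```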
Abstract:
Laughter is a frequently occurring social signal and an important part of human non-verbal communication. However, it is often overlooked as a serious topic of scientific study. While the lack of research in this area is mostly due to laughter's non-serious nature, it is also a particularly difficult social signal to produce on demand in a convincing manner, making it a difficult topic for study in laboratory settings. In this paper we provide techniques and guidance for inducing both hilarious laughter and conversational laughter. These techniques were devised with the goal of capturing motion information related to laughter while the person laughing was either standing or seated. Comments on the value of each technique, and general guidance on the importance of atmosphere, environment and social setting, are provided.
Abstract:
In this paper we present, for the first time, results showing the effect of speaker head pose angle on automatic lip-reading performance over a wide range of closely spaced angles. We analyse the effect head pose has upon the features themselves and show that, by selecting coefficients with minimum variance w.r.t. pose angle, recognition performance can be improved when train and test pose angles differ. Experiments are conducted using the initial phase of a unique multi-view audio-visual database designed specifically for research and development of pose-invariant lip-reading systems. We first show that it is the higher-order horizontal spatial frequency components that become most detrimental as the pose deviates. Secondly, we assess the performance of different feature selection masks across a range of pose angles, including a new mask based on Minimum Cross-Pose Variance coefficients. We report a relative improvement of 50% in Word Error Rate when using our selection mask over a common energy-based selection during profile-view lip-reading.
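The coefficient-selection idea can be sketched directly: given DCT features collected per pose angle, keep the coefficients whose statistics vary least across poses. The data layout and mask size below are assumptions, and the energy-based baseline is shown only for contrast.

```python
# Sketch: Minimum Cross-Pose Variance coefficient selection.
# Assumption: `feats_by_pose[p]` is an (n_frames, n_coeffs) array of DCT
# feature vectors extracted at pose angle p; mask size is illustrative.
import numpy as np

def min_cross_pose_variance_mask(feats_by_pose, n_keep=40):
    """Keep the coefficients whose mean varies least across pose angles."""
    # One mean feature vector per pose angle.
    pose_means = np.array([f.mean(axis=0) for f in feats_by_pose.values()])
    # Variance of each coefficient's mean across poses.
    cross_pose_var = pose_means.var(axis=0)
    # Indices of the n_keep most pose-stable coefficients.
    return np.argsort(cross_pose_var)[:n_keep]

def energy_mask(feats_by_pose, n_keep=40):
    """Common baseline: keep the highest-energy coefficients instead."""
    energy = (np.vstack(list(feats_by_pose.values())) ** 2).mean(axis=0)
    return np.argsort(energy)[::-1][:n_keep]
```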
Abstract:
In recent years, gradient vector flow (GVF) based algorithms have been successfully used to segment a variety of 2-D and 3-D imagery. However, due to the compromise between internal and external energy forces within the resulting partial differential equations, these methods may lead to biased segmentation results. In this paper, we propose MSGVF, a mean-shift-based GVF segmentation algorithm that can successfully locate the correct borders. MSGVF is developed so that, when the contour reaches equilibrium, the various forces resulting from the different energy terms are balanced. In addition, a smoothness constraint on image pixels is retained so that over- or under-segmentation can be reduced. Experimental results on publicly accessible datasets of dermoscopic and optic disc images demonstrate that the proposed method effectively detects the borders of the objects of interest.
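MSGVF itself couples mean shift with GVF; the sketch below shows only the classical GVF diffusion it builds on, since the mean-shift coupling is beyond a short example. Parameters and the explicit update scheme are illustrative.

```python
# Sketch: classical gradient vector flow (GVF) field computation, the
# external force that mean-shift-based variants extend. `edge_map` is a
# 2-D float array (e.g. gradient magnitude of the image).
import numpy as np
from scipy.ndimage import laplace

def gvf(edge_map, mu=0.2, iters=100, dt=0.5):
    fy, fx = np.gradient(edge_map.astype(float))
    mag2 = fx**2 + fy**2                 # squared edge strength
    u, v = fx.copy(), fy.copy()          # initialise with the edge gradient
    for _ in range(iters):
        # Diffuse the field where edges are weak; anchor it where strong.
        u += dt * (mu * laplace(u) - mag2 * (u - fx))
        v += dt * (mu * laplace(v) - mag2 * (v - fy))
    return u, v                          # external force field for a snake
```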
Abstract:
Distinct neural populations carry signals from short-wave (S) cones. We used individual differences to test whether two types of pathways, those that receive excitatory input (S+) and those that receive inhibitory input (S-), contribute independently to psychophysical performance. We also conducted a genome-wide association study (GWAS) to look for genetic correlates of the individual differences. Our psychophysical test was based on the Cambridge Color Test, but detection thresholds were measured separately for S-cone spatial increments and decrements. Our participants were 1060 healthy adults aged 16-40. Test-retest reliabilities for thresholds were good (ρ=0.64 for S-cone increments, 0.67 for decrements and 0.73 for the average of the two). "Regression scores," isolating variability unique to incremental or decremental sensitivity, were also reliable (ρ=0.53 for increments and ρ=0.51 for decrements). The correlation between incremental and decremental thresholds was ρ=0.65. No genetic markers reached genome-wide significance (p < 10⁻⁷). We identified 18 "suggestive" loci (p < 10⁻⁵). The significant test-retest reliabilities show stable individual differences in S-cone sensitivity in a normal adult population. Though a portion of the variance in sensitivity is shared between incremental and decremental sensitivity, over 26% of the variance is stable across individuals, but unique to increments or decrements, suggesting distinct neural substrates. Some of the variability in sensitivity is likely to be genetic. We note that four of the suggestive associations found in the GWAS are with genes that are involved in glucose metabolism or have been associated with diabetes.
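The "regression scores" can be sketched as simple residualisation: regress one threshold set on the other and keep the residuals as the variance unique to that direction. This follows the standard recipe and is an assumption about the authors' exact procedure.

```python
# Sketch: regression scores isolating variance unique to S-cone
# increments or decrements. Assumption: `inc` and `dec` are matched
# per-participant threshold arrays; the recipe below is the usual
# residualisation, not necessarily the authors' exact analysis.
import numpy as np
from scipy import stats

def regression_score(target, covariate):
    """Residual of `target` after linear regression on `covariate`."""
    slope, intercept, *_ = stats.linregress(covariate, target)
    return target - (slope * covariate + intercept)

def shared_and_unique(inc, dec):
    inc_unique = regression_score(inc, dec)   # increment-specific variance
    dec_unique = regression_score(dec, inc)   # decrement-specific variance
    # The increment-decrement relation (reported above as rho = 0.65)
    # is a Spearman rank correlation between the two threshold sets.
    rho, _ = stats.spearmanr(inc, dec)
    return inc_unique, dec_unique, rho
```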
Abstract:
The OSCAR test, a clinical device that uses counterphase flicker photometry, is believed to be sensitive to the relative numbers of long-wavelength and middle-wavelength cones in the retina, as well as to individual variations in the spectral positions of the photopigments. As part of a population study of individual variations in perception, we obtained OSCAR settings from 1058 participants. We report the distribution characteristics for this cohort. A randomly selected subset of participants was tested twice at an interval of at least one week: the test-retest reliability (Spearman's rho) was 0.80. In a whole-genome association analysis we found a provisional association with a single nucleotide polymorphism (rs16844995). This marker is close to the gene RXRG, which encodes a nuclear receptor, retinoid X receptor γ. This nuclear receptor is already known to have a role in the differentiation of cones during the development of the eye, and we suggest that polymorphisms in or close to RXRG influence the relative probability with which long-wave and middle-wave opsin genes are expressed in human cones.
Abstract:
Human action recognition is an important problem in computer vision with many applications. However, learning an accurate and discriminative representation of videos from the extracted features remains a challenging problem. In this paper, we propose a novel method, low-rank representation based action recognition, to recognize human actions. Given a dictionary, low-rank representation aims to find the lowest-rank representation of all data, which can capture the global data structure. By its nature, low-rank representation is robust to noise. Experimental results demonstrate the effectiveness of the proposed approach on several publicly available datasets.
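In the noise-free special case where the data matrix serves as its own dictionary, the lowest-rank representation has a known closed form (Z* = VVᵀ from the skinny SVD of the data, per Liu et al.'s LRR analysis); the sketch below shows that case. The noisy formulation requires an iterative solver (e.g. inexact ALM) and is omitted.

```python
# Sketch: noise-free low-rank representation (LRR) with the data matrix
# as its own dictionary, i.e. min ||Z||_* s.t. X = XZ.
import numpy as np

def lrr_noise_free(X, tol=1e-10):
    """X: (d, n) data matrix; columns are per-video feature vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int((s > tol * s[0]).sum())      # numerical rank of X
    V = Vt[:r].T                         # right singular vectors, (n, r)
    return V @ V.T                       # (n, n) lowest-rank representation

# |Z| exposes the global structure: entries are large between samples
# lying in the same low-rank subspace, so Z can serve as an affinity
# matrix for spectral clustering or for classifying action classes.
```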
Abstract:
In this paper a 3D human pose tracking framework is presented. A new dimensionality reduction method, Hierarchical Temporal Laplacian Eigenmaps, is introduced to represent activities in hierarchies of low-dimensional spaces. Such a hierarchy provides increasing independence between limbs, allowing higher flexibility and adaptability that result in improved accuracy. Moreover, a novel deterministic optimisation method, Hierarchical Manifold Search, is applied to efficiently estimate the positions of the corresponding body parts. Finally, evaluation on public datasets such as HumanEva demonstrates that our approach achieves an average joint error of 62.5-65 mm for the walking activity and outperforms state-of-the-art methods in terms of accuracy and computational cost.
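A single Laplacian Eigenmaps level, of the kind such a hierarchy stacks, can be sketched with scikit-learn's spectral embedding; the temporal neighbourhoods and hierarchical structure named above are not reproduced here.

```python
# Sketch: one Laplacian Eigenmaps level for pose data. Assumption:
# `poses` is an (n_frames, n_dims) array of joint coordinates for one
# activity; SpectralEmbedding is scikit-learn's Laplacian Eigenmaps.
import numpy as np
from sklearn.manifold import SpectralEmbedding

def laplacian_eigenmaps(poses, n_components=2, n_neighbors=10):
    emb = SpectralEmbedding(n_components=n_components,
                            n_neighbors=n_neighbors,
                            affinity='nearest_neighbors')
    return emb.fit_transform(poses)   # (n_frames, n_components) manifold

# A hierarchy would embed the full body at the root, then re-embed
# per-limb subsets of the dimensions in child spaces, giving the
# increasing limb independence described above.
```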
Abstract:
Ear recognition, as a biometric, has several advantages. In particular, ears can be measured remotely and are relatively static in size and structure for each individual. Unfortunately, at present, good recognition rates require controlled conditions. For commercial use, these systems need to be much more robust. In particular, ears have to be recognized from different angles (poses), under different lighting conditions, and with different cameras. It must also be possible to distinguish ears from background clutter and identify them when partly occluded by hair, hats, or other objects. The purpose of this paper is to suggest how progress toward such robustness might be achieved through a technique that improves ear registration. The approach focuses on 2-D images, treating the ear as a planar surface that is registered to a gallery using a homography transform calculated from scale-invariant feature transform (SIFT) feature matches. The feature matches reduce the gallery size and enable a precise ranking using a simple 2-D distance algorithm. Analysis on a range of datasets demonstrates the technique to be robust to background clutter, viewing angles up to ±13 degrees, and up to 18% occlusion. In addition, recognition remains accurate with masked ear images as small as 20 x 35 pixels.
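The registration step maps cleanly onto standard OpenCV calls: detect SIFT keypoints, filter matches with Lowe's ratio test, and estimate a homography with RANSAC. The thresholds below are illustrative, not the paper's values.

```python
# Sketch: registering a probe ear image to a gallery image with a
# homography estimated from SIFT matches. Requires OpenCV >= 4.4
# (SIFT in the main module); inputs are grayscale numpy arrays.
import cv2
import numpy as np

def register_ear(probe_gray, gallery_gray, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(probe_gray, None)
    kp2, des2 = sift.detectAndCompute(gallery_gray, None)
    # Lowe's ratio test on 2-nearest-neighbour matches.
    good = []
    for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:                    # a homography needs >= 4 matches
        return None, 0
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    # The match/inlier count can pre-rank the gallery before the final
    # 2-D distance comparison on the registered image.
    return H, int(mask.sum()) if mask is not None else 0
```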
Abstract:
Recent work suggests that the human ear varies significantly between different subjects and can be used for identification. In principle, therefore, using ears in addition to the face within a recognition system could improve accuracy and robustness, particularly for non-frontal views. The paper describes work that investigates this hypothesis using an approach based on the construction of a 3D morphable model of the head and ear. One issue with creating a model that includes the ear is that existing training datasets contain noise and partial occlusion. Rather than exclude these regions manually, a classifier has been developed which automates this process. When combined with a robust registration algorithm the resulting system enables full head morphable models to be constructed efficiently using less constrained datasets. The algorithm has been evaluated using registration consistency, model coverage and minimalism metrics, which together demonstrate the accuracy of the approach. To make it easier to build on this work, the source code has been made available online.
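Once registration has put all training scans into correspondence, the statistical core of a morphable model is a PCA of the stacked vertex coordinates. A minimal sketch, with an assumed data layout, follows.

```python
# Sketch: the PCA core of a morphable model. Assumption: `meshes` is an
# (n_scans, n_vertices*3) array of registered head scans, with vertex
# correspondence already established by the registration step above.
import numpy as np

def build_morphable_model(meshes, n_modes=50):
    mean = meshes.mean(axis=0)
    # Principal modes of shape variation via SVD of the centred data.
    U, s, Vt = np.linalg.svd(meshes - mean, full_matrices=False)
    modes = Vt[:n_modes]                      # (n_modes, n_vertices*3)
    stddev = s[:n_modes] / np.sqrt(len(meshes) - 1)
    return mean, modes, stddev

def synthesise(mean, modes, stddev, coeffs):
    """New head shape from low-dimensional coefficients (in std-dev units)."""
    return mean + (coeffs * stddev) @ modes
```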