911 resultados para Opencv, Zbar, Computer Vision
Resumo:
In this paper we propose a statistical model for detection and tracking of human silhouette and the corresponding 3D skeletal structure in gait sequences. We follow a point distribution model (PDM) approach using a Principal Component Analysis (PCA). The problem of non-lineal PCA is partially resolved by applying a different PDM depending of pose estimation; frontal, lateral and diagonal, estimated by Fisher's linear discriminant. Additionally, the fitting is carried out by selecting the closest allowable shape from the training set by means of a nearest neighbor classifier. To improve the performance of the model we develop a human gait analysis to take into account temporal dynamic to track the human body. The incorporation of temporal constraints on the model increase reliability and robustness.
Resumo:
In this paper, we consider the problem of tracking similar objects. We show how a mean field approach can be used to deal with interacting targets and we compare it with Markov Chain Monte Carlo (MCMC). Two mean field implementations are presented. The first one is more general and uses particle filtering. We discuss some simplifications of the base algorithm that reduce the computation time. The second one is based on suitable Gaussian approximations of probability densities that lead to a set of self-consistent equations for the means and covariances. These equations give the Kalman solution if there is no interaction. Experiments have been performed on two kinds of sequences. The first kind is composed of a single long sequence of twenty roaming ants and was previously analysed using MCMC. In this case, our mean field algorithms obtain substantially better results. The second kind corresponds to selected sequences of a football match in which the interaction avoids tracker coalescence in situations where independent trackers fail.
Resumo:
In this paper, we show how interacting and occluding targets can be tackled successfully within a Gaussian approximation. For that purpose, we develop a general expansion of the mean and covariance of the posterior and we consider a first order approximation of it. The proposed method differs from EKF in that neither a non-linear dynamical model nor a non-linear measurement vector to state relation have to be defined, so it works with any kind of interaction potential and likelihood. The approach has been tested on three sequences (10400, 2500, and 400 frames each one). The results show that our approach helps to reduce the number of failures without increasing too much the computation time with respect to methods that do not take into account target interactions.
Resumo:
Support vector machines (SVMs), though accurate, are not preferred in applications requiring high classification speed or when deployed in systems of limited computational resources, due to the large number of support vectors involved in the model. To overcome this problem we have devised a primal SVM method with the following properties: (1) it solves for the SVM representation without the need to invoke the representer theorem, (2) forward and backward selections are combined to approach the final globally optimal solution, and (3) a criterion is introduced for identification of support vectors leading to a much reduced support vector set. In addition to introducing this method the paper analyzes the complexity of the algorithm and presents test results on three public benchmark problems and a human activity recognition application. These applications demonstrate the effectiveness and efficiency of the proposed algorithm.
--------------------------------------------------------------------------------
Resumo:
The development of an automated system for the quality assessment of aerodrome ground lighting (AGL), in accordance with associated standards and recommendations, is presented. The system is composed of an image sensor, placed inside the cockpit of an aircraft to record images of the AGL during a normal descent to an aerodrome. A model-based methodology is used to ascertain the optimum match between a template of the AGL and the actual image data in order to calculate the position and orientation of the camera at the instant the image was acquired. The camera position and orientation data are used along with the pixel grey level for each imaged luminaire, to estimate a value for the luminous intensity of a given luminaire. This can then be compared with the expected brightness for that luminaire to ensure it is operating to the required standards. As such, a metric for the quality of the AGL pattern is determined. Experiments on real image data is presented to demonstrate the application and effectiveness of the system.
Resumo:
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded. In these circumstances a classification based on individual parts of the face known as local features must be adopted. We investigate gender classification using lip movements. We show for the first time that important gender specific information can be obtained from the way in which a person moves their lips during speech. Furthermore our study indicates that the lip dynamics during speech provide greater gender discriminative information than simply lip appearance. We also show that the lip dynamics and appearance contain complementary gender information such that a model which captures both traits gives the highest overall classification result. We use Discrete Cosine Transform based features and Gaussian Mixture Modelling to model lip appearance and dynamics and employ the XM2VTS database for our experiments. Our experiments show that a model which captures lip dynamics along with appearance can improve gender classification rates by between 16-21% compared to models of only lip appearance.
Resumo:
This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine learning based algorithm - known as bag of words - gives a first estimate of action classification from video sequences, by performing an image feature analysis. Those results are afterward passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimation yielded by the machine learning algorithm. This second stage resorts to the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where poor recognition rates by the machine learning techniques are significantly improved by the second stage in which common-sense knowledge and reasoning capabilities have been leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline. © 2012 Elsevier B.V. All rights reserved.
Resumo:
Laughter is a frequently occurring social signal and an important part of human non-verbal communication. However it is often overlooked as a serious topic of scientific study. While the lack of research in this area is mostly due to laughter’s non-serious nature, it is also a particularly difficult social signal to produce on demand in a convincing manner; thus making it a difficult topic for study in laboratory settings. In this paper we provide some techniques and guidance for inducing both hilarious laughter and conversational laughter. These techniques were devised with the goal of capturing mo- tion information related to laughter while the person laughing was either standing or seated. Comments on the value of each of the techniques and general guidance as to the importance of atmosphere, environment and social setting are provided.
Resumo:
For the first time in this paper we present results showing the effect of speaker head pose angle on automatic lip-reading performance over a wide range of closely spaced angles. We analyse the effect head pose has upon the features themselves and show that by selecting coefficients with minimum variance w.r.t. pose angle, recognition performance can be improved when train-test pose angles differ. Experiments are conducted using the initial phase of a unique multi view Audio-Visual database designed specifically for research and development of pose-invariant lip-reading systems. We firstly show that it is the higher order horizontal spatial frequency components that become most detrimental as the pose deviates. Secondly we assess the performance of different feature selection masks across a range of pose angles including a new mask based on Minimum Cross-Pose Variance coefficients. We report a relative improvement of 50% in Word Error Rate when using our selection mask over a common energy based selection during profile view lip-reading.