23 resultados para 3D-object recognition
Resumo:
A scale invariant feature transform (SIFT) based mean shift algorithm is presented for object tracking in real scenarios. SIFT features are used to correspond the region of interests across frames. Meanwhile, mean shift is applied to conduct similarity search via color histograms. The probability distributions from these two measurements are evaluated in an expectation–maximization scheme so as to achieve maximum likelihood estimation of similar regions. This mutual support mechanism can lead to consistent tracking performance if one of the two measurements becomes unstable. Experimental work demonstrates that the proposed mean shift/SIFT strategy improves the tracking performance of the classical mean shift and SIFT tracking algorithms in complicated real scenarios.
Resumo:
For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user’s affect. However most emotion recognition systems are based on turnwise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition. No statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary and regression outputs can be produced in real-time for every low-level input frame. We also investigate the benefits of including linguistic features on the signal frame level obtained by a keyword spotter.
Resumo:
This paper addresses the pose recovery problem of a particular articulated object: the human body. In this model-based approach, the 2D-shape is associated to the corresponding stick figure allowing the joint segmentation and pose recovery of the subject observed in the scene. The main disadvantage of 2D-models is their restriction to the viewpoint. To cope with this limitation, local spatio-temporal 2D-models corresponding to many views of the same sequences are trained, concatenated and sorted in a global framework. Temporal and spatial constraints are then considered to build the probabilistic transition matrix (PTM) that gives a frame to frame estimation of the most probable local models to use during the fitting procedure, thus limiting the feature space. This approach takes advantage of 3D information avoiding the use of a complex 3D human model. The experiments carried out on both indoor and outdoor sequences have demonstrated the ability of this approach to adequately segment pedestrians and estimate their poses independently of the direction of motion during the sequence. (c) 2008 Elsevier Ltd. All rights reserved.
Resumo:
In this paper we propose a statistical model for detection and tracking of human silhouette and the corresponding 3D skeletal structure in gait sequences. We follow a point distribution model (PDM) approach using a Principal Component Analysis (PCA). The problem of non-lineal PCA is partially resolved by applying a different PDM depending of pose estimation; frontal, lateral and diagonal, estimated by Fisher's linear discriminant. Additionally, the fitting is carried out by selecting the closest allowable shape from the training set by means of a nearest neighbor classifier. To improve the performance of the model we develop a human gait analysis to take into account temporal dynamic to track the human body. The incorporation of temporal constraints on the model increase reliability and robustness.
Resumo:
Ear recognition, as a biometric, has several advantages. In particular, ears can be measured remotely and are also relatively static in size and structure for each individual. Unfortunately, at present, good recognition rates require controlled conditions. For commercial use, these systems need to be much more robust. In particular, ears have to be recognized from different angles ( poses), under different lighting conditions, and with different cameras. It must also be possible to distinguish ears from background clutter and identify them when partly occluded by hair, hats, or other objects. The purpose of this paper is to suggest how progress toward such robustness might be achieved through a technique that improves ear registration. The approach focuses on 2-D images, treating the ear as a planar surface that is registered to a gallery using a homography transform calculated from scale-invariant feature-transform feature matches. The feature matches reduce the gallery size and enable a precise ranking using a simple 2-D distance algorithm. Analysis on a range of data sets demonstrates the technique to be robust to background clutter, viewing angles up to +/- 13 degrees, and up to 18% occlusion. In addition, recognition remains accurate with masked ear images as small as 20 x 35 pixels.
Resumo:
Significant recent progress has shown ear recognition to be a viable biometric. Good recognition rates have been demonstrated under controlled conditions, using manual registration or with specialised equipment. This paper describes a new technique which improves the robustness of ear registration and recognition, addressing issues of pose variation, background clutter and occlusion. By treating the ear as a planar surface and creating a homography transform using SIFT feature matches, ears can be registered accurately. The feature matches reduce the gallery size and enable a precise ranking using a simple 2D distance algorithm. When applied to the XM2VTS database it gives results comparable to PCA with manual registration. Further analysis on more challenging datasets demonstrates the technique to be robust to background clutter, viewing angles up to +/- 13 degrees and with over 20% occlusion.
Resumo:
Despite pattern recognition methods for human behavioral analysis has flourished in the last decade, animal behavioral analysis has been almost neglected. Those few approaches are mostly focused on preserving livestock economic value while attention on the welfare of companion animals, like dogs, is now emerging as a social need. In this work, following the analogy with human behavior recognition, we propose a system for recognizing body parts of dogs kept in pens. We decide to adopt both 2D and 3D features in order to obtain a rich description of the dog model. Images are acquired using the Microsoft Kinect to capture the depth map images of the dog. Upon depth maps a Structural Support Vector Machine (SSVM) is employed to identify the body parts using both 3D features and 2D images. The proposal relies on a kernelized discriminative structural classificator specifically tailored for dogs independently from the size and breed. The classification is performed in an online fashion using the LaRank optimization technique to obtaining real time performances. Promising results have emerged during the experimental evaluation carried out at a dog shelter, managed by IZSAM, in Teramo, Italy.