19 resultados para 3D object recognition
em Indian Institute of Science - Bangalore - Índia
Resumo:
Some experimental results on the recognition of three-dimensional wire-frame objects are presented. In order to overcome the limitations of a recent model, which employs radial basis functions-based neural networks, we have proposed a hybrid learning system for object recognition, featuring: an optimization strategy (simulated annealing) in order to avoid local minima of an energy functional; and an appropriate choice of centers of the units. Further, in an attempt to achieve improved generalization ability, and to reduce the time for training, we invoke the principle of self-organization which utilises an unsupervised learning algorithm.
Resumo:
3D Face Recognition is an active area of research for past several years. For a 3D face recognition system one would like to have an accurate as well as low cost setup for constructing 3D face model. In this paper, we use Profilometry approach to obtain a 3D face model.This method gives a low cost solution to the problem of acquiring 3D data and the 3D face models generated by this method are sufficiently accurate. We also develop an algorithm that can use the 3D face model generated by the above method for the recognition purpose.
Resumo:
This paper presents an enhanced relational description for the prescription of the grasp requirement and evolution of the posture of a digital human hand towards satisfaction of this requirement. Precise relational description needs anatomical segmentation of the hand geometry into palmar, dorsal and lateral patches using the palm-plane and joint locations information, and operational segmentation of the object geometry into pull,push and lateral patches with due consideration to the effect of friction. Relational description identifies appropriate patches for a desired grasp condition. Satisfaction of this requirement occurs in two discrete stages,namely,contact establishment and post-contact force exertion for object capturing. Contact establishment occurs in four potentially overlapping phases,namely,re-orientation,transfer,pre- shaping,and closing-in. The novel h and re-orientation phase,enables the palm to face the object in a task sequence scenario, transfer takes the wrist to the ball park ; pre-shaping and close-in finally achieves the contact. In this paper, an anatomically pertinent closed-form formulation is presented for the closing-in phase for identification of the point of contact on the patches ,prescribed by the relational description. Since mere contact does not ensure grasp and slip phenomenon at the point of contact on application of force is a common occurrence, the effect of slip in presence of friction has been studied for 2D and 3D object grasping endeavours and a computational generation of the slip locus is presented.A general slip locus is found to be a non-linear curve even on planar faces.Two varieties of slip phenomena,namely,stabilizing and non-stabilizing slips, and their local characteristics have been identified.Study of the evolution of this slip characteristic over the slip locus exhibited diverse grasping behaviour possibilities. Thus, the relational description paradigm not only makes the requirement specification easy and meaningful but also enables high fidelity hand object interaction studies possible.
Resumo:
We are addressing a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single Hidden Markov Model. It is shown that such a solution is possible by aligning the K patterns using the proposed Multi Pattern Dynamic Time Warping algorithm followed by the Constrained Multi Pattern Viterbi Algorithm The new formulation is tested in the context of speaker independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB Signal to Noise Ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast a simple maximization of individual pattern likelihoods, provides only about 7 percent reduction in error rate.
Resumo:
Visual tracking has been a challenging problem in computer vision over the decades. The applications of Visual Tracking are far-reaching, ranging from surveillance and monitoring to smart rooms. Mean-shift (MS) tracker, which gained more attention recently, is known for tracking objects in a cluttered environment and its low computational complexity. The major problem encountered in histogram-based MS is its inability to track rapidly moving objects. In order to track fast moving objects, we propose a new robust mean-shift tracker that uses both spatial similarity measure and color histogram-based similarity measure. The inability of MS tracker to handle large displacements is circumvented by the spatial similarity-based tracking module, which lacks robustness to object's appearance change. The performance of the proposed tracker is better than the individual trackers for tracking fast-moving objects with better accuracy.
Resumo:
We describe a novel method for human activity segmentation and interpretation in surveillance applications based on Gabor filter-bank features. A complex human activity is modeled as a sequence of elementary human actions like walking, running, jogging, boxing, hand-waving etc. Since human silhouette can be modeled by a set of rectangles, the elementary human actions can be modeled as a sequence of a set of rectangles with different orientations and scales. The activity segmentation is based on Gabor filter-bank features and normalized spectral clustering. The feature trajectories of an action category are learnt from training example videos using dynamic time warping. The combined segmentation and the recognition processes are very efficient as both the algorithms share the same framework and Gabor features computed for the former can be used for the later. We have also proposed a simple shadow detection technique to extract good silhouette which is necessary for good accuracy of an action recognition technique.
Resumo:
Feature track matrix factorization based methods have been attractive solutions to the Structure-front-motion (Sfnl) problem. Group motion of the feature points is analyzed to get the 3D information. It is well known that the factorization formulations give rise to rank deficient system of equations. Even when enough constraints exist, the extracted models are sparse due the unavailability of pixel level tracks. Pixel level tracking of 3D surfaces is a difficult problem, particularly when the surface has very little texture as in a human face. Only sparsely located feature points can be tracked and tracking error arc inevitable along rotating lose texture surfaces. However, the 3D models of an object class lie in a subspace of the set of all possible 3D models. We propose a novel solution to the Structure-from-motion problem which utilizes the high-resolution 3D obtained from range scanner to compute a basis for this desired subspace. Adding subspace constraints during factorization also facilitates removal of tracking noise which causes distortions outside the subspace. We demonstrate the effectiveness of our formulation by extracting dense 3D structure of a human face and comparing it with a well known Structure-front-motion algorithm due to Brand.
Resumo:
In this paper, we describe a system for the automatic recognition of isolated handwritten Devanagari characters obtained by linearizing consonant conjuncts. Owing to the large number of characters and resulting demands on data acquisition, we use structural recognition techniques to reduce some characters to others. The residual characters are then classified using the subspace method. Finally the results of structural recognition and feature-based matching are mapped to give final output. The proposed system Ifs evaluated for the writer dependent scenario.
Resumo:
The increasing use of 3D modeling of Human Face in Face Recognition systems, User Interfaces, Graphics, Gaming and the like has made it an area of active study. Majority of the 3D sensors rely on color coded light projection for 3D estimation. Such systems fail to generate any response in regions covered by Facial Hair (like beard, mustache), and hence generate holes in the model which have to be filled manually later on. We propose the use of wavelet transform based analysis to extract the 3D model of Human Faces from a sinusoidal white light fringe projected image. Our method requires only a single image as input. The method is robust to texture variations on the face due to space-frequency localization property of the wavelet transform. It can generate models to pixel level refinement as the phase is estimated for each pixel by a continuous wavelet transform. In cases of sparse Facial Hair, the shape distortions due to hairs can be filtered out, yielding an estimate for the underlying face. We use a low-pass filtering approach to estimate the face texture from the same image. We demonstrate the method on several Human Faces both with and without Facial Hairs. Unseen views of the face are generated by texture mapping on different rotations of the obtained 3D structure. To the best of our knowledge, this is the first attempt to estimate 3D for Human Faces in presence of Facial hair structures like beard and mustache without generating holes in those areas.
Resumo:
Sinusoidal structured light projection (SSLP) technique, specifically-phase stepping method, is in widespread use to obtain accurate, dense 3-D data. But, if the object under investigation possesses surface discontinuities, phase unwrapping (an intermediate step in SSLP) stage mandatorily require several additional images, of the object with projected fringes (of different spatial frequencies), as input to generate a reliable 3D shape. On the other hand, Color-coded structured light projection (CSLP) technique is known to require a single image as in put, but generates sparse 3D data. Thus we propose the use of CSLP in conjunction with SSLP to obtain dense 3D data with minimum number of images as input. This approach is shown to be significantly faster and reliable than temporal phase unwrapping procedure that uses a complete exponential sequence. For example, if a measurement with the accuracy obtained by interrogating the object with 32 fringes in the projected pattern is carried out with both the methods, new strategy proposed requires only 5 frames as compared to 24 frames required by the later method.
Resumo:
This paper presents an algorithm for generating the Interior Medial Axis Transform (iMAT) of 3D objects with free-form boundaries. The algorithm proposed uses the exact representation of the part and generates an approximate rational spline description of the iMAT. The algorithm generates the iMAT by a tracing technique that marches along the object's boundary. The level of approximation is controlled by the choice of the step size in the tracing procedure. Criteria based on distance and local curvature of boundary entities are used to identify the junction points and the search for these junction points is done in an efficient way. The algorithm works for multiply-connected objects as well. Results of the implementation are provided. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
A method to reliably extract object profiles even with height discontinuities (that leads to 2n pi phase jumps) is proposed. This method uses Fourier transform profilometry to extract wrapped phase, and an additional image formed by illuminating the object of interest by a novel gray coded pattern for phase unwrapping. Simulation results suggest that the proposed approach not only retains the advantages of the original method, but also contributes significantly in the enhancement of its performance. Fundamental advantage of this method stems from the fact that both extraction of wrapped phase and unwrapping the same were done by gray scale images. Hence, unlike the methods that use colors, proposed method doesn't demand a color CCD camera and is ideal for profiling objects with multiple colors.
Resumo:
With the introduction of 2D flat-panel X-ray detectors, 3D image reconstruction using helical cone-beam tomography is fast replacing the conventional 2D reconstruction techniques. In 3D image reconstruction, the source orbit or scanning geometry should satisfy the data sufficiency or completeness condition for exact reconstruction. The helical scan geometry satisfies this condition and hence can give exact reconstruction. The theoretically exact helical cone-beam reconstruction algorithm proposed by Katsevich is a breakthrough and has attracted interest in the 3D reconstruction using helical cone-beam Computed Tomography.In many practical situations, the available projection data is incomplete. One such case is where the detector plane does not completely cover the full extent of the object being imaged in lateral direction resulting in truncated projections. This result in artifacts that mask small features near to the periphery of the ROI when reconstructed using the convolution back projection (CBP) method assuming that the projection data is complete. A number of techniques exist which deal with completion of missing data followed by the CBP reconstruction. In 2D, linear prediction (LP)extrapolation has been shown to be efficient for data completion, involving minimal assumptions on the nature of the data, producing smooth extensions of the missing projection data.In this paper, we propose to extend the LP approach for extrapolating helical cone beam truncated data. The projection on the multi row flat panel detectors has missing columns towards either ends in the lateral direction in truncated data situation. The available data from each detector row is modeled using a linear predictor. The available data is extrapolated and this completed projection data is backprojected using the Katsevich algorithm. Simulation results show the efficacy of the proposed method.
Resumo:
In this paper, we present a fast learning neural network classifier for human action recognition. The proposed classifier is a fully complex-valued neural network with a single hidden layer. The neurons in the hidden layer employ the fully complex-valued hyperbolic secant as an activation function. The parameters of the hidden layer are chosen randomly and the output weights are estimated analytically as a minimum norm least square solution to a set of linear equations. The fast leaning fully complex-valued neural classifier is used for recognizing human actions accurately. Optical flow-based features extracted from the video sequences are utilized to recognize 10 different human actions. The feature vectors are computationally simple first order statistics of the optical flow vectors, obtained from coarse to fine rectangular patches centered around the object. The results indicate the superior performance of the complex-valued neural classifier for action recognition. The superior performance of the complex neural network for action recognition stems from the fact that motion, by nature, consists of two components, one along each of the axes.
Resumo:
We consider the problem of extracting a signature representation of similar entities employing covariance descriptors. Covariance descriptors can efficiently represent objects and are robust to scale and pose changes. We posit that covariance descriptors corresponding to similar objects share a common geometrical structure which can be extracted through joint diagonalization. We term this diagonalizing matrix as the Covariance Profile (CP). CP can be used to measure the distance of a novel object to an object set through the diagonality measure. We demonstrate how CP can be employed on images as well as for videos, for applications such as face recognition and object-track clustering.