11 resultados para area-based matching
em Massachusetts Institute of Technology
Resumo:
We describe a method for modeling object classes (such as faces) using 2D example images and an algorithm for matching a model to a novel image. The object class models are "learned'' from example images that we call prototypes. In addition to the images, the pixelwise correspondences between a reference prototype and each of the other prototypes must also be provided. Thus a model consists of a linear combination of prototypical shapes and textures. A stochastic gradient descent algorithm is used to match a model to a novel image by minimizing the error between the model and the novel image. Example models are shown as well as example matches to novel images. The robustness of the matching algorithm is also evaluated. The technique can be used for a number of applications including the computation of correspondence between novel images of a certain known class, object recognition, image synthesis and image compression.
Resumo:
We describe a technique for finding pixelwise correspondences between two images by using models of objects of the same class to guide the search. The object models are 'learned' from example images (also called prototypes) of an object class. The models consist of a linear combination ofsprototypes. The flow fields giving pixelwise correspondences between a base prototype and each of the other prototypes must be given. A novel image of an object of the same class is matched to a model by minimizing an error between the novel image and the current guess for the closest modelsimage. Currently, the algorithm applies to line drawings of objects. An extension to real grey level images is discussed.
Resumo:
We consider the problem of matching model and sensory data features in the presence of geometric uncertainty, for the purpose of object localization and identification. The problem is to construct sets of model feature and sensory data feature pairs that are geometrically consistent given that there is uncertainty in the geometry of the sensory data features. If there is no geometric uncertainty, polynomial-time algorithms are possible for feature matching, yet these approaches can fail when there is uncertainty in the geometry of data features. Existing matching and recognition techniques which account for the geometric uncertainty in features either cannot guarantee finding a correct solution, or can construct geometrically consistent sets of feature pairs yet have worst case exponential complexity in terms of the number of features. The major new contribution of this work is to demonstrate a polynomial-time algorithm for constructing sets of geometrically consistent feature pairs given uncertainty in the geometry of the data features. We show that under a certain model of geometric uncertainty the feature matching problem in the presence of uncertainty is of polynomial complexity. This has important theoretical implications by demonstrating an upper bound on the complexity of the matching problem, an by offering insight into the nature of the matching problem itself. These insights prove useful in the solution to the matching problem in higher dimensional cases as well, such as matching three-dimensional models to either two or three-dimensional sensory data. The approach is based on an analysis of the space of feasible transformation parameters. This paper outlines the mathematical basis for the method, and describes the implementation of an algorithm for the procedure. Experiments demonstrating the method are reported.
Resumo:
A polynomial time algorithm (pruned correspondence search, PCS) with good average case performance for solving a wide class of geometric maximal matching problems, including the problem of recognizing 3D objects from a single 2D image, is presented. Efficient verification algorithms, based on a linear representation of location constraints, are given for the case of affine transformations among vector spaces and for the case of rigid 2D and 3D transformations with scale. Some preliminary experiments suggest that PCS is a practical algorithm. Its similarity to existing correspondence based algorithms means that a number of existing techniques for speedup can be incorporated into PCS to improve its performance.
Resumo:
We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure'' model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
Resumo:
Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed since interaction effects of complex phenomena in part made translation appear to be unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.
Resumo:
In model-based vision, there are a huge number of possible ways to match model features to image features. In addition to model shape constraints, there are important match-independent constraints that can efficiently reduce the search without the combinatorics of matching. I demonstrate two specific modules in the context of a complete recognition system, Reggie. The first is a region-based grouping mechanism to find groups of image features that are likely to come from a single object. The second is an interpretive matching scheme to make explicit hypotheses about occlusion and instabilities in the image features.
Resumo:
We present a statistical image-based shape + structure model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
Resumo:
A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects' 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The 'object model', according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.
Resumo:
The image comparison operation ??sessing how well one image matches another ??rms a critical component of many image analysis systems and models of human visual processing. Two norms used commonly for this purpose are L1 and L2, which are specific instances of the Minkowski metric. However, there is often not a principled reason for selecting one norm over the other. One way to address this problem is by examining whether one metric better captures the perceptual notion of image similarity than the other. With this goal, we examined perceptual preferences for images retrieved on the basis of the L1 versus the L2 norm. These images were either small fragments without recognizable content, or larger patterns with recognizable content created via vector quantization. In both conditions the subjects showed a consistent preference for images matched using the L1 metric. These results suggest that, in the domain of natural images of the kind we have used, the L1 metric may better capture human notions of image similarity.
Resumo:
Building robust recognition systems requires a careful understanding of the effects of error in sensed features. Error in these image features results in a region of uncertainty in the possible image location of each additional model feature. We present an accurate, analytic approximation for this uncertainty region when model poses are based on matching three image and model points, for both Gaussian and bounded error in the detection of image points, and for both scaled-orthographic and perspective projection models. This result applies to objects that are fully three- dimensional, where past results considered only two-dimensional objects. Further, we introduce a linear programming algorithm to compute the uncertainty region when poses are based on any number of initial matches. Finally, we use these results to extend, from two-dimensional to three- dimensional objects, robust implementations of alignmentt interpretation- tree search, and ransformation clustering.