988 resultados para Shape classification
Resumo:
Structure from motion often refers to the computation of 3D structure from a matched sequence of images. However, a depth map of a surface is difficult to compute and may not be a good representation for storage and recognition. Given matched images, I will first show that the sign of the normal curvature in a given direction at a given point in the image can be computed from a simple difference of slopes of line-segments in one image. Using this result, local surface patches can be classified as convex, concave, parabolic (cylindrical), hyperbolic (saddle point) or planar. At the same time the translational component of the optical flow is obtained, from which the focus of expansion can be computed.
Resumo:
Similarity measurements between 3D objects and 2D images are useful for the tasks of object recognition and classification. We distinguish between two types of similarity metrics: metrics computed in image-space (image metrics) and metrics computed in transformation-space (transformation metrics). Existing methods typically use image and the nearest view of the object. Example for such a measure is the Euclidean distance between feature points in the image and corresponding points in the nearest view. (Computing this measure is equivalent to solving the exterior orientation calibration problem.) In this paper we introduce a different type of metrics: transformation metrics. These metrics penalize for the deformatoins applied to the object to produce the observed image. We present a transformation metric that optimally penalizes for "affine deformations" under weak-perspective. A closed-form solution, together with the nearest view according to this metric, are derived. The metric is shown to be equivalent to the Euclidean image metric, in the sense that they bound each other from both above and below. For Euclidean image metric we offier a sub-optimal closed-form solution and an iterative scheme to compute the exact solution.
Resumo:
In this paper we present some extensions to the k-means algorithm for vector quantization that permit its efficient use in image segmentation and pattern classification tasks. It is shown that by introducing state variables that correspond to certain statistics of the dynamic behavior of the algorithm, it is possible to find the representative centers fo the lower dimensional maniforlds that define the boundaries between classes, for clouds of multi-dimensional, mult-class data; this permits one, for example, to find class boundaries directly from sparse data (e.g., in image segmentation tasks) or to efficiently place centers for pattern classification (e.g., with local Gaussian classifiers). The same state variables can be used to define algorithms for determining adaptively the optimal number of centers for clouds of data with space-varying density. Some examples of the applicatin of these extensions are also given.
Resumo:
The correspondence problem in computer vision is basically a matching task between two or more sets of features. In this paper, we introduce a vectorized image representation, which is a feature-based representation where correspondence has been established with respect to a reference image. This representation has two components: (1) shape, or (x, y) feature locations, and (2) texture, defined as the image grey levels mapped onto the standard reference image. This paper explores an automatic technique for "vectorizing" face images. Our face vectorizer alternates back and forth between computation steps for shape and texture, and a key idea is to structure the two computations so that each one uses the output of the other. A hierarchical coarse-to-fine implementation is discussed, and applications are presented to the problems of facial feature detection and registration of two arbitrary faces.
Resumo:
This paper describes a representation of the dynamics of human walking action for the purpose of person identification and classification by gait appearance. Our gait representation is based on simple features such as moments extracted from video silhouettes of human walking motion. We claim that our gait dynamics representation is rich enough for the task of recognition and classification. The use of our feature representation is demonstrated in the task of person recognition from video sequences of orthogonal views of people walking. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times, and under varying lighting environments. In addition, preliminary results are shown on gender classification using our gait dynamics features.
Resumo:
We introduce a new method to describe, in a single image, changes in shape over time. We acquire both range and image information with a stationary stereo camera. From the pictures taken, we display a composite image consisting of the image data from the surface closest to the camera at every pixel. This reveals the 3-d relationships over time by easy-to-interpret occlusion relationships in the composite image. We call the composite a shape-time photograph. Small errors in depth measurements cause artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front surface, taking into account the depth measurements, their uncertainties, and layer continuity assumptions.
Resumo:
In low-level vision, the representation of scene properties such as shape, albedo, etc., are very high dimensional as they have to describe complicated structures. The approach proposed here is to let the image itself bear as much of the representational burden as possible. In many situations, scene and image are closely related and it is possible to find a functional relationship between them. The scene information can be represented in reference to the image where the functional specifies how to translate the image into the associated scene. We illustrate the use of this representation for encoding shape information. We show how this representation has appealing properties such as locality and slow variation across space and scale. These properties provide a way of improving shape estimates coming from other sources of information like stereo.
Resumo:
The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from bandpassed image data. We describe the benefits of this representation, and show two uses illustrating their properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) Shape recipes implicitly contain information about lighting and materials and we use them for material segmentation.