11 resultados para Perspective views
em Massachusetts Institute of Technology
Resumo:
We investigate the differences --- conceptually and algorithmically --- between affine and projective frameworks for the tasks of visual recognition and reconstruction from perspective views. It is shown that an affine invariant exists between any view and a fixed view chosen as a reference view. This implies that for tasks for which a reference view can be chosen, such as in alignment schemes for visual recognition, projective invariants are not really necessary. We then use the affine invariant to derive new algebraic connections between perspective views. It is shown that three perspective views of an object are connected by certain algebraic functions of image coordinates alone (no structure or camera geometry needs to be involved).
Resumo:
In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment --- yielding a direct method that cuts through the computations of camera transformation, scene structure and epipolar geometry. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks.
Resumo:
We propose an affine framework for perspective views, captured by a single extremely simple equation based on a viewer-centered invariant we call "relative affine structure". Via a number of corollaries of our main results we show that our framework unifies previous work --- including Euclidean, projective and affine --- in a natural and simple way, and introduces new, extremely simple, algorithms for the tasks of reconstruction from multiple views, recognition by alignment, and certain image coding applications.
Resumo:
The conceptual component of this work is about "reference surfaces'' which are the dual of reference frames often used for shape representation purposes. The theoretical component of this work involves the question of whether one can find a unique (and simple) mapping that aligns two arbitrary perspective views of an opaque textured quadric surface in 3D, given (i) few corresponding points in the two views, or (ii) the outline conic of the surface in one view (only) and few corresponding points in the two views. The practical component of this work is concerned with applying the theoretical results as tools for the task of achieving full correspondence between views of arbitrary objects.
Resumo:
A method for localization and positioning in an indoor environment is presented. The method is based on representing the scene as a set of 2D views and predicting the appearances of novel views by linear combinations of the model views. The method is accurate under weak perspective projection. Analysis of this projection as well as experimental results demonstrate that in many cases it is sufficient to accurately describe the scene. When weak perspective approximation is invalid, an iterative solution to account for the perspective distortions can be employed. A simple algorithm for repositioning, the task of returning to a previously visited position defined by a single view, is derived from this method.
Resumo:
The task of shape recovery from a motion sequence requires the establishment of correspondence between image points. The two processes, the matching process and the shape recovery one, are traditionally viewed as independent. Yet, information obtained during the process of shape recovery can be used to guide the matching process. This paper discusses the mutual relationship between the two processes. The paper is divided into two parts. In the first part we review the constraints imposed on the correspondence by rigid transformations and extend them to objects that undergo general affine (non rigid) transformation (including stretch and shear), as well as to rigid objects with smooth surfaces. In all these cases corresponding points lie along epipolar lines, and these lines can be recovered from a small set of corresponding points. In the second part of the paper we discuss the potential use of epipolar lines in the matching process. We present an algorithm that recovers the correspondence from three contour images. The algorithm was implemented and used to construct object models for recognition. In addition we discuss how epipolar lines can be used to solve the aperture problem.
Resumo:
The future of the software industry is today being shaped in the courtroom. Most discussions of intellectual property to date, however, have been frames as debates about how the existing law --- promulgated long before the computer revolution --- should be applied to software. This memo is a transcript of a panel discussion on what forms of legal protection should apply to software to best serve both the industry and society in general. After addressing that question we can consider what laws would bring this about.
Resumo:
Model-based object recognition commonly involves using a minimal set of matched model and image points to compute the pose of the model in image coordinates. Furthermore, recognition systems often rely on the "weak-perspective" imaging model in place of the perspective imaging model. This paper discusses computing the pose of a model from three corresponding points under weak-perspective projection. A new solution to the problem is proposed which, like previous solutins, involves solving a biquadratic equation. Here the biquadratic is motivate geometrically and its solutions, comprised of an actual and a false solution, are interpreted graphically. The final equations take a new form, which lead to a simple expression for the image position of any unmatched model point.
Resumo:
Any three-dimensional wire-frame object constructed out of parallelograms can be recovered from a single perspective two-dimensional image. A procedure for performing the recovery is given.
Resumo:
Stereopsis and motion parallax are two methods for recovering three dimensional shape. Theoretical analyses of each method show that neither alone can recover rigid 3D shapes correctly unless other information, such as perspective, is included. The solutions for recovering rigid structure from motion have a reflection ambiguity; the depth scale of the stereoscopic solution will not be known unless the fixation distance is specified in units of interpupil separation. (Hence the configuration will appear distorted.) However, the correct configuration and the disposition of a rigid 3D shape can be recovered if stereopsis and motion are integrated, for then a unique solution follows from a set of linear equations. The correct interpretation requires only three points and two stereo views.
Resumo:
The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.