13 resultados para VISUAL INSPECTION
em Massachusetts Institute of Technology
Resumo:
This report explores the relation between image intensity and object shape. It is shown that image intensity is related to surface orientation and that a variation in image intensity is related to surface curvature. Computational methods are developed which use the measured intensity variation across surfaces of smooth objects to determine surface orientation. In general, surface orientation is not determined locally by the intensity value recorded at each image point. Tools are needed to explore the problem of determining surface orientation from image intensity. The notion of gradient space , popularized by Huffman and Mackworth, is used to represent surface orientation. The notion of a reflectance map, originated by Horn, is used to represent the relation between surface orientation image intensity. The image Hessian is defined and used to represent surface curvature. Properties of surface curvature are expressed as constraints on possible surface orientations corresponding to a given image point. Methods are presented which embed assumptions about surface curvature in algorithms for determining surface orientation from the intensities recorded in a single view. If additional images of the same object are obtained by varying the direction of incident illumination, then surface orientation is determined locally by the intensity values recorded at each image point. This fact is exploited in a new technique called photometric stereo. The visual inspection of surface defects in metal castings is considered. Two casting applications are discussed. The first is the precision investment casting of turbine blades and vanes for aircraft jet engines. In this application, grain size is an important process variable. The existing industry standard for estimating the average grain size of metals is implemented and demonstrated on a sample turbine vane. Grain size can be computed form the measurements obtained in an image, once the foreshortening effects of surface curvature are accounted for. The second is the green sand mold casting of shuttle eyes for textile looms. Here, physical constraints inherent to the casting process translate into these constraints, it is necessary to interpret features of intensity as features of object shape. Both applications demonstrate that successful visual inspection requires the ability to interpret observed changes in intensity in the context of surface topography. The theoretical tools developed in this report provide a framework for this interpretation.
Resumo:
In this paper we present an approach to perceptual organization and attention based on Curved Inertia Frames (C.I.F.), a novel definition of "curved axis of inertia'' tolerant to noisy and spurious data. The definition is useful because it can find frames that correspond to large, smooth, convex, symmetric and central parts. It is novel because it is global and can detect curved axes. We discuss briefly the relation to human perception, the recognition of non-rigid objects, shape description, and extensions to finding "features", inside/outside relations, and long- smooth ridges in arbitrary surfaces.
Resumo:
In many different spatial discrimination tasks, such as in determining the sign of the offset in a vernier stimulus, the human visual system exhibits hyperacuity-level performance by evaluating spatial relations with the precision of a fraction of a photoreceptor"s diameter. We propose that this impressive performance depends in part on a fast learning process that uses relatively few examples and occurs at an early processing stage in the visual pathway. We show that this hypothesis is plausible by demonstrating that it is possible to synthesize, from a small number of examples of a given task, a simple (HyperBF) network that attains the required performance level. We then verify with psychophysical experiments some of the key predictions of our conjecture. In particular, we show that fast timulus-specific learning indeed takes place in the human visual system and that this learning does not transfer between two slightly different hyperacuity tasks.
Resumo:
A typical robot vision scenario might involve a vehicle moving with an unknown 3D motion (translation and rotation) while taking intensity images of an arbitrary environment. This paper describes the theory and implementation issues of tracking any desired point in the environment. This method is performed completely in software without any need to mechanically move the camera relative to the vehicle. This tracking technique is simple an inexpensive. Furthermore, it does not use either optical flow or feature correspondence. Instead, the spatio-temporal gradients of the input intensity images are used directly. The experimental results presented support the idea of tracking in software. The final result is a sequence of tracked images where the desired point is kept stationary in the images independent of the nature of the relative motion. Finally, the quality of these tracked images are examined using spatio-temporal gradient maps.
Resumo:
Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.
Resumo:
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity, by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly are closer to edges and generic features typical of many natural structures instead of finding specific object parts. Those generic features generalize better and reduce considerably the computational cost of an algorithm for multi-class object detection.
Resumo:
This research project is a study of the role of fixation and visual attention in object recognition. In this project, we build an active vision system which can recognize a target object in a cluttered scene efficiently and reliably. Our system integrates visual cues like color and stereo to perform figure/ground separation, yielding candidate regions on which to focus attention. Within each image region, we use stereo to extract features that lie within a narrow disparity range about the fixation position. These selected features are then used as input to an alignment-style recognition system. We show that visual attention and fixation significantly reduce the complexity and the false identifications in model-based recognition using Alignment methods. We also demonstrate that stereo can be used effectively as a figure/ground separator without the need for accurate camera calibration.
Resumo:
The goal of this work is to navigate through an office environmentsusing only visual information gathered from four cameras placed onboard a mobile robot. The method is insensitive to physical changes within the room it is inspecting, such as moving objects. Forward and rotational motion vision are used to find doors and rooms, and these can be used to build topological maps. The map is built without the use of odometry or trajectory integration. The long term goal of the project described here is for the robot to build simple maps of its environment and to localize itself within this framework.
Resumo:
This report shows how knowledge about the visual world can be built into a shape representation in the form of a descriptive vocabulary making explicit the important geometrical relationships comprising objects' shapes. Two computational tools are offered: (1) Shapestokens are placed on a Scale-Space Blackboard, (2) Dimensionality-reduction captures deformation classes in configurations of tokens. Knowledge lies in the token types and deformation classes tailored to the constraints and regularities ofparticular shape worlds. A hierarchical shape vocabulary has been implemented supporting several later visual tasks in the two-dimensional shape domain of the dorsal fins of fishes.
Resumo:
We present the results of an implemented system for learning structural prototypes from grey-scale images. We show how to divide an object into subparts and how to encode the properties of these subparts and the relations between them. We discuss the importance of hierarchy and grouping in representing objects and show how a notion of visual similarities can be embedded in the description language. Finally we exhibit a learning algorithm that forms class models from the descriptions produced and uses these models to recognize new members of the class.
Resumo:
A system for visual recognition is described, with implications for the general problem of representation of knowledge to assist control. The immediate objective is a computer system that will recognize objects in a visual scene, specifically hammers. The computer receives an array of light intensities from a device like a television camera. It is to locate and identify the hammer if one is present. The computer must produce from the numerical "sensory data" a symbolic description that constitutes its perception of the scene. Of primary concern is the control of the recognition process. Control decisions should be guided by the partial results obtained on the scene. If a hammer handle is observed this should suggest that the handle is part of a hammer and advise where to look for the hammer head. The particular knowledge that a handle has been found combines with general knowledge about hammers to influence the recognition process. This use of knowledge to direct control is denoted here by the term "active knowledge". A descriptive formalism is presented for visual knowledge which identifies the relationships relevant to the active use of the knowledge. A control structure is provided which can apply knowledge organized in this fashion actively to the processing of a given scene.
Resumo:
Methods are presented (1) to partition or decompose a visual scene into the bodies forming it; (2) to position these bodies in three-dimensional space, by combining two scenes that make a stereoscopic pair; (3) to find the regions or zones of a visual scene that belong to its background; (4) to carry out the isolation of objects in (1) when the input has inaccuracies. Running computer programs implement the methods, and many examples illustrate their behavior. The input is a two-dimensional line-drawing of the scene, assumed to contain three-dimensional bodies possessing flat faces (polyhedra); some of them may be partially occluded. Suggestions are made for extending the work to curved objects. Some comparisons are made with human visual perception. The main conclusion is that it is possible to separate a picture or scene into the constituent objects exclusively on the basis of monocular geometric properties (on the basis of pure form); in fact, successful methods are shown.
Resumo:
The work reported here lies in the area of overlap between artificial intelligence software engineering. As research in artificial intelligence, it is a step towards a model of problem solving in the domain of programming. In particular, this work focuses on the routine aspects of programming which involve the application of previous experience with similar programs. I call this programming by inspection. Programming is viewed here as a kind of engineering activity. Analysis and synthesis by inspection area prominent part of expert problem solving in many other engineering disciplines, such as electrical and mechanical engineering. The notion of inspections methods in programming developed in this work is motivated by similar notions in other areas of engineering. This work is also motivated by current practical concerns in the area of software engineering. The inadequacy of current programming technology is universally recognized. Part of the solution to this problem will be to increase the level of automation in programming. I believe that the next major step in the evolution of more automated programming will be interactive systems which provide a mixture of partially automated program analysis, synthesis and verification. One such system being developed at MIT, called the programmer's apprentice, is the immediate intended application of this work. This report concentrates on the knowledge are of the programmer's apprentice, which is the form of a taxonomy of commonly used algorithms and data structures. To the extent that a programmer is able to construct and manipulate programs in terms of the forms in such a taxonomy, he may relieve himself of many details and generally raise the conceptual level of his interaction with the system, as compared with present day programming environments. Also, since it is practical to expand a great deal of effort pre-analyzing the entries in a library, the difficulty of verifying the correctness of programs constructed this way is correspondingly reduced. The feasibility of this approach is demonstrated by the design of an initial library of common techniques for manipulating symbolic data. This document also reports on the further development of a formalism called the plan calculus for specifying computations in a programming language independent manner. This formalism combines both data and control abstraction in a uniform framework that has facilities for representing multiple points of view and side effects.