10 resultados para Scene understanding

em Massachusetts Institute of Technology


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Rapid judgments about the properties and spatial relations of objects are the crux of visually guided interaction with the world. Vision begins, however, with essentially pointwise representations of the scene, such as arrays of pixels or small edge fragments. For adequate time-performance in recognition, manipulation, navigation, and reasoning, the processes that extract meaningful entities from the pointwise representations must exploit parallelism. This report develops a framework for the fast extraction of scene entities, based on a simple, local model of parallel computation.sAn image chunk is a subset of an image that can act as a unit in the course of spatial analysis. A parallel preprocessing stage constructs a variety of simple chunks uniformly over the visual array. On the basis of these chunks, subsequent serial processes locate relevant scene components and assemble detailed descriptions of them rapidly. This thesis defines image chunks that facilitate the most potentially time-consuming operations of spatial analysis---boundary tracing, area coloring, and the selection of locations at which to apply detailed analysis. Fast parallel processes for computing these chunks from images, and chunk-based formulations of indexing, tracing, and coloring, are presented. These processes have been simulated and evaluated on the lisp machine and the connection machine.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

What are the characteristics of the process by which an intent is transformed into a plan and then a program? How is a program debugged? This paper analyzes these questions in the context of understanding simple turtle programs. To understand and debug a program, a description of its intent is required. For turtle programs, this is a model of the desired geometric picture. a picture language is provided for this purpose. Annotation is necessary for documenting the performance of a program in such a way that the system can examine the procedures behavior as well as consider hypothetical lines of development due to tentative debugging edits. A descriptive framework representing both causality and teleology is developed. To understand the relation between program and model, the plan must be known. The plan is a description of the methodology for accomplishing the model. Concepts are explicated for translating the global intent of a declarative model into the local imperative code of a program. Given the plan, model and program, the system can interpret the picture and recognize inconsistencies. The description of the discrepancies between the picture actually produced by the program and the intended scene is the input to a debugging system. Repair of the program is based on a combination of general debugging techniques and specific fixing knowledge associated with the geometric model primitives. In both the plan and repairing the bugs, the system exhibits an interesting style of analysis. It is capable of debugging itself and reformulating its analysis of a plan or bug in response to self-criticism. In this fashion, it can qualitatively reformulate its theory of the program or error to account for surprises or anomalies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a system for the computer understanding of English. The system answers questions, executes commands, and accepts information in normal English dialog. It uses semantic information and context to understand discourse and to disambiguate sentences. It combines a complete syntactic analysis of each sentence with a "heuristic understander" which uses different kinds of information about a sentence, other parts of the discourse, and general information about the world in deciding what the sentence means. It is based on the belief that a computer cannot deal reasonably with language unless it can "understand" the subject it is discussing. The program is given a detailed model of the knowledge needed by a simple robot having only a hand and an eye. We can give it instructions to manipulate toy objects, interrogate it about the scene, and give it information it will use in deduction. In addition to knowing the properties of toy objects, the program has a simple model of its own mentality. It can remember and discuss its plans and actions as well as carry them out. It enters into a dialog with a person, responding to English sentences with actions and English replies, and asking for clarification when its heuristic programs cannot understand a sentence through use of context and physical knowledge.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We describe a new method for motion estimation and 3D reconstruction from stereo image sequences obtained by a stereo rig moving through a rigid world. We show that given two stereo pairs one can compute the motion of the stereo rig directly from the image derivatives (spatial and temporal). Correspondences are not required. One can then use the images from both pairs combined to compute a dense depth map. The motion estimates between stereo pairs enable us to combine depth maps from all the pairs in the sequence to form an extended scene reconstruction and we show results from a real image sequence. The motion computation is a linear least squares computation using all the pixels in the image. Areas with little or no contrast are implicitly weighted less so one does not have to explicitly apply a confidence measure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from bandpassed image data. We describe the benefits of this representation, and show two uses illustrating their properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) Shape recipes implicitly contain information about lighting and materials and we use them for material segmentation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In my research, I have performed an extensive experimental investigation of harmonic-drive properties such as stiffness, friction, and kinematic error. From my experimental results, I have found that these properties can be sharply non-linear and highly dependent on operating conditions. Due to the complex interaction of these poorly behaved transmission properties, dynamic response measurements showed surprisingly agitated behavior, especially around system resonance. Theoretical models developed to mimic the observed response illustrated that non-linear frictional effects cannot be ignored in any accurate harmonic-drive representation. Additionally, if behavior around system resonance must be replicated, kinematic error and transmission compliance as well as frictional dissipation from gear-tooth rubbing must all be incorporated into the model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problems under consideration center around the interpretation of binocular stereo disparity. In particular, the goal is to establish a set of mappings from stereo disparity to corresponding three-dimensional scene geometry. An analysis has been developed that shows how disparity information can be interpreted in terms of three-dimensional scene properties, such as surface depth, discontinuities, and orientation. These theoretical developments have been embodied in a set of computer algorithms for the recovery of scene geometry from input stereo disparity. The results of applying these algorithms to several disparity maps are presented. Comparisons are made to the interpretation of stereo disparity by biological systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work describes a program, called TOPLE, which uses a procedural model of the world to understand simple declarative sentences. It accepts sentences in a modified predicate calculus symbolism, and uses plausible reasoning to visualize scenes, resolve ambiguous pronoun and noun phrase references, explain events, and make conditional predications. Because it does plausible deduction, with tentative conclusions, it must contain a formalism for describing its reasons for its conclusions and what the alternatives are. When an inconsistency is detected in its world model, it uses its recorded information to resolve it, one way or another. It uses simulation techniques to make deductions about creatures motivation and behavior, assuming they are goal-directed beings like itself.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Artificial Intelligence research involves the creation of extremely complex programs which must possess the capability to introspect, learn, and improve their expertise. Any truly intelligent program must be able to create procedures and to modify them as it gathers information from its experience. [Sussman, 1975] produced such a system for a 'mini-world'; but truly intelligent programs must be considerably more complex. A crucial stepping stone in AI research is the development of a system which can understand complex programs well enough to modify them. There is also a complexity barrier in the world of commercial software which is making the cost of software production and maintenance prohibitive. Here too a system which is capable of understanding complex programs is a necessary step. The Programmer's Apprentice Project [Rich and Shrobe, 76] is attempting to develop an interactive programming tool which will help expert programmers deal with the complexity involved in engineering a large software system. This report describes REASON, the deductive component of the programmer's apprentice. REASON is intended to help expert programmers in the process of evolutionary program design. REASON utilizes the engineering techniques of modelling, decomposition, and analysis by inspection to determine how modules interact to achieve the desired overall behavior of a program. REASON coordinates its various sources of knowledge by using a dependency-directed structure which records the justification for each deduction it makes. Once a program has been analyzed these justifications can be summarized into a teleological structure called a plan which helps the system understand the impact of a proposed program modification.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Methods are presented (1) to partition or decompose a visual scene into the bodies forming it; (2) to position these bodies in three-dimensional space, by combining two scenes that make a stereoscopic pair; (3) to find the regions or zones of a visual scene that belong to its background; (4) to carry out the isolation of objects in (1) when the input has inaccuracies. Running computer programs implement the methods, and many examples illustrate their behavior. The input is a two-dimensional line-drawing of the scene, assumed to contain three-dimensional bodies possessing flat faces (polyhedra); some of them may be partially occluded. Suggestions are made for extending the work to curved objects. Some comparisons are made with human visual perception. The main conclusion is that it is possible to separate a picture or scene into the constituent objects exclusively on the basis of monocular geometric properties (on the basis of pure form); in fact, successful methods are shown.