14 resultados para depth perception

em Massachusetts Institute of Technology


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the absence of cues for absolute depth measurements as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges and junctions may provide a 3D model of the scene but it will not inform about the actual "size" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex due to the difficulty of the object recognition process. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the interest in computing the mean depth of the scene with application to scene recognition and object detection.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The binocular perception of shape and depth relations between objects can change considerably if the viewing direction is changed only by a small angle. We explored this effect psychophysically and found a strong depth reduction effect for large disparity gradients. The effect is found to be strongest for horizontally oriented stimuli, and stronger for line stimuli than for points. This depth scaling effect is discussed in a computational framework of stereo based on a Baysian approach which allows integration of information from different types of matching primitives weighted according to their robustness.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is proposed that subjective contours are an artifact of the perception of natural three-dimensional surfaces. A recent theory of surface interpolation implies that "subjective surfaces" are constructed in the visual system by interpolation between three-dimensional values arising from interpretation of a variety of surface cues. We show that subjective surfaces can take any form, including singly and doubly curved surfaces, as well as the commonly discussed fronto-parallel planes. In addition, it is necessary in the context of computational vision to make explicit the discontinuities, both in depth and in surface orientation, in the surfaces constructed by interpolation. It is proposed that subjective surfaces and subjective contours are demonstrated. The role played by figure completion and enhanced brightness contrast in the determination of subjective surfaces is discussed. All considerations of surface perception apply equally to subjective surfaces.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We first pose the following problem: to develop a program which takes line-drawings as input and constructs three-dimensional objects as output, such that the output objects are the same as the ones we see when we look at the input line-drawing. We then introduce the principle of minimum standard-deviation of angles (MSDA) and discuss a program based on MSDA. We present the results of testing this program with a variety of line- drawings and show that the program constitutes a solution to the stated problem over the range of line-drawings tested. Finally, we relate this work to its historical antecedents in the psychological and computer-vision literature.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A key question regarding primate visual motion perception is whether the motion of 2D patterns is recovered by tracking distinctive localizable features [Lorenceau and Gorea, 1989; Rubin and Hochstein, 1992] or by integrating ambiguous local motion estimates [Adelson and Movshon, 1982; Wilson and Kim, 1992]. For a two-grating plaid pattern, this translates to either tracking the grating intersections or to appropriately combining the motion estimates for each grating. Since both component and feature information are simultaneously available in any plaid pattern made of contrast defined gratings, it is unclear how to determine which of the two schemes is actually used to recover the plaid"s motion. To address this problem, we have designed a plaid pattern made with subjective, rather than contrast defined, gratings. The distinguishing characteristic of such a plaid pattern is that it contains no contrast defined intersections that may be tracked. We find that notwithstanding the absence of such features, observers can accurately recover the pattern velocity. Additionally we show that the hypothesis of tracking "illusory features" to estimate pattern motion does not stand up to experimental test. These results present direct evidence in support of the idea that calls for the integration of component motions over the one that mandates tracking localized features to recover 2D pattern motion. The localized features, we suggest, are used primarily as providers of grouping information - which component motion signals to integrate and which not to.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The inferior temporal cortex (IT) of monkeys is thought to play an essential role in visual object recognition. Inferotemporal neurons are known to respond to complex visual stimuli, including patterns like faces, hands, or other body parts. What is the role of such neurons in object recognition? The present study examines this question in combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize novel visual 3D objects. A population of neurons in IT were found to respond selectively to such objects that the monkeys had recently learned to recognize. A large majority of these cells discharged maximally for one view of the object, while their response fell off gradually as the object was rotated away from the neuron"s preferred view. Most neurons exhibited orientation-dependent responses also during view-plane rotations. Some neurons were found tuned around two views of the same object, while a very small number of cells responded in a view- invariant manner. For five different objects that were extensively used during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. No selective responses were ever encountered for views that the animal systematically failed to recognize. The results of our experiments suggest that neurons in this area can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. Simple geometric features did not appear to account for the neurons" selective responses. These findings support the idea that a population of neurons -- each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations -- may, as an assembly, encode complex 3D objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis proposes a computational model of how children may come to learn the meanings of words in their native language. The proposed model is divided into two separate components. One component produces semantic descriptions of visually observed events while the other correlates those descriptions with co-occurring descriptions of those events in natural language. The first part of this thesis describes three implementations of the correlation process whereby representations of the meanings of whole utterances can be decomposed into fragments assigned as representations of the meanings of individual words. The second part of this thesis describes an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Reconstructing a surface from sparse sensory data is a well known problem in computer vision. Early vision modules typically supply sparse depth, orientation and discontinuity information. The surface reconstruction module incorporates these sparse and possibly conflicting measurements of a surface into a consistent, dense depth map. The coupled depth/slope model developed here provides a novel computational solution to the surface reconstruction problem. This method explicitly computes dense slope representation as well as dense depth representations. This marked change from previous surface reconstruction algorithms allows a natural integration of orientation constraints into the surface description, a feature not easily incorporated into earlier algorithms. In addition, the coupled depth/ slope model generalizes to allow for varying amounts of smoothness at different locations on the surface. This computational model helps conceptualize the problem and leads to two possible implementations- analog and digital. The model can be implemented as an electrical or biological analog network since the only computations required at each locally connected node are averages, additions and subtractions. A parallel digital algorithm can be derived by using finite difference approximations. The resulting system of coupled equations can be solved iteratively on a mesh-pf-processors computer, such as the Connection Machine. Furthermore, concurrent multi-grid methods are designed to speed the convergence of this digital algorithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The visual analysis of surface shape from texture and surface contour is treated within a computational framework. The aim of this study is to determine valid constraints that are sufficient to allow surface orientation and distance (up to a multiplicative constant) to be computed from the image of surface texture and of surface contours.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a novel scheme ("Categorical Basis Functions", CBF) for object class representation in the brain and contrast it to the "Chorus of Prototypes" scheme recently proposed by Edelman. The power and flexibility of CBF is demonstrated in two examples. CBF is then applied to investigate the phenomenon of Categorical Perception, in particular the finding by Bulthoff et al. (1998) of categorization of faces by gender without corresponding Categorical Perception. Here, CBF makes predictions that can be tested in a psychophysical experiment. Finally, experiments are suggested to further test CBF.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The processes underlying the perceptual analysis of visual form are believed to have minimal interaction with those subserving the perception of visual motion (Livingstone and Hubel, 1987; Victor and Conte, 1990). Recent reports of functionally and anatomically segregated parallel streams in the primate visual cortex seem to support this hypothesis (Ungerlieder and Mishkin, 1982; VanEssen and Maunsell, 1983; Shipp and Zeki, 1985; Zeki and Shipp, 1988; De Yoe et al., 1994). Here we present perceptual evidence that is at odds with this view and instead suggests strong symmetric interactions between the form and motion processes. In one direction, we show that the introduction of specific static figural elements, say 'F', in a simple motion sequence biases an observer to perceive a particular motion field, say 'M'. In the reverse direction, the imposition of the same motion field 'M' on the original sequence leads the observer to perceive illusory static figural elements 'F'. A specific implication of these findings concerns the possible existence of (what we call) motion end-stopped units in the primate visual system. Such units might constitute part of a mechanism for signalling subjective occluding contours based on motion-field discontinuities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We have developed a technique called RISE (Random Image Structure Evolution), by which one may systematically sample continuous paths in a high-dimensional image space. A basic RISE sequence depicts the evolution of an object's image from a random field, along with the reverse sequence which depicts the transformation of this image back into randomness. The processing steps are designed to ensure that important low-level image attributes such as the frequency spectrum and luminance are held constant throughout a RISE sequence. Experiments based on the RISE paradigm can be used to address some key open issues in object perception. These include determining the neural substrates underlying object perception, the role of prior knowledge and expectation in object perception, and the developmental changes in object perception skills from infancy to adulthood.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Brightness judgments are a key part of the primate brain's visual analysis of the environment. There is general consensus that the perceived brightness of an image region is based not only on its actual luminance, but also on the photometric structure of its neighborhood. However, it is unclear precisely how a region's context influences its perceived brightness. Recent research has suggested that brightness estimation may be based on a sophisticated analysis of scene layout in terms of transparency, illumination and shadows. This work has called into question the role of low-level mechanisms, such as lateral inhibition, as explanations for brightness phenomena. Here we describe experiments with displays for which low-level and high-level analyses make qualitatively different predictions, and with which we can quantitatively assess the trade-offs between low-level and high-level factors. We find that brightness percepts in these displays are governed by low-level stimulus properties, even when these percepts are inconsistent with higher-level interpretations of scene layout. These results point to the important role of low-level mechanisms in determining brightness percepts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.