921 resultados para Visual Attention
Resumo:
In many different spatial discrimination tasks, such as in determining the sign of the offset in a vernier stimulus, the human visual system exhibits hyperacuity-level performance by evaluating spatial relations with the precision of a fraction of a photoreceptor"s diameter. We propose that this impressive performance depends in part on a fast learning process that uses relatively few examples and occurs at an early processing stage in the visual pathway. We show that this hypothesis is plausible by demonstrating that it is possible to synthesize, from a small number of examples of a given task, a simple (HyperBF) network that attains the required performance level. We then verify with psychophysical experiments some of the key predictions of our conjecture. In particular, we show that fast timulus-specific learning indeed takes place in the human visual system and that this learning does not transfer between two slightly different hyperacuity tasks.
Resumo:
A typical robot vision scenario might involve a vehicle moving with an unknown 3D motion (translation and rotation) while taking intensity images of an arbitrary environment. This paper describes the theory and implementation issues of tracking any desired point in the environment. This method is performed completely in software without any need to mechanically move the camera relative to the vehicle. This tracking technique is simple an inexpensive. Furthermore, it does not use either optical flow or feature correspondence. Instead, the spatio-temporal gradients of the input intensity images are used directly. The experimental results presented support the idea of tracking in software. The final result is a sequence of tracked images where the desired point is kept stationary in the images independent of the nature of the relative motion. Finally, the quality of these tracked images are examined using spatio-temporal gradient maps.
Resumo:
Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.
Resumo:
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity, by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly are closer to edges and generic features typical of many natural structures instead of finding specific object parts. Those generic features generalize better and reduce considerably the computational cost of an algorithm for multi-class object detection.
Resumo:
This article describes a model for including scene/context priors in attention guidance. In the proposed scheme, visual context information can be available early in the visual processing chain, in order to modulate the saliency of image regions and to provide an efficient short cut for object detection and recognition. The scene is represented by means of a low-dimensional global description obtained from low-level features. The global scene features are then used to predict the probability of presence of the target object in the scene, and its location and scale, before exploring the image. Scene information can then be used to modulate the saliency of image regions early during the visual processing in order to provide an efficient short cut for object detection and recognition.
Resumo:
The goal of this work is to navigate through an office environmentsusing only visual information gathered from four cameras placed onboard a mobile robot. The method is insensitive to physical changes within the room it is inspecting, such as moving objects. Forward and rotational motion vision are used to find doors and rooms, and these can be used to build topological maps. The map is built without the use of odometry or trajectory integration. The long term goal of the project described here is for the robot to build simple maps of its environment and to localize itself within this framework.