6 resultados para Visual image

em Massachusetts Institute of Technology


Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present a statistical image-based shape + structure model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The report describes a recognition system called GROPER, which performs grouping by using distance and relative orientation constraints that estimate the likelihood of different edges in an image coming from the same object. The thesis presents both a theoretical analysis of the grouping problem and a practical implementation of a grouping system. GROPER also uses an indexing module to allow it to make use of knowledge of different objects, any of which might appear in an image. We test GROPER by comparing it to a similar recognition system that does not use grouping.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Stimuli outside classical receptive fields have been shown to exert significant influence over the activities of neurons in primary visual cortexWe propose that contextual influences are used for pre-attentive visual segmentation, in a new framework called segmentation without classification. This means that segmentation of an image into regions occurs without classification of features within a region or comparison of features between regions. This segmentation framework is simpler than previous computational approaches, making it implementable by V1 mechanisms, though higher leve l visual mechanisms are needed to refine its output. However, it easily handles a class of segmentation problems that are tricky in conventional methods. The cortex computes global region boundaries by detecting the breakdown of homogeneity or translation invariance in the input, using local intra-cortical interactions mediated by the horizontal connections. The difference between contextual influences near and far from region boundaries makes neural activities near region boundaries higher than elsewhere, making boundaries more salient for perceptual pop-out. This proposal is implemented in a biologically based model of V1, and demonstrated using examples of texture segmentation and figure-ground segregation. The model performs segmentation in exactly the same neural circuit that solves the dual problem of the enhancement of contours, as is suggested by experimental observations. Its behavior is compared with psychophysical and physiological data on segmentation, contour enhancement, and contextual influences. We discuss the implications of segmentation without classification and the predictions of our V1 model, and relate it to other phenomena such as asymmetry in visual search.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used ``paperclip'' stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and ``natural'' stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The image comparison operation ??sessing how well one image matches another ??rms a critical component of many image analysis systems and models of human visual processing. Two norms used commonly for this purpose are L1 and L2, which are specific instances of the Minkowski metric. However, there is often not a principled reason for selecting one norm over the other. One way to address this problem is by examining whether one metric better captures the perceptual notion of image similarity than the other. With this goal, we examined perceptual preferences for images retrieved on the basis of the L1 versus the L2 norm. These images were either small fragments without recognizable content, or larger patterns with recognizable content created via vector quantization. In both conditions the subjects showed a consistent preference for images matched using the L1 metric. These results suggest that, in the domain of natural images of the kind we have used, the L1 metric may better capture human notions of image similarity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A fundamental question in visual neuroscience is how to represent image structure. The most common representational schemes rely on differential operators that compare adjacent image regions. While well-suited to encoding local relationships, such operators have significant drawbacks. Specifically, each filter's span is confounded with the size of its sub-fields, making it difficult to compare small regions across large distances. We find that such long-distance comparisons are more tolerant to common image transformations than purely local ones, suggesting they may provide a useful vocabulary for image encoding. . We introduce the "Dissociated Dipole," or "Sticks" operator, for encoding non-local image relationships. This operator de-couples filter span from sub-field size, enabling parametric movement between edge and region-based representation modes. We report on the perceptual plausibility of the operator, and the computational advantages of non-local encoding. Our results suggest that non-local encoding may be an effective scheme for representing image structure.