985 resultados para Object vision


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis addresses a series of topics related to the question of how people find the foreground objects from complex scenes. With both computer vision modeling, as well as psychophysical analyses, we explore the computational principles for low- and mid-level vision.

We first explore the computational methods of generating saliency maps from images and image sequences. We propose an extremely fast algorithm called Image Signature that detects the locations in the image that attract human eye gazes. With a series of experimental validations based on human behavioral data collected from various psychophysical experiments, we conclude that the Image Signature and its spatial-temporal extension, the Phase Discrepancy, are among the most accurate algorithms for saliency detection under various conditions.

In the second part, we bridge the gap between fixation prediction and salient object segmentation with two efforts. First, we propose a new dataset that contains both fixation and object segmentation information. By simultaneously presenting the two types of human data in the same dataset, we are able to analyze their intrinsic connection, as well as understanding the drawbacks of today’s “standard” but inappropriately labeled salient object segmentation dataset. Second, we also propose an algorithm of salient object segmentation. Based on our novel discoveries on the connections of fixation data and salient object segmentation data, our model significantly outperforms all existing models on all 3 datasets with large margins.

In the third part of the thesis, we discuss topics around the human factors of boundary analysis. Closely related to salient object segmentation, boundary analysis focuses on delimiting the local contours of an object. We identify the potential pitfalls of algorithm evaluation for the problem of boundary detection. Our analysis indicates that today’s popular boundary detection datasets contain significant level of noise, which may severely influence the benchmarking results. To give further insights on the labeling process, we propose a model to characterize the principles of the human factors during the labeling process.

The analyses reported in this thesis offer new perspectives to a series of interrelating issues in low- and mid-level vision. It gives warning signs to some of today’s “standard” procedures, while proposing new directions to encourage future research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a new online multi-classifier boosting algorithm for learning object appearance models. In many cases the appearance model is multi-modal, which we capture by training and updating multiple strong classifiers. The proposed algorithm jointly learns the classifiers and a soft partitioning of the input space, defining an area of expertise for each classifier. We show how this formulation improves the specificity of the strong classifiers, allowing simultaneous location and pose estimation in a tracking task. The proposed online scheme iteratively adapts the classifiers during tracking. Experiments show that the algorithm successfully learns multi-modal appearance models during a short initial training phase, subsequently updating them for tracking an object under rapid appearance changes. © 2010 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper tackles the novel challenging problem of 3D object phenotype recognition from a single 2D silhouette. To bridge the large pose (articulation or deformation) and camera viewpoint changes between the gallery images and query image, we propose a novel probabilistic inference algorithm based on 3D shape priors. Our approach combines both generative and discriminative learning. We use latent probabilistic generative models to capture 3D shape and pose variations from a set of 3D mesh models. Based on these 3D shape priors, we generate a large number of projections for different phenotype classes, poses, and camera viewpoints, and implement Random Forests to efficiently solve the shape and pose inference problems. By model selection in terms of the silhouette coherency between the query and the projections of 3D shapes synthesized using the galleries, we achieve the phenotype recognition result as well as a fast approximate 3D reconstruction of the query. To verify the efficacy of the proposed approach, we present new datasets which contain over 500 images of various human and shark phenotypes and motions. The experimental results clearly show the benefits of using the 3D priors in the proposed method over previous 2D-based methods. © 2011 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Tracking applications provide real time on-site information that can be used to detect travel path conflicts, calculate crew productivity and eliminate unnecessary processes at the site. This paper presents the validation of a novel vision based tracking methodology at the Egnatia Odos Motorway in Thessaloniki, Greece. Egnatia Odos is a motorway that connects Turkey with Italy through Greece. Its multiple open construction sites serves as an ideal multi-site test bed for validating construction site tracking methods. The vision based tracking methodology uses video cameras and computer algorithms to calculate the 3D position of project related entities (e.g. personnel, materials and equipment) in construction sites. The approach provides an unobtrusive, inexpensive way of effectively identifying and tracking the 3D location of entities. The process followed in this study starts by acquiring video data from multiple synchronous cameras at several large scale project sites of Egnatia Odos, such as tunnels, interchanges and bridges under construction. Subsequent steps include the evaluation of the collected data and finally, performing the 3D tracking operations on selected entities (heavy equipment and personnel). The accuracy and precision of the method's results is evaluated by comparing it with the actual 3D position of the object, thus assessing the 3D tracking method's effectiveness.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper addresses the basic problem of recovering the 3D surface of an object that is observed in motion by a single camera and under a static but unknown lighting condition. We propose a method to establish pixelwise correspondence between input images by way of depth search by investigating optimal subsets of intensities rather than employing all the relevant pixel values. The thrust of our algorithm is that it is capable of dealing with specularities which appear on the top of shading variance that is caused due to object motion. This is in terms of both stages of finding sparse point correspondence and dense depth search. We also propose that a linearised image basis can be directly computed by the procudure of finding the correspondence. We illustrate the performance of the theoretical propositions using images of real objects. © 2009. The copyright of this document resides with its authors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We report a series of psychophysical experiments that explore different aspects of the problem of object representation and recognition in human vision. Contrary to the paradigmatic view which holds that the representations are three-dimensional and object-centered, the results consistently support the notion of view-specific representations that include at most partial depth information. In simulated experiments that involved the same stimuli shown to the human subjects, computational models built around two-dimensional multiple-view representations replicated our main psychophysical results, including patterns of generalization errors and the time course of perceptual learning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. We propose an {em active learning} formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research project is a study of the role of fixation and visual attention in object recognition. In this project, we build an active vision system which can recognize a target object in a cluttered scene efficiently and reliably. Our system integrates visual cues like color and stereo to perform figure/ground separation, yielding candidate regions on which to focus attention. Within each image region, we use stereo to extract features that lie within a narrow disparity range about the fixation position. These selected features are then used as input to an alignment-style recognition system. We show that visual attention and fixation significantly reduce the complexity and the false identifications in model-based recognition using Alignment methods. We also demonstrate that stereo can be used effectively as a figure/ground separator without the need for accurate camera calibration.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We address mid-level vision for the recognition of non-rigid objects. We align model and image using frame curves - which are object or "figure/ground" skeletons. Frame curves are computed, without discontinuities, using Curved Inertia Frames, a provably global scheme implemented on the Connection Machine, based on: non-cartisean networks; a definition of curved axis of inertia; and a ridge detector. I present evidence against frame alignment in human perception. This suggests: frame curves have a role in figure/ground segregation and in fuzzy boundaries; their outside/near/top/ incoming regions are more salient; and that perception begins by setting a reference frame (prior to early vision), and proceeds by processing convex structures.