31 results for Enunciation scene
in Cambridge University Engineering Department Publications Database
Abstract:
Holistic representations of natural scenes are an effective and powerful source of information for semantic classification and analysis of arbitrary images. Recently, the frequency domain has been successfully exploited to holistically encode the content of natural scenes in order to obtain a robust representation for scene classification. In this paper, we present a new approach to naturalness classification of scenes using the frequency domain. The proposed method is based on the ordering of the Discrete Fourier Power Spectra. Features extracted from this ordering are shown to be sufficient to build a robust holistic representation for Natural vs. Artificial scene classification. Experiments show that the proposed frequency domain method matches the accuracy of other state-of-the-art solutions. © 2008 Springer Berlin Heidelberg.
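As a rough illustration of the idea of ordering power-spectrum values, the sketch below computes a centred discrete Fourier power spectrum with NumPy and ranks the energy of coarse frequency blocks. The function names, the `block` parameter, and the block-ranking scheme are assumptions made for this example; the abstract does not specify how the paper extracts its features.

```python
import numpy as np

def power_spectrum(image):
    """Return the centred 2-D discrete Fourier power spectrum of a grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum) ** 2

def rank_order_features(image, block=4):
    """Hypothetical feature vector: the rank of each coarse spectral block's energy.

    The spectrum is split into block x block regions, the power in each region
    is summed, and each region is replaced by its rank (0 = least energy).
    """
    power = power_spectrum(image)
    height, width = power.shape
    bh, bw = height // block, width // block
    # Crop so the spectrum divides evenly into blocks.
    cropped = power[:bh * block, :bw * block]
    energies = cropped.reshape(block, bh, block, bw).sum(axis=(1, 3)).ravel()
    # argsort applied twice converts values into their ranks.
    return np.argsort(np.argsort(energies))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_image = rng.random((64, 64))  # stand-in for a real grayscale scene
    print(rank_order_features(demo_image))
```

A rank vector of this kind could be passed to any standard classifier; which classifier the paper uses is not stated in the abstract.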
Abstract:
Automating the model generation process of infrastructure can substantially reduce the modeling time and cost. This paper presents a method to generate a sparse point cloud of an infrastructure scene using a single video camera under practical constraints. It is the first step towards establishing an automatic framework for object-oriented as-built modeling. Motion blur and key frame selection criteria are considered. Structure from motion and bundle adjustment are explored. The method is demonstrated in a case study where the scene of a reinforced concrete bridge is videotaped, reconstructed, and metrically validated. The result indicates the applicability, efficiency, and accuracy of the proposed method.
Abstract:
We demonstrate a new method for extracting high-level scene information from the type of data available from simultaneous localisation and mapping systems. We model the scene with a collection of primitives (such as bounded planes), and make explicit use of both visible and occluded points in order to refine the model. Since our formulation allows for different kinds of primitives and an arbitrary number of each, we use Bayesian model evidence to compare very different models on an even footing. Additionally, by making use of Bayesian techniques we can also avoid explicitly finding the optimal assignment of map landmarks to primitives. The results show that explicit reasoning about occlusion improves model accuracy and yields models which are suitable for aiding data association. © 2011. The copyright of this document resides with its authors.
Abstract:
We present a novel, implementation-friendly, occlusion-aware semi-supervised video segmentation algorithm using tree-structured graphical models, which delivers pixel labels along with their uncertainty estimates. Our motivation to employ supervision is to tackle a task-specific segmentation problem where the semantic objects are pre-defined by the user. The video model we propose for this problem is based on a tree-structured approximation of a patch-based undirected mixture model, which includes a novel time-series and a soft label Random Forest classifier participating in a feedback mechanism. We demonstrate the efficacy of our model in cutting out foreground objects and multi-class segmentation problems in lengthy and complex road scene sequences. Our results have wide applicability, including harvesting labelled video data for training discriminative models, shape/pose/articulation learning and large scale statistical analysis to develop priors for video segmentation. © 2011 IEEE.
Abstract:
Stereoscopic displays present different images to the two eyes and thereby create a compelling three-dimensional (3D) sensation. They are being developed for numerous applications including cinema, television, virtual prototyping, and medical imaging. However, stereoscopic displays cause perceptual distortions, performance decrements, and visual fatigue. These problems occur because some of the presented depth cues (i.e., perspective and binocular disparity) specify the intended 3D scene while focus cues (blur and accommodation) specify the fixed distance of the display itself. We have developed a stereoscopic display that circumvents these problems. It consists of a fast switchable lens synchronized to the display such that focus cues are nearly correct. The system has great potential for both basic vision research and display applications. © 2009 Optical Society of America.
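The conflict between focus cues and the other depth cues is commonly expressed in diopters (reciprocal metres). The sketch below is an illustration only: the function name and the example distances are assumptions, not values from the paper.

```python
def focus_cue_mismatch(display_distance_m, simulated_distance_m):
    """Dioptric difference between the display's focal distance and the simulated depth.

    Both distances are in metres; the result is in diopters (1/m).
    """
    if display_distance_m <= 0 or simulated_distance_m <= 0:
        raise ValueError("distances must be positive")
    return abs(1.0 / display_distance_m - 1.0 / simulated_distance_m)

if __name__ == "__main__":
    # Hypothetical case: a screen at 0.5 m showing an object simulated at 2 m.
    print(focus_cue_mismatch(0.5, 2.0))
```

A mismatch of zero corresponds to an object simulated at the screen distance, the one case in which a conventional stereoscopic display presents consistent focus cues.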
Abstract:
The use of mixture-model techniques for motion estimation and image sequence segmentation was discussed. Issues such as modeling of occlusion and uncovering, determining the relative depth of the objects in a scene, and estimating the number of objects in a scene were also investigated. The segmentation algorithm was found to be computationally demanding, but the computational requirements were reduced as the motion parameters and segmentation of the frame were initialized. The method provided a stable description, in which the addition and removal of objects from the description corresponded to the entry and exit of objects from the scene.
Abstract:
We present a multispectral photometric stereo method for capturing geometry of deforming surfaces. A novel photometric calibration technique allows calibration of scenes containing multiple piecewise constant chromaticities. This method estimates per-pixel photometric properties, then uses a RANSAC-based approach to estimate the dominant chromaticities in the scene. A likelihood term is developed linking surface normal, image intensity and photometric properties, which allows estimating the number of chromaticities present in a scene to be framed as a model estimation problem. The Bayesian Information Criterion is applied to automatically estimate the number of chromaticities present during calibration. A two-camera stereo system provides low resolution geometry, allowing the likelihood term to be used in segmenting new images into regions of constant chromaticity. This segmentation is carried out in a Markov Random Field framework and allows the correct photometric properties to be used at each pixel to estimate a dense normal map. Results are shown on several challenging real-world sequences, demonstrating state-of-the-art results using only two cameras and three light sources. Quantitative evaluation is provided against synthetic ground truth data. © 2011 IEEE.
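To illustrate how the Bayesian Information Criterion can choose among candidate models, the sketch below scores each candidate with the standard formula BIC = k ln(n) - 2 ln(L) and picks the lowest score. The function names, the parameter counts, and the log-likelihood values are hypothetical; the paper's actual likelihood term is not reproduced here.

```python
import math

def bic(log_likelihood, num_params, num_samples):
    """Standard Bayesian Information Criterion; lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood

def select_model(candidates, num_samples):
    """Pick the candidate with the lowest BIC.

    candidates maps a model label (here, a number of chromaticities)
    to a (log_likelihood, num_params) pair.
    """
    scores = {
        label: bic(log_likelihood, num_params, num_samples)
        for label, (log_likelihood, num_params) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    # Hypothetical fits for 1, 2 and 3 chromaticities on 1000 pixels.
    fits = {1: (-500.0, 3), 2: (-420.0, 6), 3: (-418.0, 9)}
    best, scores = select_model(fits, num_samples=1000)
    print(best, scores)
```

In this made-up example the third chromaticity improves the likelihood only slightly, so the penalty on its extra parameters outweighs the gain and the two-chromaticity model is preferred.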
Abstract:
A number of methods are commonly used today to collect spatial data on infrastructure (time-of-flight, visual triangulation, etc.). However, current practice lacks a solution that is accurate, automatic, and cost-efficient at the same time. This paper presents a videogrammetric framework for acquiring spatial data of infrastructure which holds the promise to address this limitation. It uses a calibrated set of low-cost, high-resolution video cameras that is progressively traversed around the scene and aims to produce a dense 3D point cloud which is updated in each frame. It allows for progressive reconstruction as opposed to point-and-shoot followed by point cloud stitching. The feasibility of the framework is studied in this paper. The required steps of this process are presented and the unique challenges of each step are identified. Results specific to each step are also presented.