991 results for 3D scene understanding


Relevance: 90.00%

Abstract:

This paper defines the 3D reconstruction problem as the process of reconstructing a 3D scene from numerous 2D visual images of that scene. It is well known that this problem is ill-posed, and numerous constraints and assumptions are used in 3D reconstruction algorithms in order to reduce the solution space. Unfortunately, most constraints only work in a certain range of situations, and constraints are often built into the most fundamental methods (e.g., Area Based Matching assumes that all the pixels in the window belong to the same object). This paper presents a novel formulation of the 3D reconstruction problem, using a voxel framework and first-order logic equations, which does not contain any additional constraints or assumptions. Solving this formulation for a set of input images gives all the possible solutions for that set, rather than picking a solution that is deemed most likely. Using this formulation, this paper studies the problem of uniqueness in 3D reconstruction and how the solution space changes for different configurations of input images. It is found that a unique solution cannot be guaranteed, no matter how many images are taken of the scene, what their orientation is, or even how much color variation is in the scene itself. Results of using the formulation to reconstruct a few small voxel spaces are also presented. They show that the number of solutions is extremely large even for very small voxel spaces (a 5 x 5 voxel space gives 10 to 10^7 solutions). This shows the need for constraints to reduce the solution space to a reasonable size. Finally, it is noted that because of the discrete nature of the formulation, the size of the solution space can be easily calculated, making the formulation a useful tool for numerically evaluating the usefulness of any constraints that are added.
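To make the non-uniqueness argument concrete, here is a minimal brute-force sketch in Python. It is not the paper's first-order-logic formulation; it assumes a toy 2D grid in which each voxel is either empty (0) or holds one of a few colours, and two orthographic "cameras" (one looking along rows, one along columns) that record the colour of the first occupied voxel on each ray. Enumerating every assignment and keeping those that reproduce the observed images yields the full solution set, whose size can simply be counted.

```python
from itertools import product

EMPTY = 0  # assumed encoding: 0 = empty voxel, 1..colours = voxel colour


def first_hit(ray):
    """Colour of the first occupied voxel along a ray, or EMPTY if the ray misses."""
    return next((v for v in ray if v != EMPTY), EMPTY)


def render(grid, n):
    """Two orthographic 'images' of an n x n grid stored row-major in a flat tuple.

    left[r]: pixel for the ray entering row r at column 0.
    top[c]:  pixel for the ray entering column c at row 0.
    """
    left = tuple(first_hit(grid[r * n + c] for c in range(n)) for r in range(n))
    top = tuple(first_hit(grid[r * n + c] for r in range(n)) for c in range(n))
    return left, top


def consistent_scenes(images, n, colours):
    """Every voxel assignment whose rendered images equal the observed ones."""
    values = range(colours + 1)  # EMPTY plus each colour
    return [g for g in product(values, repeat=n * n) if render(g, n) == images]


# Usage: a fully occupied 2 x 2 scene of colour 1, observed from two directions.
truth = (1, 1, 1, 1)
solutions = consistent_scenes(render(truth, 2), n=2, colours=2)
print(len(solutions))  # 9 scenes reproduce exactly the same two images
```

Exhaustive enumeration grows as (colours + 1)^(n * n), so this sketch is only practical for very small grids.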

Relevance: 90.00%

Abstract:

Photometric Stereo is a powerful image-based 3D reconstruction technique that has recently been used to obtain very high-quality reconstructions. However, in its classic form, Photometric Stereo suffers from two main limitations. Firstly, one needs to obtain images of the 3D scene under multiple different illuminations. As a result, the 3D scene needs to remain static during illumination changes, which prohibits the reconstruction of deforming objects. Secondly, the images obtained must be from a single viewpoint. This leads to depth-map-based 2.5D reconstructions instead of full 3D surfaces. The aim of this Chapter is to show how these limitations can be alleviated, leading to the derivation of two practical 3D acquisition systems. The first, based on the powerful Coloured Light Photometric Stereo method, can be used to reconstruct moving objects such as cloth or human faces. The second permits the complete 3D reconstruction of challenging objects such as porcelain vases. In addition to algorithmic details, the Chapter pays attention to practical issues such as setup calibration and the detection and correction of self and cast shadows. We provide several evaluation experiments as well as reconstruction results. © 2010 Springer-Verlag Berlin Heidelberg.
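As a point of reference for the "classic form" mentioned above, here is a minimal per-pixel sketch of Lambertian photometric stereo with three known, distant light directions. It assumes no shadows or specularities and exactly three measurements; with I_k = rho * (l_k . n), the three equations are solved for g = rho * n, whose length is the albedo rho and whose direction is the unit normal n. In Coloured Light Photometric Stereo the three colour channels of a single frame conceptually play the role of the three images; that extension is not shown here, and the light directions below are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a x = b for a 3 x 3 system using Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    x = []
    for k in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][k] = b[i]  # replace column k with the right-hand side
        x.append(det3(m) / d)
    return x


def photometric_stereo_pixel(lights, intensities):
    """Recover (albedo, unit normal) at one pixel from three Lambertian measurements."""
    g = solve3(lights, intensities)  # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("pixel is black under every light")
    return albedo, [v / albedo for v in g]


# Usage with three hypothetical unit light directions (rows of the light matrix).
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
albedo, normal = photometric_stereo_pixel(lights, [0.8, 1.0, 0.64])
print(albedo, normal)  # approximately 1.0 and [0.6, 0.0, 0.8]
```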

Relevance: 80.00%

Abstract:

In this paper we present a tutorial introduction to two important senses for biological and robotic systems: inertial and visual perception. We discuss the fundamentals of these two sensing modalities from a biological and an engineering perspective. Digital camera chips and micro-machined accelerometers and gyroscopes are now commodities and, when combined with today's available computing, can provide robust estimates of self-motion as well as 3D scene structure without external infrastructure. We discuss the complementarity of these sensors, describe some fundamental approaches to fusing their outputs and survey the field.
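One of the simplest ways to fuse the two kinds of sensor discussed above is a complementary filter: integrate the gyroscope's angular rate for short-term accuracy, and blend in an absolute but noisier angle (for example tilt from the accelerometer's gravity vector, or an orientation estimate from vision) to cancel long-term drift. The sketch below is a single-axis illustration; the blending factor `alpha` and the sample data are assumptions, and it is not a method taken from the paper.

```python
def complementary_filter(gyro_rates, absolute_angles, dt, alpha=0.98, initial=0.0):
    """Fuse angular rate (rad/s) with absolute angle measurements (rad) on one axis.

    alpha close to 1 trusts the integrated gyro over short time scales;
    (1 - alpha) slowly pulls the estimate toward the drift-free absolute angle.
    """
    angle = initial
    estimates = []
    for rate, measured in zip(gyro_rates, absolute_angles):
        predicted = angle + rate * dt                          # dead-reckon with the gyro
        angle = alpha * predicted + (1.0 - alpha) * measured   # correct the drift
        estimates.append(angle)
    return estimates


# Usage: a gyro with a constant 0.01 rad/s bias while the true angle stays at 0.
steps, dt = 2000, 0.01
fused = complementary_filter([0.01] * steps, [0.0] * steps, dt)
gyro_only = 0.01 * dt * steps  # pure integration drifts to 0.2 rad
print(fused[-1], gyro_only)    # the fused estimate settles near 0.0049 rad
```

The residual offset of alpha * bias * dt / (1 - alpha) shows the trade-off: a larger `alpha` smooths noise from the absolute sensor but leaves more gyro bias in the estimate.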

Relevance: 80.00%

Abstract:

Stereo-based visual odometry algorithms are heavily dependent on an accurate calibration of the rigidly fixed stereo pair. Even small shifts in the rigid transform between the cameras can affect feature matching and 3D scene triangulation, adversely affecting pose estimates and applications that depend on long-term autonomy. In many field-based scenarios where vibration, knocks and pressure changes affect a robotic vehicle, an accurate stereo calibration cannot be guaranteed over long periods. This paper presents a novel method of recalibrating overlapping stereo camera rigs from online visual data while simultaneously providing an up-to-date and up-to-scale pose estimate. The proposed technique implements a novel form of partitioned bundle adjustment that explicitly includes the homogeneous transform between a stereo camera pair to generate an optimal calibration. Pose estimates are computed in parallel to the calibration, providing online recalibration that integrates seamlessly into a stereo visual odometry framework. We present results demonstrating accurate performance of the algorithm on both simulated scenarios and real data gathered from a wide-baseline stereo pair on a ground vehicle traversing urban roads.
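A short worked example shows why even a small shift in the stereo rig matters. For a rectified pair, depth is Z = f * B / d (focal length f in pixels, baseline B, disparity d). Under a small-angle approximation, an uncorrected yaw error delta between the cameras shifts disparities by roughly f * tan(delta) pixels near the image centre. The numbers below (f = 800 px, B = 0.5 m, a point at 20 m, a 0.1 degree error) are hypothetical, and this only illustrates the sensitivity; it is not the paper's recalibration method.

```python
import math


def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Rectified-stereo triangulation: Z = f * B / d."""
    return f_px * baseline_m / disparity_px


def depth_with_yaw_error(f_px, baseline_m, true_depth_m, yaw_error_deg):
    """Depth reported when an uncorrected yaw error biases every disparity.

    Small-angle model: a bias of about f * tan(yaw) pixels near the image centre.
    The sign depends on the direction of the error; here it is assumed to increase d.
    """
    true_disparity = f_px * baseline_m / true_depth_m
    bias = f_px * math.tan(math.radians(yaw_error_deg))
    return depth_from_disparity(f_px, baseline_m, true_disparity + bias)


# Usage with hypothetical rig parameters.
z = depth_with_yaw_error(f_px=800.0, baseline_m=0.5, true_depth_m=20.0, yaw_error_deg=0.1)
relative_error = abs(z - 20.0) / 20.0
print(round(z, 2), round(100 * relative_error, 1))  # about 18.69 m, a 6.5 % depth error
```

Because depth error grows with distance for a fixed disparity bias, a drift this small is enough to corrupt long-range triangulation, which is what motivates recalibrating online.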

Relevance: 80.00%

Abstract:

Semantic perception and object labeling are key requirements for robots interacting with objects on a higher level. Symbolic annotation of objects allows planning algorithms to be used for object interaction, for instance in a typical fetch-and-carry scenario. In current research, perception is usually based on 3D scene reconstruction and geometric model matching, where trained features are matched with a 3D sample point cloud. In this work we propose a semantic perception method based on spatio-semantic features. These features are defined in a natural, symbolic way, such as geometry and spatial relations. In contrast to point-based model matching methods, a spatial ontology is used in which objects are described by how they look, similar to how a human would describe unknown objects to another person. A fuzzy-based reasoning approach matches perceivable features with a spatial ontology of the objects. The approach is able to deal with sensor noise and occlusions. Another advantage is that no training phase is needed to learn object features. The use case of the proposed method is the detection of soil sample containers in an outdoor environment, which have to be collected by a mobile robot. The approach is verified in real-world experiments.
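The fuzzy matching step can be illustrated with a small sketch. Each object class in a hypothetical spatial ontology is described by a few natural features, each with a trapezoidal membership function; a perceived segment's degree of match is the minimum membership over its features (the minimum t-norm), so noisy measurements near the edge of a range lower the score gradually instead of failing outright, and no training is involved. The feature names, ranges and the `soil_sample_container` entry are assumptions for illustration, not the paper's ontology.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical spatial ontology: feature -> (a, b, c, d) in metres.
ONTOLOGY = {
    "soil_sample_container": {
        "height": (0.05, 0.10, 0.20, 0.25),
        "width": (0.05, 0.08, 0.15, 0.20),
        "height_above_ground": (-0.05, 0.0, 0.02, 0.05),  # "stands on the ground"
    },
}


def match(segment, ontology=ONTOLOGY):
    """Degree to which a perceived segment matches each object class (min t-norm)."""
    return {
        name: min(trapezoid(segment[f], *params) for f, params in features.items())
        for name, features in ontology.items()
    }


print(match({"height": 0.15, "width": 0.10, "height_above_ground": 0.01}))  # full match
print(match({"height": 0.22, "width": 0.10, "height_above_ground": 0.01}))  # partial match
```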

Relevance: 80.00%

Abstract:

The latest generation of Deep Convolutional Neural Networks (DCNN) has dramatically advanced challenging computer vision tasks, especially object detection and object classification, achieving state-of-the-art performance in tasks including text recognition, sign recognition, face recognition and scene understanding. The depth of these supervised networks has enabled them to learn deeper, hierarchical feature representations. In parallel, unsupervised deep learning methods such as the Convolutional Deep Belief Network (CDBN) have also achieved state-of-the-art results in many computer vision tasks. However, there is very limited research on jointly exploiting the strengths of these two approaches. In this paper, we investigate the learning capability of both methods. We compare the outputs of individual layers and show that many learnt filters, and the outputs of the corresponding layers, are very similar for both approaches. Stacking the DCNN on top of unsupervised layers, or replacing layers in the DCNN with the corresponding learnt layers of the CDBN, can improve recognition/classification accuracy and reduce training computational cost. We demonstrate the validity of the proposal on the ImageNet dataset.
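The layer-by-layer comparison can be made concrete with a simple similarity measure. The sketch below flattens each learnt filter into a vector and, for every filter of one network, reports the best cosine similarity to any filter of the other; values near 1 indicate filters that are identical up to scale. Cosine similarity is an assumption made for illustration; the abstract does not say which measure the authors used, and the toy kernels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def flatten(kernel):
    """Flatten a 2D kernel (list of rows) into a single vector."""
    return [w for row in kernel for w in row]


def best_match_scores(filters_a, filters_b):
    """For each filter in A, the highest cosine similarity to any filter in B."""
    flat_b = [flatten(k) for k in filters_b]
    return [max(cosine(flatten(k), fb) for fb in flat_b) for k in filters_a]


# Usage: toy 2 x 2 kernels; B contains a scaled copy of A's vertical-edge filter.
a = [[[1, -1], [1, -1]], [[1, 1], [-1, -1]]]
b = [[[2, -2], [2, -2]], [[1, 0], [0, 1]]]
print(best_match_scores(a, b))  # [1.0, 0.0]
```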

Relevance: 80.00%

Abstract:

Stereoscopic displays present different images to the two eyes and thereby create a compelling three-dimensional (3D) sensation. They are being developed for numerous applications including cinema, television, virtual prototyping, and medical imaging. However, stereoscopic displays cause perceptual distortions, performance decrements, and visual fatigue. These problems occur because some of the presented depth cues (i.e., perspective and binocular disparity) specify the intended 3D scene while focus cues (blur and accommodation) specify the fixed distance of the display itself. We have developed a stereoscopic display that circumvents these problems. It consists of a fast switchable lens synchronized to the display such that focus cues are nearly correct. The system has great potential for both basic vision research and display applications. © 2009 Optical Society of America.
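The conflict described above is usually quantified in diopters (inverse metres). Binocular disparity drives vergence to the simulated distance, while a conventional display fixes the focal (accommodation) distance at the screen, so the mismatch is |1/d_sim - 1/d_screen|. In a simplified model of a display with a switchable lens, the image is instead presented at whichever focal state is closest in diopters. The focal planes below are hypothetical; the sketch illustrates the arithmetic only, not the authors' hardware or control scheme.

```python
def diopters(distance_m):
    """Optical power corresponding to a viewing distance (1 / metres)."""
    return 1.0 / distance_m


def conflict_fixed_screen(simulated_m, screen_m):
    """Vergence-accommodation mismatch for a conventional stereo display (diopters)."""
    return abs(diopters(simulated_m) - diopters(screen_m))


def conflict_switchable(simulated_m, focal_planes_d):
    """Mismatch when the lens selects the nearest focal plane (planes given in diopters)."""
    target = diopters(simulated_m)
    nearest = min(focal_planes_d, key=lambda p: abs(p - target))
    return abs(target - nearest)


# Usage: screen at 0.5 m, virtual object at 1.0 m, hypothetical planes from 0.5 to 2.0 D.
print(conflict_fixed_screen(1.0, 0.5))                 # 1.0 diopter of conflict
print(conflict_switchable(1.0, [0.5, 1.0, 1.5, 2.0]))  # 0.0 when a plane matches
```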

Relevance: 80.00%

Abstract:

Videogrammetry is an inexpensive and easy-to-use technology for spatial 3D scene recovery. When it is applied to large-scale civil infrastructure scenes, only a small percentage of the collected video frames is required to achieve robust results. However, choosing the right frames requires careful consideration. Videotaping a built infrastructure scene results in large video files filled with blurry, noisy, or redundant frames. This is due to frame-rate-to-camera-speed ratios that are often higher than necessary, camera and lens imperfections and limitations that result in imaging noise, and occasional jerky camera motions that result in motion blur, all of which can significantly affect the performance of the videogrammetric pipeline. To tackle these issues, this paper proposes a novel method for automating the selection of an optimized number of informative, high-quality frames. In this method, blurred frames are first removed using thresholds determined from the minimum level of frame quality required to obtain robust results. Then, an optimum number of key frames is selected from the remaining frames using selection criteria devised by the authors. Experimental results show that the proposed method outperforms existing methods, producing improved 3D reconstruction results while keeping the number of extracted frames at the optimum needed to generate high-quality 3D point clouds. © 2012 Elsevier Ltd. All rights reserved.
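The two-step selection described above can be sketched as follows. The blur test uses the variance of the Laplacian, a common sharpness measure swapped in here for illustration; the paper's own quality thresholds and key-frame criteria are not given in the abstract, so the `min_gap` spacing rule below is a hypothetical stand-in for them.

```python
def laplacian_variance(img):
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels.

    Blurred frames have weak second derivatives and therefore a low score.
    """
    h, w = len(img), len(img[0])
    values = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_frames(frames, blur_threshold, min_gap):
    """Step 1: drop frames below the sharpness threshold.
    Step 2: keep sharp frames at least `min_gap` indices apart (a hypothetical
    stand-in for a real key-frame criterion such as baseline or feature overlap)."""
    selected = []
    for index, frame in enumerate(frames):
        if laplacian_variance(frame) < blur_threshold:
            continue  # treated as blurred
        if selected and index - selected[-1] < min_gap:
            continue  # too redundant with the previous key frame
        selected.append(index)
    return selected


# Usage: synthetic 5 x 5 grey-level frames.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
flat = [[128] * 5 for _ in range(5)]  # no detail at all, like a heavily blurred frame
print(select_frames([sharp, flat, sharp, sharp, sharp], blur_threshold=1.0, min_gap=2))
# [0, 2, 4]
```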

Relevance: 80.00%

Abstract:

Efficiently simulating global illumination is one of the most important open problems in computer graphics. Accurately computing the effects of indirect illumination, caused by secondary bounces of light off the surfaces of a 3D scene, is generally an expensive process, often solved using algorithms such as path tracing or photon mapping. These techniques numerically solve the rendering equation using Monte Carlo ray tracing. Ward et al. proposed a technique called irradiance caching to accelerate these techniques when computing the indirect component of global illumination on diffuse surfaces. Krivanek extended the approach of Ward and Heckbert to handle the more complex case of specular surfaces, introducing an approach called radiance caching. Jarosz et al. and Schwarzhaupt et al. proposed a model that uses the Hessian and visibility information to refine the placement of cache points in the scene, significantly improving the quality and performance of the previous approaches. In this thesis, we extended the approaches introduced in this previous work to the radiance caching problem in order to improve the placement of cache elements. We also discovered an important problem that was overlooked in previous work because of the choice of test scenes. We carried out a preliminary study of this problem and found two potential solutions that deserve further research.
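To make the caching idea concrete, here is a minimal sketch of an irradiance-cache lookup using the weighting function from Ward et al.'s original irradiance caching, w_i(x) = 1 / (|x - x_i| / R_i + sqrt(1 - n(x) . n_i)), where R_i is the record's harmonic mean distance to visible surfaces. Records whose weight exceeds 1/a (a being a user-chosen accuracy) are averaged; if none qualifies, a new record would have to be computed by Monte Carlo sampling. Gradients, radiance caching and the Hessian-based placement discussed above are not included, and the dictionary layout of a record is an assumption.

```python
import math


def ward_weight(x, n, record):
    """Ward et al. weight: 1 / (|x - x_i| / R_i + sqrt(1 - n . n_i))."""
    distance = math.dist(x, record["pos"])
    cos_angle = sum(p * q for p, q in zip(n, record["normal"]))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    denominator = distance / record["radius"] + math.sqrt(1.0 - cos_angle)
    return math.inf if denominator == 0.0 else 1.0 / denominator


def interpolate_irradiance(x, n, cache, a):
    """Weighted average of usable records, or None if a new record must be computed."""
    numerator = denominator = 0.0
    for record in cache:
        w = ward_weight(x, n, record)
        if w == math.inf:
            return record["irradiance"]  # query coincides with a cached sample
        if w > 1.0 / a:
            numerator += w * record["irradiance"]
            denominator += w
    return numerator / denominator if denominator > 0.0 else None


# Usage: two hypothetical records on a flat floor facing +z.
cache = [
    {"pos": (0.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "radius": 1.0, "irradiance": 1.0},
    {"pos": (1.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "radius": 1.0, "irradiance": 3.0},
]
print(interpolate_irradiance((0.5, 0.0, 0.0), (0.0, 0.0, 1.0), cache, a=1.0))   # 2.0
print(interpolate_irradiance((10.0, 0.0, 0.0), (0.0, 0.0, 1.0), cache, a=1.0))  # None
```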

Relevance: 80.00%

Abstract:

Under the framework of the European Union funded SAFEE project, this paper gives an overview of a novel monitoring and scene analysis system developed for use onboard aircraft in spatially constrained environments. The techniques discussed herein aim to warn on-board crew about pre-determined indicators of threat intent (such as running or shouting in the cabin), as elicited from industry and security experts. The subject matter experts believe that activities such as these are strong indicators of the beginnings of undesirable chains of events or scenarios, which should not be allowed to develop aboard aircraft. This project aims to detect these scenarios and provide advice to the crew. These events may involve unruly passengers or be indicative of the precursors to terrorist threats. With a state-of-the-art tracking system using homography intersections of motion images, and probability-based Petri nets for scene understanding, the SAFEE behavioural analysis system automatically assesses the output from multiple intelligent sensors and creates recommendations that are presented to the crew using an integrated airborne user interface. The system is evaluated within a full-size aircraft mockup, and experimental results are presented showing that the SAFEE system is well suited to monitoring people in confined environments, and that meaningful and instructive output regarding human actions can be derived from the sensor network within the cabin.
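The scene-understanding layer can be pictured with a toy probabilistic Petri net. Places stand for scenario states, each transition is fired by a detected event and moves the token forward, and the token carries the probability of the scenario so far, multiplied by the detector's confidence at each firing. The scenario, event names and alert threshold below are hypothetical; they only illustrate how a chain of indicators (standing up, then running in the aisle) could raise a crew alert, and do not reproduce the SAFEE models.

```python
class ProbabilisticPetriNet:
    """Toy Petri net whose tokens carry a scenario probability."""

    def __init__(self, transitions, start_place):
        # transitions: event name -> (input place, output place); hypothetical scenario
        self.transitions = transitions
        self.marking = {start_place: 1.0}  # place -> probability of the token there

    def observe(self, event, confidence):
        """Fire the transition for `event` if its input place holds a token."""
        if event not in self.transitions:
            return
        source, target = self.transitions[event]
        p = self.marking.pop(source, 0.0)  # the token is consumed from the input place
        if p == 0.0:
            return  # transition not enabled
        self.marking[target] = self.marking.get(target, 0.0) + p * confidence

    def probability(self, place):
        return self.marking.get(place, 0.0)


# Usage: two detector events with hypothetical confidences.
net = ProbabilisticPetriNet(
    {"stands_up": ("seated", "standing"), "runs_in_aisle": ("standing", "running")},
    start_place="seated",
)
net.observe("stands_up", confidence=0.9)
net.observe("runs_in_aisle", confidence=0.8)
ALERT_THRESHOLD = 0.5  # hypothetical
p = net.probability("running")
print(p > ALERT_THRESHOLD, round(p, 2))  # True 0.72
```

An event whose input place holds no token is ignored, so indicators observed out of order do not advance the scenario.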

Relevance: 80.00%

Abstract:

A desktop tool for replay and analysis of gaze-enhanced multiparty virtual collaborative sessions is described. We linked three CAVE™-like environments, creating a multiparty collaborative virtual space where avatars are animated with 3D gaze as well as head and hand motions in real time. Log files are recorded for subsequent playback and analysis using the proposed software tool. During replay, the user can rotate the viewpoint and navigate in the simulated 3D scene. The playback mechanism relies on multiple distributed log files captured at every site. This structure enables an observer to experience the latencies of movement and information transfer at every site, which is important for conversation analysis. Playback uses an event-replay algorithm, modified to allow fast traversal of the scene by selective rendering of nodes and to simulate fast random access. The tool's analysis module can show each participant's 3D gaze points and the areas where gaze has been concentrated.
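The playback structure described above can be sketched as follows: per-site logs are merged by timestamp into one event stream, and random access is simulated by storing periodic snapshots of the scene state, so that seeking to time t only replays the events since the nearest earlier snapshot. The event tuple layout (timestamp, site, node, value) and the snapshot interval are assumptions; selective rendering of nodes is not shown.

```python
import bisect
import heapq


def merge_logs(*site_logs):
    """Merge per-site logs, each sorted by timestamp, into one time-ordered stream.

    Each event keeps its own site's timestamp, so delays between sites stay visible.
    """
    return list(heapq.merge(*site_logs, key=lambda event: event[0]))


class Replayer:
    def __init__(self, events, snapshot_every=100):
        self.events = events
        self.times = [event[0] for event in events]
        self.snapshots = [(0, {})]  # (event index, state before that event)
        state = {}
        for i, (_, site, node, value) in enumerate(events):
            if i > 0 and i % snapshot_every == 0:
                self.snapshots.append((i, dict(state)))
            state[(site, node)] = value
        self.snapshot_indices = [i for i, _ in self.snapshots]

    def state_at(self, t):
        """Scene state after all events with timestamp <= t (simulated random access)."""
        end = bisect.bisect_right(self.times, t)
        k = bisect.bisect_right(self.snapshot_indices, end) - 1
        start, snapshot = self.snapshots[k]
        state = dict(snapshot)
        for _, site, node, value in self.events[start:end]:
            state[(site, node)] = value
        return state


# Usage: two hypothetical sites logging head and gaze updates.
site_a = [(0.0, "A", "head", 1), (2.0, "A", "head", 2)]
site_b = [(1.0, "B", "gaze", 5), (3.0, "B", "gaze", 6)]
replayer = Replayer(merge_logs(site_a, site_b), snapshot_every=2)
print(replayer.state_at(2.5))  # {('A', 'head'): 2, ('B', 'gaze'): 5}
```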

Relevance: 80.00%

Abstract:

This paper addresses the problem of determining which 3D shape is present in a scene and, more importantly, the dimensions of that shape. This is performed in an active vision system, because such a system reduces the complexity of the problem through gaze stabilization, choice of foveation point, and selective, adaptive processing of regions of interest. In our case, only a small number of equations and parameters are needed for each shape, and these are incorporated into functional descriptions of the shapes.

Relevance: 80.00%

Abstract:

This paper addresses the problem of determining which 3D shape is present within a scene and, more importantly, the dimensions of that shape. This is performed in an active vision system, because such a system reduces the complexity of the problem through gaze stabilisation, choice of foveation point, and selective, adaptive processing of regions of interest. In our case, only a small number of equations and parameters are needed for each shape; for example, a container has a width and a height. These are incorporated into functional descriptions of the shapes.
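Once the active vision system has fixated the object and has a depth estimate for the fixation point, the few parameters of a functional description can be recovered with the pinhole model, size = pixel extent * depth / focal length. The sketch below assumes the measured face is roughly fronto-parallel at the fixation depth; apart from the container's width and height, the shape table (including a `ball` with a single `diameter`) and the numbers are hypothetical.

```python
# Hypothetical functional descriptions: each shape is reduced to a few named parameters.
SHAPE_PARAMETERS = {
    "container": ("width", "height"),
    "ball": ("diameter",),
}


def metric_size(pixel_extent, depth_m, focal_px):
    """Pinhole back-projection of an image extent measured at the fixation depth."""
    return pixel_extent * depth_m / focal_px


def describe(shape, pixel_extents, depth_m, focal_px):
    """Fill in a shape's functional description from image extents and fixation depth."""
    names = SHAPE_PARAMETERS[shape]
    if len(names) != len(pixel_extents):
        raise ValueError(f"{shape} needs {len(names)} extents, got {len(pixel_extents)}")
    return {name: metric_size(extent, depth_m, focal_px)
            for name, extent in zip(names, pixel_extents)}


# Usage: a container spanning 100 x 150 pixels, fixated at 2 m with f = 500 px.
print(describe("container", [100, 150], depth_m=2.0, focal_px=500.0))
# {'width': 0.4, 'height': 0.6}
```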

Relevance: 80.00%

Abstract:

We propose a novel framework for large-scale scene understanding in static-camera surveillance. Our techniques combine fast rank-1 constrained robust PCA, used to compute the foreground, with non-parametric Bayesian models for inference. Clusters are extracted from foreground patterns using a joint multinomial+Gaussian Dirichlet process model (DPM). Since the multinomial distribution is normalized, the Gaussian mixture distinguishes between patterns that are spatially similar but differ in activity level (e.g., car vs. bike). We propose a modification of the decayed MCMC technique for incremental inference, providing the ability to discover a theoretically unlimited number of patterns in unbounded video streams. A promising by-product of our framework is online abnormal activity detection. A benchmark video and two surveillance videos, the longest of which is 140 hours long, are used in our experiments. The patterns discovered are as informative as those found by existing scene understanding algorithms. However, unlike existing work, we achieve near real-time execution and encouraging performance in abnormal activity detection.
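Foreground extraction with a rank-1 background model can be sketched by alternating two steps: fit the background as a single image b scaled per frame by c_t (frame_t is approximately c_t * b + s_t), then soft-threshold the residual to obtain a sparse foreground s_t. This simple alternating heuristic is a stand-in chosen for illustration, not the paper's fast rank-1 constrained robust PCA solver; `lam` sets how large a deviation must be before it counts as foreground, and the toy frames are hypothetical. The Dirichlet process clustering stage is not shown.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the proximal step for an L1 penalty)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def rank1_robust_pca(frames, lam, iterations=100):
    """Split frames (T lists of P pixels) into c[t] * b[p] + S[t][p]."""
    T, P = len(frames), len(frames[0])
    b = [sum(frame[p] for frame in frames) / T for p in range(P)]  # mean image as start
    c = [1.0] * T
    S = [[0.0] * P for _ in range(T)]
    for _ in range(iterations):
        # Sparse step: whatever the rank-1 background cannot explain, shrunk by lam.
        S = [[soft_threshold(frames[t][p] - c[t] * b[p], lam) for p in range(P)]
             for t in range(T)]
        R = [[frames[t][p] - S[t][p] for p in range(P)] for t in range(T)]
        # Rank-1 step: least-squares update of b given c, then of c given b.
        cc = sum(v * v for v in c)
        if cc == 0.0:
            break
        b = [sum(c[t] * R[t][p] for t in range(T)) / cc for p in range(P)]
        bb = sum(v * v for v in b)
        if bb == 0.0:
            break
        c = [sum(R[t][p] * b[p] for p in range(P)) / bb for t in range(T)]
    return b, c, S


# Usage: a static 4-pixel background; a bright object covers pixel 2 in frame 3.
frames = [[10.0, 10.0, 10.0, 10.0] for _ in range(5)]
frames[3][2] = 100.0
b, c, S = rank1_robust_pca(frames, lam=5.0)
print([[round(v, 1) for v in row] for row in S])  # only S[3][2] is large
```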