3 results for Scene understanding
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
This thesis investigates interactive scene reconstruction and understanding using RGB-D data only. Indeed, we believe that in the near future depth cameras will remain a cheap, low-power 3D sensing alternative that is also suitable for mobile devices. Our contributions therefore build on top of state-of-the-art approaches to achieve advances in three challenging scenarios, namely mobile mapping, large-scale surface reconstruction and semantic modeling. First, we will describe an effective approach to Simultaneous Localization And Mapping (SLAM) on platforms with limited resources, such as a tablet device. Unlike previous methods, dense reconstruction is achieved by reprojecting RGB-D frames, while local consistency is maintained by applying relative bundle adjustment principles. We will show quantitative results comparing our technique to the state of the art, as well as detailed reconstructions of environments ranging from single rooms to small apartments. Then, we will address large-scale surface modeling from depth maps by exploiting parallel GPU computing. We will develop a real-time camera tracking method based on the popular KinectFusion system and an online surface alignment technique capable of counteracting drift errors and closing small loops. We will show very high-quality meshes that outperform existing methods on publicly available datasets, as well as on data recorded with our RGB-D camera even in complete darkness. Finally, we will move to our Semantic Bundle Adjustment framework, which effectively combines object detection and SLAM in a unified system. Although the mathematical framework we will describe is not restricted to a particular sensing technology, in the experimental section we will again refer only to RGB-D sensing. We will discuss successful implementations of our algorithm, showing the benefits of joint object detection, camera tracking and environment mapping.
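For readers unfamiliar with the pipeline, the sketch below illustrates the standard pinhole back-projection that underlies dense reconstruction by reprojecting RGB-D frames. It is a minimal illustration, not the thesis code: `backproject_rgbd` is an illustrative name, `K` is assumed to be the 3x3 camera intrinsic matrix and `T_wc` a 4x4 camera-to-world pose.

```python
import numpy as np

def backproject_rgbd(depth, K, T_wc):
    """Lift a depth map (H, W), in meters, to an (N, 3) world-space point cloud."""
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    valid = z > 0                      # depth sensors report 0 where no measurement exists
    x = (u.ravel() - cx) * z / fx      # pinhole model: X = (u - cx) * Z / fx
    y = (v.ravel() - cy) * z / fy
    pts_cam = np.stack([x, y, z], axis=1)[valid]
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    return (T_wc @ pts_h.T).T[:, :3]   # apply the 4x4 camera-to-world pose
```

Fusing the clouds produced this way for consecutive frames, with poses kept locally consistent by bundle adjustment, yields the dense reconstruction the abstract refers to.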
Abstract:
The first mechanical automaton concept is found in a Chinese text written in the 3rd century BC, while Computer Vision was born in the late 1960s. Visual perception applied to machines (i.e. Machine Vision) is therefore a young and exciting alliance. When robots came in, the new field of Robotic Vision was born, and the two terms began to be erroneously interchanged. In short, we can say that Machine Vision is an engineering domain concerned with the industrial use of vision. Robotic Vision, instead, is a research field that tries to incorporate robotics aspects into computer vision algorithms. Visual Servoing, for example, is one of the problems that cannot be solved by computer vision alone. Accordingly, a large part of this work deals with boosting popular Computer Vision techniques by exploiting robotics, e.g. using kinematics to localize a vision sensor mounted as the robot end-effector. The remainder of this work is dedicated to the counterpart, i.e. the use of computer vision to solve real robotic problems such as grasping objects or navigating while avoiding obstacles. A brief survey of the mapping data structures most widely used in robotics will be presented, along with SkiMap, a novel sparse data structure designed both for robotic mapping and as a general-purpose 3D spatial index. Several approaches to object detection and manipulation that exploit the aforementioned mapping strategies will then be proposed, along with a completely new Machine Teaching facility intended to simplify the training procedure of modern Deep Learning networks.
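As a toy stand-in for the idea behind a sparse mapping structure such as SkiMap: the actual structure organizes voxels in a tree of skip lists for ordered traversal, while the hash-map sketch below captures only the sparsity of the index. The class, its methods and the 0.05 m resolution are illustrative assumptions, not the thesis implementation.

```python
from collections import defaultdict

class SparseVoxelMap:
    """Sparse 3D occupancy index: only observed voxels consume memory."""

    def __init__(self, resolution=0.05):
        self.resolution = resolution
        self.hits = defaultdict(int)   # integer voxel key -> occupancy evidence

    def key(self, x, y, z):
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def integrate_point(self, x, y, z):
        """Accumulate one sensed 3D point into its voxel."""
        self.hits[self.key(x, y, z)] += 1

    def occupied(self, min_hits=3):
        """Centers of voxels considered occupied after enough observations."""
        r = self.resolution
        return [((i + .5) * r, (j + .5) * r, (k + .5) * r)
                for (i, j, k), n in self.hits.items() if n >= min_hits]
```

Integrating every point of every incoming depth frame into such a map, and querying `occupied()` for planning or manipulation, is the kind of workload a sparse spatial index is meant to serve.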
Abstract:
Sketches are a unique way to communicate: drawing a simple sketch does not require any training, sketches convey information that is hard to describe with words, they are powerful enough to represent almost any concept, and nowadays it is possible to draw directly on mobile devices. Motivated by these unique characteristics of sketches and fascinated by the human ability to imagine 3D objects from drawings, this thesis focuses on automatically associating geometric information with sketches. The main research directions of the thesis can be summarized as obtaining geometric information from freehand scene sketches to improve 2D sketch-based tasks, and investigating Vision-Language models to overcome the limitations of 3D sketch-based tasks. The first part of the thesis concerns predicting geometric information from scene sketches, improving scene-sketch-to-image generation and unlocking new creative effects. The thesis then presents a study of the embedding space of Vision-Language models over sketches, line renderings and RGB renderings of 3D shapes, aimed at overcoming the reliance on supervised datasets for 3D sketch-based tasks, which are limited and hard to acquire. Following the resulting observations, Vision-Language models are applied to Sketch-Based Shape Retrieval without the need for training on supervised datasets. We then analyze the use of Vision-Language models for sketch-based 3D reconstruction in an unsupervised manner. In the final chapter we report the results of an additional project carried out during the PhD, which has led to the development of a framework for learning an embedding space of neural networks that can be navigated to obtain ready-to-use models with desired characteristics.
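A minimal sketch of zero-shot sketch-based shape retrieval with a frozen vision-language encoder, in the spirit of the approach described above: `encode_image` and `views_of` are assumed placeholders (e.g. a CLIP-like image encoder and a set of pre-rendered line-art views per shape), not the thesis API.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(query_sketch, shapes, encode_image, views_of):
    """Rank 3D shapes by similarity between the sketch and their line renderings."""
    q = encode_image(query_sketch)     # sketch and renderings share one embedding space
    scores = []
    for shape in shapes:
        # Aggregate over several pre-rendered views of the shape;
        # max-pooling is one common choice for multi-view retrieval.
        s = max(cosine(q, encode_image(view)) for view in views_of(shape))
        scores.append((s, shape))
    scores.sort(key=lambda t: t[0], reverse=True)
    return [shape for _, shape in scores]
```

Because both the sketch and the shape renderings are embedded by the same frozen encoder, no paired sketch-shape training data is needed, which is the point of the zero-shot setting the abstract describes.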