981 results for 3D scene understanding


Relevance: 80.00%

Publisher:

Abstract:

A novel scheme for depth sequence compression, based on a perceptual coding algorithm, is proposed. A depth sequence describes object positions in the 3D scene and is used, in Free Viewpoint Video, to generate synthetic video sequences. Perceptual video coding exploits the characteristics of the human visual system to improve compression efficiency. Since depth sequences are never displayed themselves, perceptual coding applied directly to them is not effective. The proposed algorithm is therefore based on a novel perceptual rate-distortion optimization process that evaluates the perceptual distortion of the views rendered from the encoded depth sequences. Experimental results show the effectiveness of the proposed method, which yields a considerable improvement in the perceptual quality of the rendered views.
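The central idea, measuring distortion on the synthesized view rather than on the depth map itself, can be sketched as a Lagrangian mode decision. This is a minimal illustration, not the paper's codec: `encode_fn` and `render_view` are hypothetical placeholders for a per-mode depth encoder and a view-synthesis routine.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def select_mode(depth_block, texture, candidate_modes, render_view, lam):
    """Pick the coding mode whose rendered view looks best for its bit cost.

    candidate_modes: list of (encode_fn, rate) pairs (hypothetical).
    render_view: function synthesizing a virtual view from texture + depth.
    """
    reference = render_view(texture, depth_block)        # view from the original depth
    best_mode, best_cost = None, np.inf
    for encode_fn, rate in candidate_modes:
        reconstructed = encode_fn(depth_block)            # decoded depth for this mode
        synthesized = render_view(texture, reconstructed)
        # Perceptual distortion is measured on the rendered view, not on the depth map
        # (rendered views assumed grayscale here)
        distortion = 1.0 - ssim(reference, synthesized,
                                data_range=reference.max() - reference.min())
        cost = distortion + lam * rate                    # Lagrangian rate-distortion cost
        if cost < best_cost:
            best_mode, best_cost = (encode_fn, rate), cost
    return best_mode
```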

Relevance: 80.00%

Publisher:

Abstract:

The importance of vision-based systems for Sense-and-Avoid is increasing as remotely piloted and autonomous UAVs become part of the non-segregated airspace. The development and evaluation of these systems demand flight scenario images, which are expensive and risky to obtain. Augmented Reality techniques now allow real flight scenario images to be composited with 3D aircraft models, producing realistic images useful for system development and benchmarking at a much lower cost and risk. With the techniques presented in this paper, 3D aircraft models are first positioned in a simulated 3D scene with controlled illumination and rendering parameters. Realistic simulated images are then obtained using an image processing algorithm that fuses the images rendered from the 3D scene with images from real UAV flights, taking into account on-board camera vibrations. Since the intruder and camera poses are user-defined, ground truth data are available. These ground truth annotations allow aircraft detection and tracking algorithms to be developed and quantitatively evaluated. This paper presents the software developed to create a public dataset of 24 videos, together with their annotations and some tracking application results.
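A minimal sketch of the compositing step is given below, assuming the aircraft has already been rendered as an RGBA layer aligned with the real frame; the vibration handling is reduced to a simple 2D shift, which is a simplification of the fusion algorithm described in the paper.

```python
import numpy as np
import cv2

def composite_frame(real_frame, render_rgba, vib_dx, vib_dy):
    """Fuse a rendered aircraft (RGBA layer, same size as the frame) onto a real
    UAV flight image, shifting it by the estimated on-board camera vibration."""
    h, w = real_frame.shape[:2]
    # Shift the rendered layer so it follows the measured camera vibration
    M = np.float32([[1, 0, vib_dx], [0, 1, vib_dy]])
    shifted = cv2.warpAffine(render_rgba, M, (w, h))
    alpha = shifted[:, :, 3:4].astype(np.float32) / 255.0   # rendered alpha mask
    rgb = shifted[:, :, :3].astype(np.float32)
    out = alpha * rgb + (1.0 - alpha) * real_frame.astype(np.float32)
    return out.astype(np.uint8)
```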

Relevance: 80.00%

Publisher:

Abstract:

Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just four microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements over the closest related works are also evaluated: 56% in sound source localisation computational cost over an audio-only system, 8% in speaker diarisation error rate over an audio-only speaker recognition unit, and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.
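As a rough illustration of the lowest fusion level, the sketch below combines audio and video localisation evidence on a shared position grid by a weighted sum of log-likelihoods; the actual system fuses the modalities at several semantic levels, so this is only a toy late-fusion example with hypothetical inputs.

```python
import numpy as np

def fuse_audio_video(audio_loglik, video_loglik, w_audio=0.5, w_video=0.5):
    """Late fusion of audio and video localisation evidence on a shared
    ground-plane grid: each array holds per-cell log-likelihoods for the
    speaker position, and the fused map is their weighted sum."""
    fused = w_audio * audio_loglik + w_video * video_loglik
    return np.unravel_index(np.argmax(fused), fused.shape)   # most likely grid cell
```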

Relevance: 80.00%

Publisher:

Abstract:

In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate an approximate regression confidence. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real-time on commercial hardware.
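The abstract does not spell out how the classifier and the fine-tuned regressor are combined; one plausible reading, shown purely as an illustrative sketch with hypothetical inputs, is to report the regressed angle together with the probability mass the classifier assigns to the bin containing it.

```python
import numpy as np

def pose_with_confidence(class_probs, bin_centers_deg, regressed_deg):
    """Combine a gaze-direction classifier and a fine-tuned regressor.

    class_probs: softmax output over discretised head-pose bins.
    bin_centers_deg: angle (deg) at the centre of each bin.
    regressed_deg: continuous angle predicted by the regression head.
    Returns the regressed angle plus a crude confidence score: the probability
    mass the classifier assigns to the bin containing the regressed angle.
    """
    nearest_bin = int(np.argmin(np.abs(bin_centers_deg - regressed_deg)))
    confidence = float(class_probs[nearest_bin])
    return regressed_deg, confidence
```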

Relevance: 80.00%

Publisher:

Abstract:

Safe collaboration between a robot and a human operator is a critical requirement for deploying a robotic system into a manufacturing and testing environment. In this dissertation, this safety requirement is developed and implemented for the navigation system of mobile manipulators. A methodology for human-robot co-existence through 3D scene analysis is also investigated. The proposed approach exploits the advance in computing capability by relying on graphics processing units (GPUs) for volumetric predictive human-robot contact checking. Apart from guaranteeing the safety of operators, human-robot collaboration is also fundamental when cooperative activities are required, as on an appliance test automation floor. To achieve this, a generalized hierarchical task controller scheme for collision avoidance is developed, which allows the robotic arm to safely approach and inspect the interior of the appliance without collision during the testing procedure. The unpredictable presence of operators also creates fast-changing dynamic obstacles, requiring a quick reaction from the robot. In this respect, a GPU-accelerated distance field is computed to speed up the reaction time needed to avoid collisions between the human operator and the robot. Automated appliance testing also involves robotized laundry loading and unloading during life-cycle testing. This task involves laundry detection, grasp pose estimation and manipulation in a container, inside the drum and during recovery grasping. Wrinkle and blob detection algorithms for grasp pose estimation are developed, and grasp poses are computed along the wrinkles and blobs to perform the grasping task efficiently. By ranking the estimated laundry grasp poses according to a predefined cost function, the robotic arm attempts the poses that are most comfortable from the kinematic standpoint as well as collision-free on the appliance side; this is achieved through appliance detection, full-model registration and collision-free trajectory execution using online collision avoidance.
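The grasp-pose ranking step can be sketched as filtering collision-free candidates and ordering them by kinematic convenience; the helpers below (`manipulability`, `in_collision`) are hypothetical stand-ins, not the dissertation's actual cost function.

```python
def rank_grasp_poses(grasp_poses, manipulability, in_collision, w_manip=1.0):
    """Rank candidate laundry grasp poses by a simple cost: discard poses that
    collide with the registered appliance model and prefer kinematically
    comfortable ones.

    grasp_poses: list of candidate poses.
    manipulability: function pose -> score (higher = more comfortable reach).
    in_collision: function pose -> bool, collision check against the appliance model.
    """
    feasible = [p for p in grasp_poses if not in_collision(p)]
    # Lower cost = better; here the cost is just the negated manipulability score
    return sorted(feasible, key=lambda p: -w_manip * manipulability(p))
```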

Relevance: 50.00%

Publisher:

Abstract:

This thesis investigates interactive scene reconstruction and understanding using RGB-D data only. Indeed, we believe that depth cameras will remain, in the near future, a cheap and low-power 3D sensing alternative that is also suitable for mobile devices. Therefore, our contributions build on top of state-of-the-art approaches to achieve advances in three main challenging scenarios, namely mobile mapping, large-scale surface reconstruction and semantic modeling. First, we will describe an effective approach dealing with Simultaneous Localization And Mapping (SLAM) on platforms with limited resources, such as a tablet device. Unlike previous methods, dense reconstruction is achieved by reprojection of RGB-D frames, while local consistency is maintained by deploying relative bundle adjustment principles. We will show quantitative results comparing our technique to the state of the art, as well as detailed reconstructions of various environments ranging from rooms to small apartments. Then, we will address large-scale surface modeling from depth maps exploiting parallel GPU computing. We will develop a real-time camera tracking method based on the popular KinectFusion system and an online surface alignment technique capable of counteracting drift errors and closing small loops. We will show very high-quality meshes outperforming existing methods on publicly available datasets as well as on data recorded with our RGB-D camera, even in complete darkness. Finally, we will move to our Semantic Bundle Adjustment framework to effectively combine object detection and SLAM in a unified system. Though the mathematical framework we will describe is not restricted to a particular sensing technology, in the experimental section we will again refer only to RGB-D sensing. We will discuss successful implementations of our algorithm, showing the benefit of joint object detection, camera tracking and environment mapping.
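The dense "reconstruction by reprojection" step amounts to back-projecting each depth map into world coordinates using the current camera pose; a minimal sketch, assuming a standard pinhole model, is given below.

```python
import numpy as np

def backproject_depth(depth, K, T_world_cam):
    """Back-project a depth map into world coordinates (dense mapping step).

    depth: HxW array of metric depths; K: 3x3 intrinsics; T_world_cam: 4x4 camera pose.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                                   # keep only pixels with a depth reading
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)[:, valid]   # 4xN homogeneous
    pts_world = (T_world_cam @ pts_cam)[:3].T                          # Nx3 world points
    return pts_world
```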

Relevance: 50.00%

Publisher:

Abstract:

The first mechanical automaton concept was found in a Chinese text written in the 3rd century BC, while Computer Vision was born in the late 1960s. Visual perception applied to machines (i.e. Machine Vision) is therefore a young and exciting alliance. When robots came in, the new field of Robotic Vision was born, and these terms began to be erroneously interchanged. In short, we can say that Machine Vision is an engineering domain concerned with the industrial use of vision. Robotic Vision, instead, is a research field that tries to incorporate robotics aspects into computer vision algorithms. Visual Servoing, for example, is one of the problems that cannot be solved by computer vision alone. Accordingly, a large part of this work deals with boosting popular Computer Vision techniques by exploiting robotics, e.g. the use of kinematics to localize a vision sensor mounted as the robot end-effector. The remainder of this work is dedicated to the counterpart, i.e. the use of computer vision to solve real robotic problems such as grasping objects or navigating while avoiding obstacles. A brief survey of the mapping data structures most widely used in robotics is presented, along with SkiMap, a novel sparse data structure created both for robotic mapping and as a general-purpose 3D spatial index. Several approaches to Object Detection and Manipulation that exploit the aforementioned mapping strategies are then proposed, along with a completely new Machine Teaching facility intended to simplify the training procedure of modern Deep Learning networks.
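As a rough illustration of what a sparse 3D spatial index buys you (memory only for occupied voxels, fast point queries), here is a minimal hash-based voxel grid; this is deliberately not the thesis's implementation, since SkiMap is built on a hierarchy of skip lists rather than a hash map.

```python
from collections import defaultdict

class SparseVoxelGrid:
    """Minimal sparse 3D index: only occupied voxels consume memory.
    (Illustrative only; SkiMap uses a hierarchy of skip lists, not a hash map.)"""

    def __init__(self, resolution=0.05):
        self.resolution = resolution
        self.voxels = defaultdict(int)          # voxel key -> hit count

    def key(self, x, y, z):
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def integrate_point(self, x, y, z):
        self.voxels[self.key(x, y, z)] += 1     # accumulate measurements

    def is_occupied(self, x, y, z, min_hits=3):
        return self.voxels.get(self.key(x, y, z), 0) >= min_hits
```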

Relevance: 50.00%

Publisher:

Abstract:

Sketches are a unique way to communicate: drawing a simple sketch does not require any training, sketches convey information that is hard to describe with words, they are powerful enough to represent almost any concept, and nowadays it is possible to draw directly on mobile devices. Motivated by these unique characteristics of sketches and fascinated by the human ability to imagine 3D objects from drawings, this thesis focuses on automatically associating geometric information to sketches. Its main research directions can be summarized as obtaining geometric information from freehand scene sketches to improve 2D sketch-based tasks, and investigating Vision-Language models to overcome the limitations of 3D sketch-based tasks. The first part of the thesis concerns predicting geometric information from scene sketches, improving scene-sketch-to-image generation and unlocking new creative effects. The thesis then presents a study of the embedding space of Vision-Language models over sketches, line renderings and RGB renderings of 3D shapes, aimed at avoiding supervised datasets for 3D sketch-based tasks, which are limited and hard to acquire. Following the observations and results obtained, Vision-Language models are applied to Sketch-Based Shape Retrieval without the need for training on supervised datasets. We then analyze the use of Vision-Language models for sketch-based 3D reconstruction in an unsupervised manner. In the final chapter we report the results of an additional project carried out during the PhD, which led to the development of a framework for learning an embedding space of neural networks that can be navigated to obtain ready-to-use models with desired characteristics.
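The zero-shot retrieval setting can be sketched as follows: embed the query sketch and a few rendered views of each candidate shape with the same Vision-Language image encoder (e.g. a CLIP-style model), then rank shapes by their best-matching view. The embedding extraction is assumed to happen elsewhere; only the ranking logic is shown.

```python
import numpy as np

def retrieve_shapes(sketch_embedding, shape_view_embeddings, top_k=5):
    """Zero-shot sketch-based shape retrieval with a Vision-Language encoder.

    sketch_embedding: 1D embedding of the query sketch (e.g. from a CLIP image encoder).
    shape_view_embeddings: dict mapping shape_id -> array of embeddings of its rendered views.
    Shapes are ranked by the best cosine similarity between the sketch and any of their views.
    """
    q = sketch_embedding / np.linalg.norm(sketch_embedding)
    scores = {}
    for shape_id, views in shape_view_embeddings.items():
        v = views / np.linalg.norm(views, axis=1, keepdims=True)
        scores[shape_id] = float(np.max(v @ q))      # best-matching view wins
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```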

Relevance: 40.00%

Publisher:

Abstract:

It is often assumed that humans generate a 3D reconstruction of the environment, either in egocentric or world-based coordinates, but the steps involved are unknown. Here, we propose two reconstruction-based models, evaluated using data from two tasks in immersive virtual reality. We model the observer’s prediction of landmark location based on standard photogrammetric methods and then combine location predictions to compute likelihood maps of navigation behaviour. In one model, each scene point is treated independently in the reconstruction; in the other, the pertinent variable is the spatial relationship between pairs of points. Participants viewed a simple environment from one location, were transported (virtually) to another part of the scene and were asked to navigate back. Error distributions varied substantially with changes in scene layout; we compared these directly with the likelihood maps to quantify the success of the models. We also measured error distributions when participants manipulated the location of a landmark to match the preceding interval, providing a direct test of the landmark-location stage of the navigation models. Models such as this, which start with scenes and end with a probabilistic prediction of behaviour, are likely to be increasingly useful for understanding 3D vision.
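A toy stand-in for the landmark-location stage is sketched below: given noisy bearing observations of a landmark from known viewpoints, a likelihood map over candidate 2D locations is accumulated under independent Gaussian bearing noise. This is only an illustrative simplification of the photogrammetric model used in the paper.

```python
import numpy as np

def landmark_likelihood_map(cam_positions, bearings, grid_x, grid_y, sigma=np.deg2rad(2)):
    """Crude likelihood map for a landmark's 2D location from noisy bearing
    observations taken at known viewpoints.

    cam_positions: list of (x, y) viewpoints; bearings: observed angles (rad) to the landmark;
    grid_x, grid_y: 1D arrays defining the evaluation grid; sigma: bearing noise std (rad).
    """
    X, Y = np.meshgrid(grid_x, grid_y)
    loglik = np.zeros_like(X)
    for (cx, cy), theta in zip(cam_positions, bearings):
        predicted = np.arctan2(Y - cy, X - cx)            # bearing each grid cell would produce
        err = np.angle(np.exp(1j * (predicted - theta)))  # wrapped angular error
        loglik += -0.5 * (err / sigma) ** 2               # independent Gaussian bearing noise
    return np.exp(loglik - loglik.max())                  # likelihood map, scaled so its peak is 1
```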

Relevance: 40.00%

Publisher:

Abstract:

Structured Abstract. Objectives: To investigate the 3D morphological variations in 169 temporomandibular joint (TMJ) condyles, using novel imaging statistical modeling approaches. Setting and sample population: The Department of Orthodontics and Pediatric Dentistry at the University of Michigan. Cone beam CT scans were acquired from 69 subjects with long-term TMJ osteoarthritis (OA, mean age 39.1 ± 15.7 years), 15 subjects at initial consultation with a diagnosis of OA (mean age 44.9 ± 14.8 years), and seven healthy controls (mean age 43 ± 12.4 years). Materials and methods: 3D surface models of the condyles were constructed, and homologous correspondent points on each model were established. The statistical framework included Direction-Projection-Permutation (DiProPerm) for testing the statistical significance of the differences between healthy controls and the OA groups determined by clinical and radiographic diagnoses. Results: Condylar morphology in OA and healthy subjects varied widely, with categorization ranging from mild to severe bone degeneration or overgrowth. DiProPerm statistics supported a significant difference between the healthy control group and the initial-diagnosis OA group (t = 6.6, empirical p-value = 0.006) and between healthy controls and the long-term OA group (t = 7.2, empirical p-value = 0). Compared with healthy controls, the average condyle in OA subjects was significantly smaller in all dimensions, except its anterior surface, even in subjects with an initial diagnosis of OA. Conclusion: This new statistical modeling of condylar morphology allows the development of more targeted classifications of this condition than previously possible.
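A simplified version of the DiProPerm test is sketched below: samples are projected onto a separating direction (here the mean-difference direction; DiProPerm typically uses a DWD or SVM direction), a two-sample t-statistic is computed on the projections, and the empirical p-value comes from permuting group labels.

```python
import numpy as np
from scipy import stats

def diproperm(group_a, group_b, n_perm=1000, rng=None):
    """Simplified Direction-Projection-Permutation (DiProPerm) test on two
    groups of shape descriptors (rows = subjects, columns = features)."""
    rng = np.random.default_rng(rng)
    X = np.vstack([group_a, group_b])
    labels = np.array([0] * len(group_a) + [1] * len(group_b))

    def projected_t(X, labels):
        # Project onto the mean-difference direction and compare the projections
        direction = X[labels == 1].mean(0) - X[labels == 0].mean(0)
        direction /= np.linalg.norm(direction)
        proj = X @ direction
        return stats.ttest_ind(proj[labels == 1], proj[labels == 0]).statistic

    t_obs = projected_t(X, labels)
    null = np.array([projected_t(X, rng.permutation(labels)) for _ in range(n_perm)])
    p_empirical = np.mean(np.abs(null) >= np.abs(t_obs))   # empirical p-value
    return t_obs, p_empirical
```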

Relevance: 40.00%

Publisher:

Abstract:

The aim of this thesis is to demonstrate that synthetic elements can be integrated into real scenes to produce realistic renderings, using software such as Autodesk Maya and Adobe Photoshop. The project consists of the creation of five images obtained by assembling 3D objects, generated with the Maya modelling software, into real photographs, with the goal of producing highly photorealistic final images in which the observer cannot tell whether the image is real or synthetic.

Relevance: 40.00%

Publisher:

Relevance: 40.00%

Publisher:

Abstract:

The analysis and reconstruction of forensically relevant events, such as traffic accidents, criminal assaults and homicides, are based on the external and internal morphological findings of the injured or deceased person. For this purpose, high-tech methods are gaining increasing importance in forensic investigations. The non-contact optical 3D digitising system GOM ATOS is applied as a suitable tool for whole-body surface and wound documentation and analysis, in order to identify injury-causing instruments and to reconstruct the course of events. In addition to the surface documentation, cross-sectional imaging methods deliver the internal medical findings of the body. These 3D data are fused into a whole-body model of the deceased. In addition to the findings on the bodies, the injury-inflicting instruments and the incident scene are documented in 3D. The 3D data of the incident scene, generated by 3D laser scanning and photogrammetry, are also included in the reconstruction. Two cases illustrate the methods. In the first case, a man was shot in his bedroom and the main question was whether the offender shot him intentionally or accidentally, as he claimed. In the second case, a woman was hit by a car driving backwards into a garage; it was unclear whether the driver drove backwards once or twice, which would indicate that he willingly injured and killed the woman. With this work, we demonstrate how 3D documentation, data merging and animation make it possible to answer reconstructive questions regarding the dynamic development of patterned injuries, and how this leads to a reconstruction of the course of events based on real data.