998 results for "RGB-D image"
Abstract:
In this work, image-based estimation methods, also known as direct methods, are studied; these methods avoid feature extraction and matching completely. Cost functions use raw pixels as measurements, and the goal is to produce precise 3D pose and structure estimates. The cost functions presented minimize the sensor error, because the measurements are not transformed or modified. In photometric camera pose estimation, 3D rotation and translation parameters are estimated by minimizing a sequence of image-based cost functions, which are non-linear due to perspective projection and lens distortion. In image-based structure refinement, on the other hand, 3D structure is refined using a number of additional views and an image-based cost metric. Image-based estimation methods are particularly useful in conditions where the Lambertian assumption holds, that is, where 3D points have a constant color regardless of viewing angle. The goal is to improve image-based estimation methods and to produce computationally efficient methods that can be incorporated into real-time applications. The developed image-based 3D pose and structure estimation methods are finally demonstrated in practice in indoor 3D reconstruction and in a live augmented reality application.
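To make the idea of a raw-pixel cost concrete, here is a minimal sketch of a photometric cost for one candidate camera pose. It assumes a pinhole camera without lens distortion, a pose restricted to a rotation about the optical axis plus a 3D translation, and a grayscale image stored as a nested list; all function names are hypothetical and this is not the method from the abstract.

```python
import math

def bilinear(img, u, v):
    """Bilinearly interpolate a 2D list img (rows = v, columns = u) at (u, v)."""
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    du, dv = u - u0, v - v0
    # Weighted sum of the four neighbouring pixels.
    return ((1 - du) * (1 - dv) * img[v0][u0] + du * (1 - dv) * img[v0][u0 + 1]
            + (1 - du) * dv * img[v0 + 1][u0] + du * dv * img[v0 + 1][u0 + 1])

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (lens distortion ignored, an assumption)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy

def transform(point, yaw, t):
    """Rotate about the optical (z) axis by yaw, then translate by t (simplified pose)."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + t[0], s * x + c * y + t[1], z + t[2])

def photometric_cost(img, points, intensities, yaw, t, K):
    """Sum of squared differences between stored and observed raw intensities."""
    cost = 0.0
    for p, ref in zip(points, intensities):
        u, v = project(transform(p, yaw, t), *K)
        r = bilinear(img, u, v) - ref  # residual in sensor (intensity) units
        cost += r * r
    return cost
```

The cost is zero at the pose used to record the reference intensities and grows with pose error; because of the division by z in `project`, it is non-linear in the pose, which is why an iterative solver such as Gauss-Newton would be used to minimize it.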
Abstract:
The recent emergence of low-cost RGB-D sensors has brought new opportunities for robotics by providing affordable devices that deliver synchronized images with both color and depth information. In this thesis, recent work on pose estimation using RGB-D sensors is reviewed. In addition, a pose recognition system for rigid objects using RGB-D data is implemented. The implementation uses half-edge primitives extracted from the RGB-D images for pose estimation. The system is based on the probabilistic object representation framework by Detry et al., which uses Nonparametric Belief Propagation for pose inference. Experiments are performed on household objects to evaluate the performance and robustness of the system.
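A common first step when working with synchronized color and depth images is to back-project every valid depth pixel into a colored 3D point. The sketch below assumes the depth map is already registered to the color image, that a depth of 0 marks a missing reading, and that pinhole intrinsics `fx, fy, cx, cy` are known; it is a generic illustration, not the half-edge pipeline described above.

```python
def backproject(depth, color, fx, fy, cx, cy):
    """Turn a registered depth map and color image into a list of (X, Y, Z, color)."""
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # 0 marks a missing depth reading (assumed convention)
                continue
            # Invert the pinhole projection u = fx * X / Z + cx (and likewise for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            cloud.append((x, y, z, color[v][u]))
    return cloud
```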
Abstract:
For general home monitoring, a system should automatically interpret people's actions. The system should be non-intrusive and able to deal with cluttered backgrounds and loose clothing. An approach based on spatio-temporal local features and a Bag-of-Words (BoW) model is proposed for single-person action recognition from combined intensity and depth images. To restore the temporal structure lost in the traditional BoW method, a dynamic time alignment technique with temporal binning is applied; this technique has not previously been used in the literature for human action recognition on depth imagery. A novel human action dataset with depth data has been created using two Microsoft Kinect sensors. The ReadingAct dataset contains 20 subjects and 19 actions, for a total of 2340 videos. To investigate the effect of using depth images and of the proposed method, testing was conducted on three depth datasets, and the proposed method was compared with traditional Bag-of-Words methods. The results show that the proposed method improves recognition accuracy when depth is added to conventional intensity data, and that it has advantages when dealing with long actions.
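The combination of temporal binning with time alignment can be sketched as follows: local descriptors are quantized against a codebook, one normalized histogram is built per temporal bin, and two videos are compared with dynamic time warping (DTW) over their bin sequences. DTW is assumed here as one standard alignment technique; the codebook, bin count and function names are hypothetical, and the abstract's exact alignment method may differ.

```python
import math

def nearest_word(feature, codebook):
    """Index of the closest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(feature, codebook[k])))

def binned_bow(features, codebook, n_frames, n_bins):
    """One normalized BoW histogram per temporal bin.

    features: list of (frame_index, descriptor) pairs from a local feature detector.
    """
    hists = [[0.0] * len(codebook) for _ in range(n_bins)]
    for t, desc in features:
        b = min(t * n_bins // n_frames, n_bins - 1)  # temporal bin of this feature
        hists[b][nearest_word(desc, codebook)] += 1.0
    for h in hists:
        s = sum(h)
        if s > 0:
            h[:] = [c / s for c in h]
    return hists

def dtw(a, b):
    """Dynamic time warping distance between two sequences of histograms."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between bins
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]
```

Unlike a single global histogram, this representation distinguishes an action from its time-reversed version, while DTW still tolerates the same action performed at a different speed.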
Abstract:
The main technologies that have been used are computer vision and range sensors, that is, visual features and depth. However, the emergence of more affordable RGB-D sensors, such as the Kinect, makes their use in these scenarios possible. The use of RGB-D sensors in indoor environments is addressed for scenarios where lighting conditions may vary. An overhead (zenithal) configuration is adopted at the entrance to a space, in order to preserve privacy and to facilitate the detection and tracking of the salient objects that appear in the scene by means of background subtraction techniques. The detected objects are modeled and can be described by their appearance and geometric features, such as area and volume.
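With a ceiling-mounted depth sensor, background subtraction and the geometric measurements mentioned above can be sketched in a few lines. This assumes a static background depth map, a fixed height threshold, and a constant metric footprint per pixel (a simplification, since the real footprint grows with distance); the function name and parameters are hypothetical.

```python
def detect_and_measure(depth, background, threshold, pixel_area):
    """Overhead depth background subtraction plus simple area/volume measures.

    depth, background: 2D lists of distances from the ceiling-mounted sensor (m).
    pixel_area: assumed constant floor footprint of one pixel (m^2).
    Returns the foreground mask, its area (m^2) and an approximate volume (m^3).
    """
    mask, area, volume = [], 0.0, 0.0
    for row_d, row_b in zip(depth, background):
        mask_row = []
        for d, b in zip(row_d, row_b):
            height = b - d                     # how far the pixel rises above the background
            fg = d > 0 and height > threshold  # d == 0 treated as invalid (assumed)
            mask_row.append(fg)
            if fg:
                area += pixel_area
                volume += pixel_area * height  # column of this height above the floor
        mask.append(mask_row)
    return mask, area, volume
```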
Abstract:
[EN]The re-identification problem has commonly been addressed using appearance features based on salient points and color information. In this paper, we focus on the possibilities that simple geometric features obtained from depth images captured with RGB-D cameras may offer for this task, particularly under severe illumination conditions. The results achieved for different sets of simple geometric features extracted in a top-view setup suggest that they provide useful descriptors for the re-identification task, which can be integrated into an ambient intelligence environment as part of a sensor network.
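As an illustration of what "simple geometric features" from a top-view depth blob might look like, the sketch below computes a few height and area statistics and matches a query against a gallery by nearest neighbour. The specific features, the 0.9 head-region cut-off and the unnormalized Euclidean matching are all hypothetical choices, not the feature sets evaluated in the paper.

```python
import math

def geometric_features(heights, pixel_area):
    """Simple top-view descriptors from a segmented person's height map.

    heights: per-pixel heights above the floor inside the person's blob (m).
    """
    h_max = max(heights)
    area = len(heights) * pixel_area
    mean_h = sum(heights) / len(heights)
    # Fraction of the blob near the top of the head (hypothetical 0.9 * h_max cut).
    head_ratio = sum(1 for h in heights if h >= 0.9 * h_max) / len(heights)
    return (h_max, area, mean_h, head_ratio)

def reidentify(query, gallery):
    """Return the gallery identity whose descriptor is nearest to the query.

    No feature scaling is applied here; in practice features with different
    units would normally be normalized first.
    """
    return min(gallery, key=lambda ident: math.dist(query, gallery[ident]))
```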
Abstract:
[EN]Re-identification is commonly accomplished using appearance features based on salient points and color information. In this paper, we present a study on the use of different features obtained exclusively from depth images captured with RGB-D cameras. The results achieved using simple geometric features extracted in a top-view setup suggest that they provide useful descriptors for the re-identification task.
Abstract:
This thesis investigates interactive scene reconstruction and understanding using RGB-D data only. Indeed, we believe that in the near future depth cameras will remain a cheap, low-power 3D sensing alternative that is also suitable for mobile devices. Therefore, our contributions build on state-of-the-art approaches to achieve advances in three challenging scenarios: mobile mapping, large-scale surface reconstruction and semantic modeling. First, we will describe an effective approach to Simultaneous Localization And Mapping (SLAM) on platforms with limited resources, such as a tablet device. Unlike previous methods, dense reconstruction is achieved by reprojection of RGB-D frames, while local consistency is maintained by deploying relative bundle adjustment principles. We will show quantitative results comparing our technique with the state of the art, as well as detailed reconstructions of various environments ranging from rooms to small apartments. Then, we will address large-scale surface modeling from depth maps by exploiting parallel GPU computing. We will develop a real-time camera tracking method based on the popular KinectFusion system and an online surface alignment technique capable of counteracting drift errors and closing small loops. We will show very high-quality meshes that outperform existing methods on publicly available datasets, as well as on data recorded with our RGB-D camera, even in complete darkness. Finally, we will move to our Semantic Bundle Adjustment framework, which effectively combines object detection and SLAM in a unified system. Although the mathematical framework we will describe is not restricted to a particular sensing technology, in the experimental section we will again refer only to RGB-D sensing. We will discuss successful implementations of our algorithm that show the benefit of joint object detection, camera tracking and environment mapping.
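KinectFusion-style systems fuse depth maps into a truncated signed distance function (TSDF) by keeping a weighted running average per voxel. The sketch below shows that update for the voxels along a single camera ray, assuming a constant per-frame weight of 1 and a weight cap; it illustrates the generic fusion step only, not the tracking or online surface alignment developed in the thesis.

```python
def integrate_ray(tsdf, weights, voxel_depths, measured_depth, trunc, max_weight=100.0):
    """Fuse one depth measurement into the voxels along a single camera ray.

    tsdf, weights: per-voxel lists, updated in place.
    voxel_depths: depth of each voxel center along the ray (same units as trunc).
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z  # positive in front of the observed surface
        if sdf < -trunc:
            continue              # far behind the surface: occluded, no update
        f = min(1.0, sdf / trunc) # truncated to [-1, 1] (lower bound ensured by the skip above)
        w = 1.0                   # constant per-frame weight (simplest assumption)
        tsdf[i] = (weights[i] * tsdf[i] + w * f) / (weights[i] + w)
        weights[i] = min(weights[i] + w, max_weight)
```

After fusing measurements at 1.0 and 1.5 along the same ray, the zero crossing of the averaged TSDF lies midway between them, which is how noise in individual depth maps is averaged out.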
Abstract:
A port of a SLAM (Simultaneous Localization And Mapping) system called SlamDunk to the Android mobile platform is proposed. The port addresses issues of performance and of the quality of the resulting 3D reconstructions, and then proposes the solution considered optimal.
Abstract:
An innovative background modeling technique that is able to accurately segment foreground regions in RGB-D imagery (RGB plus depth) is presented in this paper. The technique is based on a Bayesian framework that efficiently fuses different sources of information to segment the foreground. In particular, the final segmentation is obtained by considering a prediction of the foreground regions, carried out by a novel Bayesian Network with a depth-based dynamic model, together with two independent mixture-of-Gaussians background models, one based on depth and one based on color. The efficient Bayesian combination of all these data reduces the noise and uncertainty introduced by the color and depth features and the corresponding models. As a result, more compact segmentations and refined foreground object silhouettes are obtained. Experimental results with different databases suggest that the proposed technique outperforms existing state-of-the-art algorithms.
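The general idea of combining a foreground prediction with separate color and depth background models can be sketched as a per-pixel Bayes rule. This sketch assumes that color and depth observations are conditionally independent given the pixel label (a naive-Bayes simplification, not the paper's Bayesian Network) and uses a 1D mixture of Gaussians for the background likelihood; names and parameters are hypothetical.

```python
import math

def mog_likelihood(x, components):
    """p(x | background) under a 1D mixture of Gaussians [(weight, mean, std), ...]."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in components)

def fuse_foreground(prior_fg, lik_color, lik_depth):
    """Posterior foreground probability for one pixel.

    prior_fg: predicted P(foreground), e.g. from a dynamic model of the previous frame.
    lik_color, lik_depth: (p(obs | fg), p(obs | bg)) pairs from the two cues,
    assumed conditionally independent given the label.
    """
    fg = prior_fg * lik_color[0] * lik_depth[0]
    bg = (1.0 - prior_fg) * lik_color[1] * lik_depth[1]
    return fg / (fg + bg) if fg + bg > 0 else prior_fg
```

When one cue is ambiguous (its two likelihoods are similar, for example in color camouflage), the posterior is driven by the other cue and the prior, which is the intuition behind fusing the models rather than thresholding each one separately.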
Abstract:
This thesis presents an in-depth analysis of how direct methods such as Lucas-Kanade and Inverse Compositional can be applied to RGB-D images. The capability and accuracy of these methods are also analyzed using a series of synthetic experiments. These simulate the effects produced by RGB images, depth images and RGB-D images so that different combinations can be evaluated. Moreover, unlike most of the articles found in the literature, these methods are analyzed without any additional technique that modifies the original algorithm or aids it in its search for a global optimum.
Our goal is to understand when and why these methods converge or diverge, so that in the future the knowledge extracted from the results presented here can effectively help a potential implementer. After reading this thesis, the implementer should be able to decide which algorithm best fits a particular task and should also know which problems have to be addressed in each algorithm, so that an appropriate correction can be implemented using additional techniques. These additional techniques are outside the scope of this thesis; however, they are reviewed from the literature.
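The practical difference between the two direct methods is easiest to see on a toy problem. The sketch below aligns a 1D signal under a pure translation warp W(x; p) = x + p: forward additive Lucas-Kanade re-evaluates the image gradient and Hessian at every iteration, while Inverse Compositional computes them once on the template and composes the inverted increment. The 1D setting, numerical gradients and function names are assumptions for illustration, not the thesis's RGB-D experiments.

```python
import math

def grad(f, x, h=1e-4):
    """Central-difference derivative of a 1D signal f."""
    return (f(x + h) - f(x - h)) / (2 * h)

def forward_additive(T, I, xs, p=0.0, iters=50):
    """Lucas-Kanade (forward additive) for W(x; p) = x + p."""
    for _ in range(iters):
        g = [grad(I, x + p) for x in xs]  # image gradient, re-evaluated every iteration
        e = [T(x) - I(x + p) for x in xs]
        H = sum(gi * gi for gi in g)      # 1x1 Gauss-Newton Hessian
        p += sum(gi * ei for gi, ei in zip(g, e)) / H
    return p

def inverse_compositional(T, I, xs, p=0.0, iters=50):
    """Inverse compositional: gradient and Hessian depend only on the template."""
    g = [grad(T, x) for x in xs]          # precomputed once
    H = sum(gi * gi for gi in g)
    for _ in range(iters):
        e = [I(x + p) - T(x) for x in xs]
        dp = sum(gi * ei for gi, ei in zip(g, e)) / H
        p -= dp                           # W(x; p) composed with W(x; dp)^-1
    return p
```

For example, with a Gaussian template `T = lambda x: math.exp(-x * x / 8.0)` and an image shifted by 1.5, both methods recover p close to 1.5; the inverse compositional variant avoids recomputing the gradient and Hessian in the loop, which is its main computational advantage.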
Abstract:
Low-cost RGB-D cameras such as the Microsoft Kinect or the Asus Xtion Pro are completely changing the computer vision world, as they are being successfully used in several applications and research areas. Depth data are particularly attractive and suitable for applications based on moving object detection through foreground/background segmentation approaches; the RGB-D applications proposed in the literature generally employ state-of-the-art foreground/background segmentation techniques based on the depth information alone, without taking the color information into account. The novel approach that we propose is based on a combination of classifiers that improves background subtraction accuracy with respect to state-of-the-art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average that adaptively modifies the support of each classifier in the ensemble by considering foreground detections in the previous frames and the depth and color edges. In this way, it is possible to reduce false detections due to critical issues that cannot be tackled by the individual classifiers, such as shadows and illumination changes, color and depth camouflage, moved background objects and noisy depth measurements. Moreover, to the best of the authors' knowledge, we propose the first publicly available RGB-D benchmark dataset with hand-labeled ground truth of several challenging scenarios for testing background/foreground segmentation algorithms.
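A weighted-average ensemble of this kind can be sketched per pixel as below. The weight update shown, which rewards classifiers that agreed with the previous frame's final decision, is a hypothetical rule chosen for illustration; the paper's scheme also uses depth and color edges, which are not modeled here.

```python
def combine(probs, weights):
    """Weighted average of per-classifier foreground probabilities for one pixel."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

def update_weights(weights, probs, previous_label, rate=0.1, floor=1e-3):
    """Hypothetical adaptation rule: move each weight toward the classifier's
    agreement with the previous frame's decision (1 = foreground, 0 = background)."""
    new = []
    for w, p in zip(weights, probs):
        agreement = 1.0 - abs(p - previous_label)  # 1 when it agreed perfectly
        new.append(max(floor, (1 - rate) * w + rate * agreement))
    return new
```

The floor keeps every classifier in the ensemble, so one that fails temporarily (for example, the color classifier under a sudden illumination change) can regain support once conditions improve.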
Abstract:
Image-Based Visual Servoing (IBVS) is a vision-based robot control scheme. It uses only the visual information obtained from a camera to guide a robot from any pose to a desired one. However, IBVS requires the estimation of several parameters that cannot be obtained directly from the image. These range from the intrinsic camera parameters (which can be obtained from a prior camera calibration) to the distance along the optical axis between the camera and the visual features, that is, the depth. This paper presents a comparative study of the performance of D-IBVS with the depth estimated in three different ways using a low-cost RGB-D sensor such as the Kinect. The visual servoing system has been developed on ROS (Robot Operating System), a meta-operating system for robots. The experiments show that computing the depth value for each visual feature improves system performance.
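The role of depth in IBVS is visible in the classical interaction matrix of a point feature, whose translational columns are scaled by 1/Z. The sketch below computes it for a normalized image point and derives the camera velocity with the standard law v = -λ L⁺ e for a single point (for one point, L⁺ = Lᵀ(L Lᵀ)⁻¹). A real system would stack the rows of several features; the gain value and function names are assumptions.

```python
def interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z."""
    return [[-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
            [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x]]

def ibvs_velocity(x, y, Z, x_star, y_star, gain=0.5):
    """Camera twist (vx, vy, vz, wx, wy, wz) = -gain * L^+ e for a single point."""
    L = interaction_matrix(x, y, Z)
    e = (x - x_star, y - y_star)
    # A = L L^T is 2x2; invert it in closed form.
    a = sum(v * v for v in L[0])
    b = sum(u * v for u, v in zip(L[0], L[1]))
    d = sum(v * v for v in L[1])
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    z = [inv[0][0] * e[0] + inv[0][1] * e[1], inv[1][0] * e[0] + inv[1][1] * e[1]]
    # v = -gain * L^T z, so the feature error evolves as de/dt = L v = -gain * e.
    return [-gain * (L[0][k] * z[0] + L[1][k] * z[1]) for k in range(6)]
```

An error in Z rescales the translational part of L, so the commanded velocity no longer produces the intended exponential decay of the error; measuring Z per feature with the RGB-D sensor avoids this approximation.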
Abstract:
The use of RGB-D sensors for mapping and recognition tasks in robotics or, more generally, for virtual reconstruction has increased in recent years. The key aspect of these sensors is that they provide both depth and color information from the same device. In this paper, we present a comparative analysis of the most important methods used in the literature for the registration of consecutive RGB-D video frames in static scenarios. The analysis begins by explaining the characteristics of the registration problem, dividing it into two representative applications: scene modeling and object reconstruction. Then, detailed experiments are carried out to determine the behavior of the different methods depending on the application. For both applications, we used standard datasets as well as a new one built for object reconstruction.
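Many frame-to-frame registration methods (ICP variants, feature-based alignment) share one core step: the closed-form least-squares rigid transform between corresponding points. To keep the example self-contained, the sketch below solves the 2D case, where the optimal rotation angle has a direct `atan2` formula; the 3D case is usually solved with an SVD (Kabsch/Umeyama). Known correspondences are assumed, and this is not any specific method from the comparison.

```python
import math

def rigid_align_2d(P, Q):
    """Least-squares rotation angle and translation mapping points P onto Q.

    P, Q: lists of corresponding (x, y) points (correspondences assumed known).
    """
    n = len(P)
    pcx = sum(p[0] for p in P) / n
    pcy = sum(p[1] for p in P) / n
    qcx = sum(q[0] for q in Q) / n
    qcy = sum(q[1] for q in Q) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        ax, ay = px - pcx, py - pcy  # centered source point
        bx, by = qx - qcx, qy - qcy  # centered target point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)  # maximizes sum of b . R(theta) a
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the target centroid.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, (tx, ty)
```

ICP repeats this step, re-estimating correspondences by nearest-neighbour search after each alignment; the methods compared in the paper differ mainly in how those correspondences are found and weighted.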
Abstract:
This paper presents a method for the fast calculation of a robot's egomotion using visual features. The method is part of a complete system for automatic map building and Simultaneous Localization and Mapping (SLAM). The method uses optical flow to determine whether the robot has moved. If so, visual features that do not satisfy several criteria (such as intersection and uniqueness) are deleted, and then the egomotion is calculated. We use a state-of-the-art algorithm (TORO) to rectify the map and solve the SLAM problem. The proposed method is more efficient than other current methods.
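The optical-flow test that gates the egomotion computation can be sketched as a threshold on the median flow magnitude of tracked features. The median makes the decision robust to a few bad tracks or independently moving objects; the pixel threshold and the function name are hypothetical, and the paper's actual criterion may differ.

```python
import math
import statistics

def has_moved(prev_pts, curr_pts, threshold_px=1.0):
    """Decide whether the camera moved, from sparse optical-flow correspondences.

    prev_pts, curr_pts: matched (u, v) feature positions in two consecutive frames.
    """
    mags = [math.dist(p, c) for p, c in zip(prev_pts, curr_pts)]
    return statistics.median(mags) > threshold_px
```

Skipping feature filtering and egomotion estimation whenever this returns `False` is what saves computation while the robot is stationary.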
Abstract:
This paper presents a method for the fast calculation of a robot's egomotion using visual features. The method is part of a complete system for automatic map building and Simultaneous Localization and Mapping (SLAM). The method uses optical flow to determine whether the robot has undergone a movement. If so, visual features that do not satisfy several criteria are deleted, and then the egomotion is calculated. Thus, the proposed method improves the efficiency of the whole process because not all of the data are processed. We use a state-of-the-art algorithm (TORO) to rectify the map and solve the SLAM problem. Additionally, a study of different visual detectors and descriptors has been conducted to identify which of them are most suitable for the SLAM problem. Finally, a navigation method is described that uses the map obtained from the SLAM solution.