998 results for RGB-D data
Abstract:
This thesis presents an in-depth analysis of how direct methods such as Lucas-Kanade and Inverse Compositional can be applied to RGB-D images. The capability and accuracy of these methods are also analyzed in a series of synthetic experiments. These experiments simulate the effects produced by RGB images, depth images and RGB-D images so that different combinations can be evaluated. Unlike most of the articles found in the literature, these methods are analyzed without any additional technique that modifies the original algorithm or aids it in its search for a global optimum.
Our goal is to understand when and why these methods converge or diverge, so that in the future the knowledge extracted from the results presented here can effectively help a potential implementer. After reading this thesis, the implementer should be able to decide which algorithm best fits a particular task, and should also know which problems have to be addressed in each algorithm so that an appropriate correction can be implemented using additional techniques. These additional techniques are outside the scope of this thesis; however, they are reviewed from the literature.
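To make the difference between the two direct methods concrete, here is a minimal sketch on a toy problem. It assumes a pure 2-D translation warp, a single synthetic intensity channel and bilinear sampling. It is not the thesis's implementation. For RGB-D data, a depth residual could be stacked alongside the intensity residual, but that is not shown here. Lucas-Kanade (forward additive) re-samples the image gradient and rebuilds the Hessian on every iteration. Inverse Compositional computes both once, from the template.

```python
import numpy as np

def bilinear(img, xs, ys):
    """Sample img at real-valued (xs, ys) with bilinear interpolation."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    dx, dy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - dx) + img[y0, x0 + 1] * dx
    bottom = img[y0 + 1, x0] * (1 - dx) + img[y0 + 1, x0 + 1] * dx
    return top * (1 - dy) + bottom * dy

def lucas_kanade(image, template, tx, ty, p0, iters=50):
    """Forward-additive LK for a pure translation warp W(x; p) = x + p."""
    gy, gx = np.gradient(image)
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        xs, ys = tx + p[0], ty + p[1]
        error = template - bilinear(image, xs, ys)
        # Image gradient re-sampled at the warped positions on every iteration
        J = np.stack([bilinear(gx, xs, ys).ravel(), bilinear(gy, xs, ys).ravel()], axis=1)
        dp = np.linalg.solve(J.T @ J, J.T @ error.ravel())
        p += dp  # additive update
        if np.linalg.norm(dp) < 1e-8:
            break
    return p

def inverse_compositional(image, template, tx, ty, p0, iters=50):
    """Inverse-compositional alignment for the same translation warp."""
    gy, gx = np.gradient(template)
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)  # template gradient: computed once
    H_inv = np.linalg.inv(J.T @ J)                  # Hessian: computed once
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        error = bilinear(image, tx + p[0], ty + p[1]) - template
        dp = H_inv @ (J.T @ error.ravel())
        p -= dp  # W(x; p) <- W(x; p) o W(x; dp)^-1, which for a translation is p - dp
        if np.linalg.norm(dp) < 1e-8:
            break
    return p

# Smooth synthetic image and a template cut out at a known sub-pixel shift
yy, xx = np.mgrid[0:80, 0:80].astype(float)
image = np.sin(xx / 6.0) + np.cos(yy / 9.0) + 0.5 * np.sin((xx + yy) / 11.0)
ty, tx = np.mgrid[20:60, 20:60].astype(float)
p_true = np.array([2.3, -1.7])
template = bilinear(image, tx + p_true[0], ty + p_true[1])

p_lk = lucas_kanade(image, template, tx, ty, p0=(0.0, 0.0))
p_ic = inverse_compositional(image, template, tx, ty, p0=(0.0, 0.0))
print(p_lk, p_ic)
```

Increasing `p_true` relative to the image's spatial frequencies is a simple way to observe the divergence cases that the thesis studies.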
Abstract:
The use of RGB-D sensors for mapping and recognition tasks in robotics or, more generally, for virtual reconstruction has increased in recent years. The key aspect of these kinds of sensors is that they provide both depth and color information using the same device. In this paper, we present a comparative analysis of the most important methods used in the literature for the registration of consecutive RGB-D video frames in static scenarios. The analysis begins by explaining the characteristics of the registration problem, dividing it into two representative applications: scene modeling and object reconstruction. Then, detailed experiments are carried out to determine the behavior of the different methods depending on the application. For both applications, we used standard datasets and a new one built for object reconstruction.
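Many registration pipelines of the kind compared here (ICP variants, or feature-based matching) end with the same closed-form step. Given matched 3-D points from two frames, they find the rigid motion that best aligns them. Below is a minimal sketch of that step, the SVD-based Kabsch solution. It assumes the correspondences are already known. It does not reproduce any specific method evaluated in the paper.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that R @ src[i] + t approximately equals dst[i]."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(100, 3))     # points seen in frame k
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.05])
dst = src @ R_true.T + t_true                   # the same points seen in frame k+1
R, t = rigid_transform(src, dst)
```

In an ICP loop, this step alternates with re-estimating the correspondences by nearest-neighbour search.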
Abstract:
This paper presents a method for fast calculation of a robot's egomotion using visual features. The method is part of a complete system for automatic map building and Simultaneous Localization and Mapping (SLAM). It uses optical flow to determine whether the robot has moved. If it has, visual features that do not satisfy several criteria (such as intersection and uniqueness) are deleted, and then the egomotion is calculated. We use a state-of-the-art algorithm (TORO) to rectify the map and solve the SLAM problem. The proposed method is more efficient than other current methods.
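The following is a minimal sketch of the first two stages described above. It assumes that tracked feature positions in consecutive frames are already available as N×2 arrays. The median-flow threshold (`min_median_flow`) and the uniqueness test are hypothetical stand-ins for the paper's criteria. The intersection criterion, the egomotion estimate itself and TORO are not reproduced.

```python
import numpy as np

def robot_moved(prev_pts, curr_pts, min_median_flow=1.0):
    """Hypothetical motion test: median optical-flow magnitude (pixels) above a threshold."""
    flow = np.linalg.norm(curr_pts - prev_pts, axis=1)
    return bool(np.median(flow) > min_median_flow)

def unique_matches(match_idx):
    """Keep only matches whose target feature is used exactly once (a 'uniqueness' criterion)."""
    idx, counts = np.unique(match_idx, return_counts=True)
    once = set(idx[counts == 1].tolist())
    return [i for i, m in enumerate(match_idx) if m in once]

prev = np.array([[10.0, 10.0], [50.0, 20.0], [30.0, 40.0]])
curr = prev + np.array([2.5, 0.5])
print(robot_moved(prev, curr))        # True
print(unique_matches([0, 1, 1, 2]))   # [0, 3]
```

Using the median rather than the mean keeps a few badly tracked features from triggering a false "moved" decision.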
Abstract:
General-purpose RGB-D sensors are devices that provide both color and depth information about a scene. Because these sensors have such a wide range of applications, they attract great interest in many fields, and in some cases they are operated at the limit of their sensitivity. Calibration methods are therefore, if anything, even more important for this type of sensor, since they improve the accuracy of the acquired data. For this reason, analyzing and studying the calibration of general-purpose RGB-D sensors is of great importance. In this work, we study the different technologies used to determine depth, the most common being structured light and time of flight. We also analyze the sensor parameters that determine whether data can be obtained with the precision required by the problem at hand. Calibration, as the first element of the vision process, determines the characteristic parameters that define a computer vision system; in this case, those that improve the accuracy and precision of the data provided. Three calibration algorithms, both general-purpose and specific-purpose, have been analyzed and used to calibrate three widely used sensors: Microsoft Kinect, PrimeSense Carmine 1.09 and Microsoft Kinect v2. The first two use structured light to determine depth, whereas the third uses time of flight. The experiments quantitatively determine the accuracy and precision of the sensors and their improvement through calibration, and identify the best results for each case. Finally, to show the calibration process within a global registration system, several tests have been carried out with the µ-MAR registration method.
Visual inspection was used to assess the behavior of the captured data after correction with the results of the different calibration algorithms. This shows the importance of having accurate data for certain applications, such as 3D registration of a scene.
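The abstract distinguishes accuracy (closeness to the true value) from precision (repeatability of the measurement). The sketch below measures both from repeated readings of a target at a known distance, and it also fits a simple linear depth correction. The linear model in `fit_depth_correction` is an illustrative assumption, not one of the three calibration algorithms studied. All numbers are synthetic.

```python
import numpy as np

def accuracy_precision(measured, reference):
    """Accuracy as mean signed error (bias); precision as the std of repeated readings."""
    measured = np.asarray(measured, float)
    bias = measured.mean() - reference
    spread = measured.std(ddof=1)
    return bias, spread

def fit_depth_correction(raw, reference):
    """Least-squares fit of corrected = a * raw + b (a deliberately simple model)."""
    a, b = np.polyfit(raw, reference, deg=1)
    return a, b

readings = [1.01, 1.03, 1.02, 1.00, 1.04]   # repeated depth readings (m) of a target at 1.00 m
bias, spread = accuracy_precision(readings, reference=1.0)

reference = np.array([0.8, 1.2, 1.6, 2.0, 2.4])
raw = 1.02 * reference + 0.015              # synthetic sensor with a scale and an offset error
a, b = fit_depth_correction(raw, reference)
corrected = a * raw + b
print(bias, spread, a, b)
```

Calibration of this kind can remove a systematic bias. It cannot reduce the random spread of individual readings.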
Abstract:
In this work, we propose a method that combines descriptors from intensity and depth images to robustly detect loop closures in SLAM. The method's robustness comes from the joint use of these two different kinds of information. This allows it to detect revisited places in situations where methods based only on intensity or only on depth have difficulties (e.g., poor lighting conditions or lack of geometry). The method has also been designed with efficiency in mind: it uses the FAST detector to extract features from the observations and the binary BRIEF descriptor to describe them. Loop detection is completed with a Bag of binary Words. The performance of the proposed method has been evaluated under real conditions, with very satisfactory results.
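Binary descriptors such as BRIEF are compared with the Hamming distance, and a bag of binary words quantizes each descriptor to its nearest vocabulary word. The sketch below illustrates this with toy 4-bit descriptors stored as Python ints; real BRIEF descriptors are typically 256 bits. It uses a flat hypothetical vocabulary and an L1-based similarity score of the kind commonly used with binary bags of words. The weighted fusion of intensity and depth scores (`combined_score`, weight `w`) is an assumption, not the paper's stated combination rule.

```python
import numpy as np

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def bow_vector(descriptors, vocabulary):
    """L1-normalised histogram of nearest vocabulary words (by Hamming distance)."""
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda w: hamming(d, vocabulary[w]))
        hist[word] += 1
    return hist / max(hist.sum(), 1)

def l1_score(v1, v2):
    """Similarity in [0, 1] for L1-normalised vectors: 1 - 0.5 * ||v1 - v2||_1."""
    return 1.0 - 0.5 * np.abs(v1 - v2).sum()

def combined_score(int_a, int_b, dep_a, dep_b, vocab_int, vocab_dep, w=0.5):
    """Hypothetical fusion: weighted mean of intensity-based and depth-based similarities."""
    s_int = l1_score(bow_vector(int_a, vocab_int), bow_vector(int_b, vocab_int))
    s_dep = l1_score(bow_vector(dep_a, vocab_dep), bow_vector(dep_b, vocab_dep))
    return w * s_int + (1 - w) * s_dep

vocab = [0b0000, 0b1111]
v = bow_vector([0b0001, 0b1110], vocab)
print(hamming(0b1011, 0b0001), v, l1_score(v, v))
```

A place is proposed as a loop-closure candidate when its score against the current observation exceeds a threshold.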
Abstract:
Paper submitted to the 43rd International Symposium on Robotics (ISR), Taipei, Taiwan, August 29-31, 2012.
Abstract:
In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We then fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.
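The abstract does not say exactly how classification and regression are combined into a confidence value. One plausible reading, sketched here purely as an assumption, works as follows. Yaw is discretized into gaze-direction classes (`BIN_WIDTH` is hypothetical). The confidence is the classifier's probability for the class that contains the regressed angle. The CNNs themselves are replaced by given logits.

```python
import numpy as np

BIN_WIDTH = 45.0  # hypothetical: 8 gaze-direction classes covering 360 degrees

def softmax(logits):
    z = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return z / z.sum()

def regression_confidence(class_logits, regressed_yaw_deg):
    """Probability the classifier assigns to the bin containing the regressed angle."""
    probs = softmax(np.asarray(class_logits, float))
    bin_idx = int((regressed_yaw_deg % 360.0) // BIN_WIDTH)
    return float(probs[bin_idx])

logits = [0, 0, 10, 0, 0, 0, 0, 0]            # classifier strongly favours bin 2 (90-135 deg)
print(regression_confidence(logits, 100.0))   # high: regressor agrees with the classifier
print(regression_confidence(logits, 300.0))   # low: regressor and classifier disagree
```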
Abstract:
Nowadays, the new generation of computers provides enough performance to build computationally expensive computer vision applications for mobile robotics. Building a map of the environment is a common task for a robot and an essential part of allowing robots to move through their environments. Traditionally, mobile robots have used a combination of several sensors based on different technologies. Lasers, sonars and contact sensors have typically been used in mobile robotic architectures; however, color cameras are an important sensor because we want robots to use the same information that humans use to sense and move through different environments. Color cameras are cheap and flexible, but a lot of work needs to be done to give robots enough visual understanding of scenes. Computer vision algorithms are computationally complex, but nowadays robots have access to different, powerful architectures that can be used for mobile robotics. The advent of low-cost RGB-D sensors like the Microsoft Kinect, which provide 3D colored point clouds at high frame rates, made computer vision even more relevant in the mobile robotics field. Combining visual and 3D data allows systems to use both computer vision and 3D processing, and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need for scene mapping. Awareness of the surrounding environment is a key feature in many mobile robotics applications, from simple robotic navigation to complex surveillance. In addition, acquiring a 3D model of a scene is useful in many areas. One example is video game scene modeling, where well-known places are reconstructed and added to game systems. Another is advertising, where, once the 3D model of a room is available, the system can add pieces of furniture using augmented reality techniques.
In this thesis we perform an experimental study of state-of-the-art registration methods to find which one best fits our scene mapping purposes. Different methods are tested and analyzed on scenes with different distributions of visual and geometric appearance. In addition, this thesis proposes two methods for 3D data compression and representation of 3D maps. Our 3D representation proposal is based on the Growing Neural Gas (GNG) method. This type of Self-Organizing Map (SOM) has been successfully used for clustering, pattern recognition and topology representation of various kinds of data. Until now, Self-Organizing Maps have been computed primarily offline, and their application to 3D data has mainly focused on noise-free models without considering time constraints. Self-organizing neural models can provide a good representation of the input space. In particular, GNG is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time-consuming, especially for high-dimensional input data. Since real applications often work under time constraints, the learning process must be adapted so that it completes within a predefined time. This thesis proposes a hardware implementation that leverages the computing power of modern GPUs through the paradigm known as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometric 3D compression method reduces the 3D information by using detected planes as the basic structure for compressing the data. This is because our target environments are man-made and therefore contain many points that belong to planar surfaces. Our proposed method achieves good compression results in these man-made scenarios. The detected and compressed planes can also be used in other applications, such as surface reconstruction or plane-based registration algorithms.
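For readers unfamiliar with Growing Neural Gas, here is a minimal CPU-only sketch of the standard update loop, following Fritzke's formulation. It is not the GPU implementation proposed in the thesis, and all parameter defaults are illustrative. Nodes adapt toward input samples. Edges age and are pruned, and nodes left without edges are removed. Every `lam` steps, a new node is inserted where the accumulated error is largest.

```python
import numpy as np

class GrowingNeuralGas:
    """Minimal GNG after Fritzke (1995); all parameter defaults are illustrative."""

    def __init__(self, dim, eps_b=0.05, eps_n=0.006, max_age=50, lam=100,
                 alpha=0.5, decay=0.995, max_nodes=100, seed=0):
        rng = np.random.default_rng(seed)
        self.eps_b, self.eps_n, self.max_age, self.lam = eps_b, eps_n, max_age, lam
        self.alpha, self.decay, self.max_nodes = alpha, decay, max_nodes
        self.pos = {0: rng.random(dim), 1: rng.random(dim)}  # node id -> reference vector
        self.err = {0: 0.0, 1: 0.0}                           # node id -> accumulated error
        self.edges = {}                                       # frozenset({i, j}) -> age
        self.next_id, self.steps = 2, 0

    def neighbours(self, i):
        return [j for e in self.edges if i in e for j in e if j != i]

    def train_step(self, x):
        ids = list(self.pos)
        d2 = np.array([np.sum((self.pos[i] - x) ** 2) for i in ids])
        first, second = np.argsort(d2)[:2]
        s1, s2 = ids[first], ids[second]         # nearest and second-nearest nodes
        for e in self.edges:                     # age every edge emanating from the winner
            if s1 in e:
                self.edges[e] += 1
        self.err[s1] += d2[first]
        self.pos[s1] += self.eps_b * (x - self.pos[s1])
        for n in self.neighbours(s1):
            self.pos[n] += self.eps_n * (x - self.pos[n])
        self.edges[frozenset((s1, s2))] = 0      # create or refresh the s1-s2 edge
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        connected = {j for e in self.edges for j in e}
        for i in [i for i in self.pos if i not in connected]:
            del self.pos[i], self.err[i]         # drop nodes left without edges
        self.steps += 1
        if self.steps % self.lam == 0 and len(self.pos) < self.max_nodes:
            self._insert_node()
        for i in self.err:
            self.err[i] *= self.decay

    def _insert_node(self):
        q = max(self.err, key=self.err.get)             # node with the largest error
        f = max(self.neighbours(q), key=self.err.get)   # its neighbour with the largest error
        r, self.next_id = self.next_id, self.next_id + 1
        self.pos[r] = 0.5 * (self.pos[q] + self.pos[f])
        del self.edges[frozenset((q, f))]
        self.edges[frozenset((q, r))] = 0
        self.edges[frozenset((r, f))] = 0
        self.err[q] *= self.alpha
        self.err[f] *= self.alpha
        self.err[r] = self.err[q]

rng = np.random.default_rng(1)
gng = GrowingNeuralGas(dim=3)
for _ in range(5000):   # points sampled from a planar patch z = 0 in 3-D
    gng.train_step(np.array([rng.random(), rng.random(), 0.0]))
nodes = np.array(list(gng.pos.values()))
print(len(nodes), len(gng.edges))
```

In this sketch, the nearest-node search at the top of `train_step` dominates the cost as the network grows. It is the most natural candidate for GPU parallelization. This is an observation about the sketch, not a description of the thesis's GPGPU design.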
Finally, we have also demonstrated the suitability of GPU technologies by obtaining a high-performance implementation of a common CAD/CAM technique called Virtual Digitizing.
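The plane-based 3D compression described in this abstract can be illustrated with RANSAC plane detection. RANSAC is a commonly used detector swapped in here for illustration; the abstract does not name the detector. The sketch replaces the planar inliers with the plane's in-plane bounding rectangle, which is a deliberately crude, lossy stand-in for the thesis's compression. Neither step is claimed to be the thesis's method. `threshold` and `iters` are illustrative values.

```python
import numpy as np

def ransac_plane(points, threshold=0.01, iters=200, seed=0):
    """Return (normal, d, inlier_mask) for the plane n.x + d = 0 with the most inliers."""
    rng = np.random.default_rng(seed)
    best_mask, best_model = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        n /= norm
        d = -n @ p0
        mask = np.abs(points @ n + d) < threshold
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (n, d)
    return best_model[0], best_model[1], best_mask

def compress_plane(points, n, d, mask):
    """Replace planar inliers by the corners of their in-plane bounding box (lossy)."""
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, a)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)                      # (u, v) is an orthonormal basis of the plane
    origin = -d * n                         # point of the plane closest to the origin
    uv = (points[mask] - origin) @ np.stack([u, v], axis=1)
    (umin, vmin), (umax, vmax) = uv.min(axis=0), uv.max(axis=0)
    corners = np.array([origin + s * u + t * v for s, t in
                        [(umin, vmin), (umax, vmin), (umax, vmax), (umin, vmax)]])
    return corners, points[~mask]           # remaining points are kept unchanged

rng = np.random.default_rng(0)
plane_pts = np.column_stack([rng.random(2000), rng.random(2000), np.full(2000, 0.5)])
clutter = rng.random((100, 3))
clutter[:, 2] = 0.7 + 0.3 * clutter[:, 2]   # objects standing above the planar surface
points = np.vstack([plane_pts, clutter])
n, d, mask = ransac_plane(points)
corners, remaining = compress_plane(points, n, d, mask)
print(f"{len(points)} points -> 4 corners + {len(remaining)} remaining points")
```

A practical version would also store a boundary polygon or hull rather than a rectangle, and would repeat the detection on the remaining points to find further planes.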
Abstract:
Navigation and interpretation of the surroundings by autonomous vehicles in unstructured environments remain a major challenge today. Sebastian Thrun describes in [Thr02] the mapping problem in robotic systems as the problem of acquiring a spatial model of the robot's surroundings. In this context, it is extremely important to integrate sensing systems into robotic platforms so that they can build maps of the world around them. The information gathered from these data can be interpreted and applied to localization, navigation and object manipulation tasks. Until very recently, most robotic systems performing mapping or Simultaneous Localization And Mapping (SLAM) tasks used laser rangefinders and stereo cameras. Besides being expensive, these devices provide only two-dimensional information, which in the case of rangefinders is collected through 2D cross-sections. This technological paradigm changed considerably with the market launch of RGB-D cameras, such as the one developed by PrimeSense™, followed by Microsoft®'s release of the Kinect for the Xbox 360 at the end of 2010. Given its low cost and its ability to acquire data in real time, the quality of the depth sensor is undeniable, and it made the sensor instantly popular among researchers and enthusiasts. This technological advance gave rise to several tools for development and human interaction with this type of sensor, such as the Point Cloud Library [RC11] (PCL). This library aims to support all the common building blocks that a 3D application needs. It places special emphasis on processing n-dimensional point clouds acquired from RGB-D cameras, as well as from laser scanners, Time-of-Flight cameras or stereo cameras.
In this context, this dissertation evaluates and compares some of the modules and methods of the PCL library for solving problems inherent in building and interpreting maps of unstructured indoor environments from Kinect data. Based on this evaluation, a system architecture is proposed that systematizes the registration of point clouds, corresponding to partial views of the world, into a consistent global model. The results of the evaluation confirm that the PCL library is viable for solving the proposed problems. Further evidence of this viability comes from the practical results of implementing the proposed system architecture. The implementation shows interesting performance and good prospects for integrating these concepts and technologies into robotic platforms developed within projects of the Laboratório de Sistemas Autónomos (LSA).
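Registering partial views into one global model requires chaining the frame-to-frame transforms, whatever pairwise method produces them (for example, ICP), into poses relative to the first frame. Below is a minimal numpy sketch of that chaining step. It assumes each pairwise transform is a 4×4 homogeneous matrix mapping frame k into frame k−1. The function names are hypothetical and this is not PCL's API.

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack a rotation matrix and a translation vector into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def build_global_model(clouds, pairwise):
    """clouds[k]: (N_k, 3) points in frame k; pairwise[k-1]: 4x4 transform from frame k to frame k-1."""
    pose = np.eye(4)
    model = [clouds[0]]
    for cloud, T in zip(clouds[1:], pairwise):
        pose = pose @ T                  # accumulated transform from frame k to frame 0
        model.append(cloud @ pose[:3, :3].T + pose[:3, 3])
    return np.vstack(model)

world = np.array([[2.0, 3.0, 4.0]])
clouds = [world, world - [1.0, 0.0, 0.0], world - [1.0, 1.0, 0.0]]   # same point seen from 3 poses
pairwise = [to_homogeneous(np.eye(3), [1.0, 0.0, 0.0]),
            to_homogeneous(np.eye(3), [0.0, 1.0, 0.0])]
print(build_global_model(clouds, pairwise))   # every row maps back to [2, 3, 4]
```

Because each pose is a product of all earlier pairwise transforms, small registration errors accumulate along the chain.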
Abstract:
A high-resolution three-dimensional (3-D) seismic reflection survey was conducted in Lake Geneva, near the city of Lausanne, Switzerland, as part of a project for developing such seismic techniques. Using a single 48-channel streamer, the 3-D site with an area of 1200 m × 600 m was surveyed in 10 days. A variety of complex geologic structures (e.g. thrusts, folds, channel-fill) up to ~150 m below the water bottom were obtained with a 15 in³ water gun. The 3-D data allowed the construction of an accurate velocity model and the distinction of five major seismic facies within the Lower Freshwater Molasse (Aquitanian) and the Quaternary sedimentary units. Additionally, the Plateau Molasse (PM) and Subalpine Molasse (SM) erosional surface, "La Paudeze" thrust fault (PM-SM boundary) and the thickness of Quaternary sediments were accurately delineated in 3-D.
Abstract:
Extension of 3-D atmospheric data products back into the past is desirable for a wide range of applications. Historical upper-air data are important in this endeavour, particularly in the maritime regions of the tropics and the southern hemisphere, where observations are extremely sparse. Here we present newly digitized and re-evaluated early ship-based upper-air data from two cruises: (1) kite and registering balloon profiles from onboard the ship SMS Planet on a cruise from Europe around South Africa and across the Indian Ocean to the western Pacific in 1906/1907, and (2) ship-based radiosonde data from onboard the MS Schwabenland on a cruise from Europe across the Atlantic to Antarctica and back in 1938/1939. We describe the data and provide error estimates. We compare the data with a recent reanalysis (the Twentieth Century Reanalysis Project, 20CR, Compo et al., 2011) that provides global 3-D data back to the 19th century based on an assimilation of surface pressure data only (plus monthly mean sea-surface temperatures). In cruise (1), the agreement is generally good, but large temperature differences appear during a period with a strong inversion. In cruise (2), after a subset of the data is corrected, close agreement between observations and 20CR is found for geopotential height (GPH) and temperature, notwithstanding a likely cold bias of 20CR at the tropopause level. Results are considerably worse for relative humidity, which was reportedly measured inaccurately. Note that comparing 20CR, which has limited skill in the tropical regions, with measurements made from ships in remote regions under sometimes difficult conditions can be considered a worst-case assessment. In view of that, the anomaly correlations for temperature of 0.3–0.6 in the lower troposphere in cruise (1) and of 0.5–0.7 for tropospheric temperature and GPH in cruise (2) are considered promising. Moreover, they are consistent with the error estimates.
The results suggest room for further improvement of data products in remote regions.
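The anomaly correlations quoted above compare departures of the reanalysis and of the observations from a common reference climatology. The abstract does not state which variant or which climatology was used. The sketch below assumes the uncentred form, in which anomalies are not re-centred on their own means; some definitions do subtract the mean anomaly first. The values are illustrative only.

```python
import numpy as np

def anomaly_correlation(model, observed, climatology):
    """Uncentred anomaly correlation of two series relative to a reference climatology."""
    ma = np.asarray(model, float) - climatology
    oa = np.asarray(observed, float) - climatology
    return float(np.sum(ma * oa) / np.sqrt(np.sum(ma ** 2) * np.sum(oa ** 2)))

clim = np.array([250.0, 240.0, 230.0])          # illustrative temperatures (K) at three levels
obs = clim + np.array([1.0, -1.0, 2.0])
print(anomaly_correlation(clim + np.array([2.0, -2.0, 4.0]), obs, clim))   # 1.0
```

Anomalies with the same pattern give a correlation of 1 even when their amplitudes differ. The measure is therefore sensitive to pattern agreement, not to bias.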
Abstract:
This paper proposes a new multiscale methodology for object-based 2-D data fusion. The methodology is intended for use in agriculture, specifically for characterizing the water status of different crops, so that water can be managed appropriately at the farm-holding scale. As a first approach to its evaluation, vegetation cover vigor data have been integrated with texture data. For this purpose, NDVI maps have been calculated from a multispectral image, and Lacunarity maps from the panchromatic image. Preliminary results show that this methodology is viable for integrating and managing large volumes of data that characterize the behavior of agricultural covers at the farm-holding scale.
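The two input layers mentioned above can be sketched as follows. NDVI is the standard normalized difference of the near-infrared and red bands. For lacunarity, the sketch uses the common gliding-box definition, Λ(r) = E[S²] / E[S]², where S is the total intensity in each r × r box. The paper does not say which lacunarity estimator, box sizes or window layout it used. A lacunarity *map* would apply this function to local windows, which is not shown here.

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir, red = nir.astype(float), red.astype(float)
    return (nir - red) / (nir + red + eps)

def lacunarity(img, r):
    """Gliding-box lacunarity for box size r: E[S^2] / E[S]^2 over all r x r box sums S."""
    img = np.asarray(img, float)
    c = np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))   # summed-area table
    s = c[r:, r:] - c[:-r, r:] - c[r:, :-r] + c[:-r, :-r]   # every r x r box sum
    return float((s ** 2).mean() / s.mean() ** 2)

nir = np.array([[0.6, 0.5], [0.4, 0.3]])
red = np.array([[0.2, 0.1], [0.2, 0.3]])
print(ndvi(nir, red))

pan = np.zeros((32, 32))
pan[::4, ::4] = 1.0                               # sparse, "gappy" texture
print(lacunarity(np.ones((32, 32)), 4), lacunarity(pan, 3))
```

A homogeneous image has a lacunarity of exactly 1. Gappier textures give larger values.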
Abstract:
In recent years, the computer vision community has shown great interest in depth-based applications thanks to the performance and flexibility of the new generation of RGB-D imagery. In this paper, we present an efficient background subtraction algorithm that processes the depth and color data provided by RGB-D cameras. It is based on the fusion of multiple region-based classifiers. Foreground objects are detected by combining a region-based foreground prediction (based on depth data) with different background models (based on a Mixture of Gaussians algorithm). These models provide color and depth descriptions of the scene at pixel and region level. The information given by these modules is fused in a mixture-of-experts fashion to improve foreground detection accuracy. The main contributions of the paper are the region-based models of both background and foreground, built from depth and color data. Results obtained on sequences from different databases demonstrate that the proposed approach achieves higher detection accuracy than existing state-of-the-art techniques.
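The color-plus-depth fusion idea can be illustrated at pixel level with a deliberately simplified sketch, which does not reproduce the paper's method. It uses a single running Gaussian per pixel instead of a Mixture of Gaussians, fixed expert weights instead of learned mixture-of-experts gating, and no region-level models. The class and function names, `init_var`, the 3-sigma scaling and the weights are all hypothetical.

```python
import numpy as np

class RunningGaussian:
    """Per-pixel single-Gaussian background model (a simplification of a Mixture of Gaussians)."""
    def __init__(self, first_frame, alpha=0.05, init_var=25.0):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, init_var)
        self.alpha = alpha

    def foreground_score(self, frame):
        z = np.abs(frame - self.mean) / np.sqrt(self.var)
        return np.clip(z / 3.0, 0.0, 1.0)  # reaches 1 at 3 sigma from the background mean

    def update(self, frame, fg_mask):
        bg = ~fg_mask                      # adapt only where the pixel was labelled background
        diff = frame - self.mean
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] += self.alpha * (diff[bg] ** 2 - self.var[bg])

def fuse(scores, weights, threshold=0.5):
    """Weighted combination of expert scores (fixed weights here, not learned gating)."""
    combined = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return combined > threshold

color_bg = np.full((6, 6), 100.0)          # grey-level background
depth_bg = np.full((6, 6), 2000.0)         # depth in mm
color_model, depth_model = RunningGaussian(color_bg), RunningGaussian(depth_bg)

color = color_bg.copy(); color[1:3, 1:3] = 180.0   # an object enters the scene
depth = depth_bg.copy(); depth[1:3, 1:3] = 1500.0
mask = fuse([color_model.foreground_score(color), depth_model.foreground_score(depth)],
            weights=[0.4, 0.6])
color_model.update(color, mask)
depth_model.update(depth, mask)
print(mask.astype(int))
```

Weighting depth more heavily, as in this example, is one way to express that depth is less affected by illumination changes than color. In the paper, this balance is handled by the fusion scheme rather than by fixed weights.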