14 results for Kinect depth sensor
at Universidad Politécnica de Madrid
Abstract:
This work addresses the problem of characterizing the canopy of fruit trees for the site-specific application of plant protection products. The proposed approach uses a depth map combined with an RGB image (RGB-D), provided by the Microsoft Kinect sensor, to apply pesticides in a localized manner. The depth map is used to estimate canopy density, and from this information the system determines which nozzles should be opened at each moment. Algorithms implemented in Matlab were developed that, in addition to acquiring the RGB-D images, allow pesticides to be applied only to leaves and/or fruit, as desired. These algorithms were integrated into software that communicates with the "Kinect Windows SDK" development environment, which extracts the images from the Kinect sensor. To identify leaves, clustering and identification algorithms were implemented. The clustering algorithms used were "Fuzzy C-Means with Gustafson-Kessel" (FCM-GK) and "K-Means". The centroids (prototypes) of each class generated by FCM-GK were used as seeds for K-Means, in order to speed up convergence and to maintain temporal coherence in the groups generated by K-Means. The clustering algorithms were applied to images transformed into the L*a*b* color space; specifically, the a* and b* (chromatic) channels were used in order to reduce the effect of illumination on the colors. The clustering algorithms were configured to search for four groups: leaves, porosity (gaps), fruit and trunk. Once the clustering stage generates the group prototypes, a Support Vector Machine classifier with a Gaussian radial basis function kernel identifies the class of interest (leaves). The combination of these algorithms has shown low classification errors, with an error of only 4% in leaf identification. Moreover, the algorithms process up to 8.4 images per second, which allows real-time operation. The results demonstrate the feasibility of using the Kinect sensor to determine where and when to apply pesticides. They also show, however, that its use is limited by lighting conditions: it is possible to use Kinect outdoors, but only on cloudy days, early in the morning, or at night with artificial lighting, or by adding a sunshade under intense light conditions.
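A minimal sketch of the clustering-plus-SVM pipeline described above, assuming scikit-image and scikit-learn; the seed prototypes stand in for the FCM-GK output, the SVM is assumed already trained, and all names are illustrative rather than taken from the original Matlab implementation.

```python
import numpy as np
from skimage.color import rgb2lab
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def segment_leaves(rgb_image, seed_prototypes, svm_leaf_classifier):
    """Cluster a*b* chromaticity into 4 groups and flag the 'leaf' cluster.

    rgb_image           -- HxWx3 float array in [0, 1]
    seed_prototypes     -- 4x2 array of a*b* centroids (e.g. from an FCM-GK pass)
    svm_leaf_classifier -- fitted SVC(kernel='rbf') mapping an a*b* prototype
                           to 1 (leaf) or 0 (other)
    """
    lab = rgb2lab(rgb_image)                      # convert to L*a*b*
    ab = lab[:, :, 1:3].reshape(-1, 2)            # keep chromatic channels only
    km = KMeans(n_clusters=4, init=seed_prototypes, n_init=1)
    labels = km.fit_predict(ab)                   # leaves, gaps, fruit, trunk
    leaf_clusters = svm_leaf_classifier.predict(km.cluster_centers_)
    leaf_mask = np.isin(labels, np.where(leaf_clusters == 1)[0])
    return leaf_mask.reshape(rgb_image.shape[:2])
```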
Abstract:
In this paper we present an efficient hole-filling strategy that improves the quality of the depth maps obtained with the Microsoft Kinect device. The proposed approach is based on a joint-bilateral filtering framework that includes spatial and temporal information. The missing depth values are obtained by iteratively applying a joint-bilateral filter to their neighboring pixels. The filter weights are selected considering three different factors: visual data, depth information and a temporal-consistency map. Video and depth data are combined to improve depth map quality in the presence of edges and homogeneous regions. Finally, the temporal-consistency map is generated in order to track the reliability of the depth measurements near the hole regions. The obtained depth values are included iteratively in the filtering process of the successive frames, and the accuracy of the depth values in the hole regions increases as new samples are acquired and filtered.
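A simplified sketch of one joint-bilateral estimate for a single missing depth pixel, assuming NumPy; it weights neighbors by spatial distance, color similarity and the temporal-consistency map, omitting the depth-similarity term and the iteration over frames, and all parameter names are illustrative.

```python
import numpy as np

def fill_depth_pixel(y, x, depth, color, consistency, radius=5,
                     sigma_s=3.0, sigma_c=10.0):
    """One joint-bilateral estimate of a missing depth value (sketch).

    depth       -- HxW array, 0 where the Kinect reported no measurement
    color       -- HxWx3 array used as the guidance image
    consistency -- HxW temporal-consistency map in [0, 1]
    """
    h, w = depth.shape
    num, den = 0.0, 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or depth[ny, nx] == 0:
                continue                               # skip holes and borders
            w_spatial = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
            color_diff = np.linalg.norm(color[y, x] - color[ny, nx])
            w_color = np.exp(-(color_diff ** 2) / (2 * sigma_c ** 2))
            weight = w_spatial * w_color * consistency[ny, nx]
            num += weight * depth[ny, nx]
            den += weight
    return num / den if den > 0 else 0.0   # remains a hole if no valid neighbor
```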
Abstract:
Low-cost systems that can obtain a high-quality foreground segmentation almost independently of the existing illumination conditions for indoor environments are very desirable, especially for security and surveillance applications. In this paper, a novel foreground segmentation algorithm that uses only a Kinect depth sensor is proposed to satisfy the aforementioned system characteristics. This is achieved by combining a mixture of Gaussians-based background subtraction algorithm with a new Bayesian network that robustly predicts the foreground/background regions between consecutive time steps. The Bayesian network explicitly exploits the intrinsic characteristics of the depth data by means of two dynamic models that estimate the spatial and depth evolution of the foreground/background regions. The most remarkable contribution is the depth-based dynamic model that predicts the changes in the foreground depth distribution between consecutive time steps. This is a key difference with regard to visible imagery, where the color/gray distribution of the foreground is typically assumed to be constant. Experiments carried out on two different depth-based databases demonstrate that the proposed combination of algorithms is able to obtain a more accurate segmentation of the foreground/background than other state-of-the-art approaches.
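A small sketch of only the mixture-of-Gaussians background subtraction stage applied to depth frames, assuming OpenCV; the Bayesian-network prediction that is the paper's main contribution is not shown, and the 8 m range used for normalization is an assumption.

```python
import cv2
import numpy as np

# Mixture-of-Gaussians background model for Kinect depth frames (sketch only).
mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=False)

def foreground_from_depth(depth_mm):
    """depth_mm: HxW uint16 depth frame in millimetres, 0 = no measurement."""
    # Normalise to 8 bits (assumed 8 m maximum range) for the OpenCV model.
    depth_8u = cv2.convertScaleAbs(depth_mm, alpha=255.0 / 8000.0)
    mask = mog.apply(depth_8u)
    mask[depth_mm == 0] = 0        # ignore pixels where the sensor saw nothing
    return mask
```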
Abstract:
A depth-based face recognition algorithm specially adapted to the high range resolution data acquired by the new Microsoft Kinect 2 sensor is presented. A novel descriptor, the Depth Local Quantized Pattern, has been designed to make use of the extended range resolution of the new sensor. This descriptor is a substantial modification of the popular Local Binary Pattern algorithm; one of its main contributions is the introduction of a quantization step, which increases its capacity to distinguish different depth patterns. The proposed descriptor has been used to train and test a Support Vector Machine classifier, which has proven able to accurately recognize different people's faces from a wide range of poses. In addition, a new depth-based face database acquired with the new Kinect 2 sensor has been created and made public to evaluate the proposed face recognition system.
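The following sketch illustrates the general idea of quantizing neighbor depth differences instead of the binary test used by classic LBP; it is a guess at the spirit of the descriptor under assumed step and level parameters, not the published formulation.

```python
import numpy as np

def depth_local_quantized_pattern(depth, y, x, step=5, levels=4):
    """Illustrative quantized local pattern code for one depth pixel.

    Each of the 8 neighbour differences is quantized into `levels` bins of
    width `step` (in depth units) and packed as a base-`levels` digit.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for i, (dy, dx) in enumerate(offsets):
        diff = float(depth[y + dy, x + dx]) - float(depth[y, x])
        q = int(np.clip(diff // step + levels // 2, 0, levels - 1))
        code += q * (levels ** i)      # one quantized digit per neighbour
    return code
```

Histograms of such codes over face regions could then feed the SVM classifier mentioned in the abstract.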
Abstract:
The proliferation of video games and other applications of computer graphics in everyday life demands a much easier way to create animatable virtual human characters. Traditionally, this has been the job of highly skilled artists and animators who painstakingly model, rig and animate their avatars, and usually have to tune them for each application and transmission/rendering platform. The emergence of virtual/mixed reality environments also calls for practical and cost-effective ways to produce custom models of actual people. The purpose of the present dissertation is to bring 3D human scanning closer to the average user. For this, two different techniques are presented, one passive and one active. The first is a fully automatic system for generating statically multi-textured avatars of real people captured with several standard cameras. Our system uses a state-of-the-art shape-from-silhouette technique to retrieve the shape of the subject. However, to deal with the lack of detail that is common in the facial region for this kind of technique, which does not handle concavities correctly, our system proposes an approach to improve the quality of this region. This face enhancement technique uses a generic facial model which is transformed according to the specific facial features of the subject. Moreover, the system features a novel technique for generating view-independent texture atlases computed from the original images. This static multi-texturing system yields a seamless texture atlas calculated by combining the color information from several photos. We suppress the color seams due to image misalignments and irregular lighting conditions that multi-texturing approaches typically suffer from, while minimizing the blurring effect introduced by color blending techniques. The second technique features a system to retrieve a fully animatable 3D model of a human using a commercial depth sensor. Unlike other approaches in the current state of the art, our system does not require the user to remain completely still throughout the scanning process, nor does the depth sensor have to be moved around the subject to cover its whole surface. Instead, the depth sensor remains static and the skeleton tracking information is used to compensate for the user's movements during the scanning stage.
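One simple way to picture the skeleton-based motion compensation of the active technique is a per-frame rigid alignment of the tracked joints; this is only a sketch under that assumption (the dissertation does not state that the compensation is rigid), and the function names are illustrative.

```python
import numpy as np

def skeleton_alignment(joints_ref, joints_cur):
    """Rigid transform (R, t) mapping current-frame joints onto a reference
    pose, estimated with a Kabsch/Procrustes fit (sketch).

    joints_ref, joints_cur -- Nx3 arrays of corresponding joint positions.
    """
    c_ref = joints_ref.mean(axis=0)
    c_cur = joints_cur.mean(axis=0)
    H = (joints_cur - c_cur).T @ (joints_ref - c_ref)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_ref - R @ c_cur
    return R, t

def compensate_motion(points_cur, joints_ref, joints_cur):
    """Bring the current depth point cloud back into the reference pose."""
    R, t = skeleton_alignment(joints_ref, joints_cur)
    return points_cur @ R.T + t
```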
Abstract:
The Kinect camera, developed by PrimeSense in collaboration with Microsoft for the Xbox console, provides depth images thanks to an infrared sensor. The device also includes an RGB camera that delivers color images, as well as an array of microphones arranged so that the direction a sound comes from can be determined. Kinect was initially created for home entertainment, but its low price (compared with other cameras of similar characteristics) and its acceptance by developers have greatly expanded its possibilities. The objective of this project is to obtain, from these data, kinematic variables such as the position, velocity and acceleration of selected control points of a person's body (head, neck, shoulders, elbows, wrists, hips, knees and ankles), from which movement patterns can be extracted. This requires a middleware running on a free (GNU), cross-platform environment. Processing, an open-source environment created for design projects, was used as the IDE, together with the SimpleOpenNI wrapper developed by students and researchers working with Kinect. This makes it possible to do without the Microsoft SDK, which is proprietary and requires its operating system, Windows; using these tools yields a solution that is viable on several operating systems. The methods and facilities of the object-oriented Java language (from which Processing inherits) were used, and a client-server architecture was adopted to give the project scalability. The results of the project are useful in applications for populations at risk of exclusion (such as the autism spectrum), in remote diagnosis, and in general in settings where habits and behaviors need to be studied from human movement. The project is intended to be continued through other applications that analyze the data it provides.
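A minimal sketch of how velocity and acceleration can be derived from the tracked joint positions by finite differences, assuming NumPy and a 30 fps Kinect stream; the wrist trajectory in the example is hypothetical.

```python
import numpy as np

def joint_kinematics(positions, dt):
    """Velocity and acceleration of one tracked joint by finite differences.

    positions -- Tx3 array of joint coordinates (e.g. metres) per frame
    dt        -- time between frames in seconds (1/30 for Kinect at 30 fps)
    """
    velocity = np.gradient(positions, dt, axis=0)      # first derivative
    acceleration = np.gradient(velocity, dt, axis=0)   # second derivative
    return velocity, acceleration

# Example with a hypothetical wrist trajectory sampled at 30 fps:
wrist = np.array([[0.10, 1.20, 2.00],
                  [0.12, 1.21, 2.00],
                  [0.15, 1.23, 1.99]])
v, a = joint_kinematics(wrist, dt=1.0 / 30.0)
```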
Abstract:
This final degree project presents a gesture recognition system for teleoperating robots based on the Kinect sensor. The project is divided into two parts: the first concerns the design and evaluation of a gesture recognition system based on the Kinect sensor, and the second concerns the teleoperation of robots using the developed gesture recognition system. The first part describes the characteristics and limitations of the Kinect sensor. Motion detection is then analyzed, and the state machine proposed to detect the movement of a gesture is presented. Next, the possible preprocessing steps applied to a 3D skeleton to improve gesture detection are explained, together with the algorithm used for gesture recognition, Dynamic Time Warping (DTW). Finally, the gesture recognition and evaluation software that was developed, the Gesture Evaluator, is described in detail, and several evaluations carried out with different configuration profiles are analyzed, drawing conclusions on the accuracy, reliability and precision of each configuration. The second part presents the robot teleoperation system and its integration with the gesture evaluator: this system controls a Lego Mindstorms robot through gesture detection or voice recognition. The project closes with the final conclusions.
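A compact sketch of the classic DTW distance that underlies the gesture recognition stage, assuming NumPy; the feature vectors per frame (e.g. normalized joint positions) and the template-matching use are assumptions about how such a recognizer is typically assembled.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic Dynamic Time Warping distance between two gesture sequences.

    seq_a, seq_b -- arrays of shape (Ta, D) and (Tb, D), one D-dimensional
                    feature vector (e.g. normalized joint positions) per frame.
    """
    ta, tb = len(seq_a), len(seq_b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[ta, tb]

# A candidate gesture would be recognized by comparing it against stored
# templates and keeping the one with the smallest DTW distance.
```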
Abstract:
Facilitating general access to data from sensor networks (including traffic, hydrology and other domains) increases their utility. In this paper we argue that the journalistic metaphor can be effectively used to automatically generate multimedia presentations that help non-expert users analyze and understand sensor data. The journalistic layout and style are familiar to most users. Furthermore, the journalistic approach of ordering information from most general to most specific helps users obtain a high-level understanding while providing them the freedom to choose the depth of analysis to which they want to go. We describe the general characteristics and architectural requirements for an interactive intelligent user interface for exploring sensor data that uses the journalistic metaphor. We also describe our experience in developing this interface in real-world domains (e.g., hydrology).
Abstract:
The evolution of water content in a sandy soil during the sprinkler irrigation campaign of a sugar beet field located at Valladolid (Spain), in the summer of 2010, is assessed with a capacitive FDR (Frequency Domain Reflectometry) EnviroScan probe. This field is one of the experimental sites of the Spanish research center for sugar beet development (AIMCRA). The work focuses on monitoring the soil water content evolution over consecutive irrigations during the second two weeks of July (from the 12th to the 28th). These measurements will be used to simulate water movement by means of Hydrus-2D. The probe logged water content readings (m3/m3) at 10, 20, 40 and 60 cm depth every 30 minutes and was placed between two rows in a typical 12 x 15 m sprinkler irrigation layout. Furthermore, a texture analysis of the soil profile was also conducted. The irrigation schedule on this farm was set by the farmer's own criteria: aiming to minimize electricity pumping costs, he used to irrigate at night and during the weekend, i.e. at longer irrigation intervals than expected. However, the high evapotranspiration rates and the weekly sugar beet water consumption—up to 50 mm/week—clearly indicated the need to shorten this interval. Moreover, the farmer used to irrigate for five or six hours, whereas results from the EnviroScan probe showed the soil profile reaching saturation after the first three hours. It must be noted that AIMCRA provides its members with an SMS service on the weekly sugar beet water requirement; using data from different meteorological stations and evaporation pans, farmers have an idea of the weekly irrigation needs. Nevertheless, how to irrigate remains the farmer's decision. Thus, in order to minimize water stress and pumping costs, a suitable irrigation time and irrigation frequency were modeled with Hydrus-2D. Results for the period mentioned above showed water content values ranging from 35 and 30 (m3/m3) at the 10 and 20 cm profile depths (two hours after irrigation) to minima of 14 and 13 (m3/m3) (two hours before irrigation). At the 40 and 60 cm profile depths, the water content changed steadily across the dates: the greater the root activity, the greater the water content variation. According to the EnviroScan results and the Hydrus-2D modeling, shorter irrigation intervals and irrigation times are suggested.
Abstract:
The development of cognitive robots needs strong sensory support that allows them to perceive the real world in order to interact with it properly. Developing efficient visual-processing software to equip effective artificial agents is therefore a must. In this project we study and develop visual-processing software that works as the "eyes" of a cognitive robot. This software performs a three-dimensional mapping of the robot's environment, providing it with the essential information required to make proper decisions during navigation. Due to the complexity of this objective we adopted the Scrum methodology in order to achieve an agile development process, which allowed us to quickly correct and improve successive versions of the product. The project is structured in Sprints, which cover the different stages of the software development based on the requirements imposed by the robot and its actual needs. We initially explored different commercial devices for acquiring the required visual information, adopting the Microsoft Kinect sensor as the most suitable option. We then studied the available software for managing the acquired visual information, as well as its integration with the robot's software, choosing the high-level platform Matlab as the common nexus joining the management of the camera, the management of the robot and the implementation of the behavioral algorithms. During the last stages the software was extended to include the fundamental functionalities required to process the real environment, such as depth representation, segmentation and clustering. Finally, the software was optimized to achieve real-time processing and adequate performance to fulfill the robot's requirements during operation in real situations.
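A small sketch of what the depth segmentation and clustering stage could look like, assuming NumPy and scikit-learn rather than the Matlab implementation used in the project; the pinhole intrinsics, range cut-off and DBSCAN parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_obstacles(depth_m, fx=525.0, fy=525.0, cx=319.5, cy=239.5,
                      max_range=3.0):
    """Segment a depth frame into candidate obstacles for navigation (sketch).

    depth_m        -- HxW depth image in metres (0 where no measurement)
    fx, fy, cx, cy -- assumed pinhole intrinsics of the depth camera
    """
    h, w = depth_m.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = (depth_m > 0) & (depth_m < max_range)   # keep nearby, valid pixels
    z = depth_m[valid]
    x = (u[valid] - cx) * z / fx                    # back-project to 3D
    y = (v[valid] - cy) * z / fy
    points = np.column_stack([x, y, z])
    labels = DBSCAN(eps=0.05, min_samples=30).fit_predict(points)
    return points, labels                           # label -1 marks noise
```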
Abstract:
The readout procedure of charge-coupled device (CCD) cameras is known to generate some image degradation in different scientific imaging fields, especially in astrophysics. In the particular field of particle image velocimetry (PIV), widely used in the scientific community, the readout procedure of the interline CCD sensor induces a bias in the registered position of particle images. This work proposes simple procedures to predict the magnitude of the associated measurement error. Generally, there are differences in the position bias for the different images of a certain particle at each PIV frame. This leads to a substantial bias error in the PIV velocity measurement (~0.1 pixels), which is the order of magnitude that other typical PIV errors, such as peak locking, may reach. Based on modern CCD technology and architecture, this work offers a description of the readout phenomenon and proposes a model for the magnitude of the CCD readout bias error. This bias, in turn, generates a velocity measurement bias error when there is an illumination difference between two successive PIV exposures. The model predictions match experiments performed with two 12-bit interline CCD cameras (MegaPlus ES 4.0/E incorporating the 4-megapixel Kodak KAI-4000M CCD sensor). For different cameras, only two constant values are needed to fit the proposed calibration model and predict the error from the readout procedure. Tests by different researchers using different cameras would allow verification of the model, which can then be used to optimize acquisition setups. Simple procedures to obtain these two calibration values are also described.
Abstract:
In recent years, the computer vision community has shown great interest in depth-based applications thanks to the performance and flexibility of the new generation of RGB-D imagery. In this paper, we present an efficient background subtraction algorithm based on the fusion of multiple region-based classifiers that processes depth and color data provided by RGB-D cameras. Foreground objects are detected by combining a region-based foreground prediction (based on depth data) with different background models (based on a Mixture of Gaussians algorithm) providing color and depth descriptions of the scene at pixel and region level. The information given by these modules is fused in a mixture-of-experts fashion to improve the foreground detection accuracy. The main contributions of the paper are the region-based models of both background and foreground, built from the depth and color data. The results obtained on different database sequences demonstrate that the proposed approach leads to higher detection accuracy with respect to existing state-of-the-art techniques.
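The fusion step can be pictured as a weighted combination of per-pixel foreground probabilities produced by the individual color/depth, pixel/region classifiers; the following sketch assumes NumPy, and the fixed weights are placeholders for whatever weighting the paper actually learns or tunes.

```python
import numpy as np

def fuse_foreground(prob_color_px, prob_depth_px, prob_color_rg, prob_depth_rg,
                    weights=(0.25, 0.25, 0.25, 0.25), threshold=0.5):
    """Mixture-of-experts style fusion of per-pixel foreground probabilities.

    Each probability argument is an HxW map in [0, 1] produced by one expert
    (color/depth description at pixel or region level).
    """
    experts = np.stack([prob_color_px, prob_depth_px,
                        prob_color_rg, prob_depth_rg])
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    fused = (w * experts).sum(axis=0) / w.sum()
    return fused > threshold          # boolean foreground mask
```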
Abstract:
Low-cost RGB-D cameras such as the Microsoft Kinect or the Asus Xtion Pro are completely changing the computer vision world, as they are being successfully used in several applications and research areas. Depth data are particularly attractive and suitable for applications based on moving-object detection through foreground/background segmentation approaches; the RGB-D applications proposed in the literature generally employ state-of-the-art foreground/background segmentation techniques based on depth information alone, without taking the color information into account. The novel approach we propose is based on a combination of classifiers that improves background subtraction accuracy with respect to state-of-the-art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average that adaptively modifies the support of each classifier in the ensemble by considering foreground detections in the previous frames and the depth and color edges. In this way, it is possible to reduce false detections due to critical issues that cannot be tackled by the individual classifiers, such as shadows and illumination changes, color and depth camouflage, moved background objects and noisy depth measurements. Moreover, to the best of the authors' knowledge, we propose the first publicly available RGB-D benchmark dataset with hand-labeled ground truth of several challenging scenarios to test background/foreground segmentation algorithms.
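The adaptive support of the two classifiers can be illustrated with a simple per-pixel weight update driven by edge maps and the previous foreground mask; the update rule below is a hypothetical stand-in for the weighting actually proposed in the paper, assuming NumPy.

```python
import numpy as np

def update_weights(w_color, w_depth, prev_fg, color_edges, depth_edges,
                   alpha=0.1):
    """Illustrative per-pixel update of the support of two classifiers.

    prev_fg                  -- HxW boolean mask of foreground in the previous frame
    color_edges, depth_edges -- HxW edge maps in [0, 1]
    Raise the depth weight where depth edges dominate and the pixel was
    recently foreground, and lower it otherwise (hypothetical rule).
    """
    bias = depth_edges - color_edges            # >0 where depth is more reliable
    w_depth_new = np.clip(w_depth + alpha * bias * prev_fg, 0.0, 1.0)
    w_color_new = 1.0 - w_depth_new             # keep the two weights normalized
    return w_color_new, w_depth_new
```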
Abstract:
Ultrasound wave velocity was measured in 30 pieces of Spanish Scots pine (Pinus sylvestris L.), 90 x 140 mm in cross-section and 4 m long. Five different sensor placement arrangements were used: end to end (V0), face to opposite face, edge to opposite edge, face to same face and edge to same edge. The pieces were successively shortened to 3, 2 and 1 m in order to obtain these velocities and their ratios to the reference value V0 for different lengths and, for the crossed measurements, different angles with respect to the piece axis. The velocity obtained in crossed measurements is lower than V0. A correction coefficient for crossed velocities, depending on the angle, is proposed to adjust them to the V0 benchmark. The velocities measured on a single surface are also lower than V0, and their ratio with respect to V0 is close to 0.97 for distances equal to or greater than 18 times the depth of the beam.
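A minimal sketch of the geometry behind the crossed measurements and of applying an angle-dependent correction; the abstract does not give the functional form of the correction coefficient, so it is passed in here as an already-evaluated factor (a placeholder, not the published formula).

```python
import math

def crossed_path_angle(length_m, offset_m):
    """Angle (degrees) between a crossed ultrasound path and the piece axis.

    length_m -- distance along the axis between the two sensors
    offset_m -- separation across the section (e.g. 0.14 m face to opposite face)
    """
    return math.degrees(math.atan2(offset_m, length_m))

def correct_crossed_velocity(v_crossed, correction):
    """Adjust a crossed-measurement velocity toward the end-to-end value V0.

    `correction` is the angle-dependent coefficient (< 1) proposed in the
    paper, i.e. the expected ratio of the crossed velocity to V0.
    """
    return v_crossed / correction

# Example: face-to-opposite-face measurement on a 2 m piece, 140 mm deep.
angle = crossed_path_angle(2.0, 0.14)   # about 4 degrees to the piece axis
```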