878 resultados para Depth Estimation,Deep Learning,Disparity Estimation,Computer Vision,Stereo Vision


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Epipolar geometry is a key point in computer vision and the fundamental matrix estimation is the only way to compute it. This article surveys several methods of fundamental matrix estimation which have been classified into linear methods, iterative methods and robust methods. All of these methods have been programmed and their accuracy analysed using real images. A summary, accompanied with experimental results, is given

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. This new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation by which it is possible to automate LSA, without requiring any a priori knowledge, and to improve the final segmentation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El treball desenvolupat en aquesta tesi aprofundeix i aporta solucions innovadores en el camp orientat a tractar el problema de la correspondència en imatges subaquàtiques. En aquests entorns, el que realment complica les tasques de processat és la falta de contorns ben definits per culpa d'imatges esborronades; un fet aquest que es deu fonamentalment a il·luminació deficient o a la manca d'uniformitat dels sistemes d'il·luminació artificials. Els objectius aconseguits en aquesta tesi es poden remarcar en dues grans direccions. Per millorar l'algorisme d'estimació de moviment es va proposar un nou mètode que introdueix paràmetres de textura per rebutjar falses correspondències entre parells d'imatges. Un seguit d'assaigs efectuats en imatges submarines reals han estat portats a terme per seleccionar les estratègies més adients. Amb la finalitat d'aconseguir resultats en temps real, es proposa una innovadora arquitectura VLSI per la implementació d'algunes parts de l'algorisme d'estimació de moviment amb alt cost computacional.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Analysis of human behaviour through visual information has been a highly active research topic in the computer vision community. This was previously achieved via images from a conventional camera, but recently depth sensors have made a new type of data available. This survey starts by explaining the advantages of depth imagery, then describes the new sensors that are available to obtain it. In particular, the Microsoft Kinect has made high-resolution real-time depth cheaply available. The main published research on the use of depth imagery for analysing human activity is reviewed. Much of the existing work focuses on body part detection and pose estimation. A growing research area addresses the recognition of human actions. The publicly available datasets that include depth imagery are listed, as are the software libraries that can acquire it from a sensor. This survey concludes by summarising the current state of work on this topic, and pointing out promising future research directions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[EN] In this work we propose a new variational model for the consistent estimation of motion fields. The aim of this work is to develop appropriate spatio-temporal coherence models. In this sense, we propose two main contributions: a nonlinear flow constancy assumption, similar in spirit to the nonlinear brightness constancy assumption, which conveniently relates flow fields at different time instants; and a nonlinear temporal regularization scheme, which complements the spatial regularization and can cope with piecewise continuous motion fields. These contributions pose a congruent variational model since all the energy terms, except the spatial regularization, are based on nonlinear warpings of the flow field. This model is more general than its spatial counterpart, provides more accurate solutions and preserves the continuity of optical flows in time. In the experimental results, we show that the method attains better results and, in particular, it considerably improves the accuracy in the presence of large displacements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[EN] The aim of this work is to propose a new method for estimating the backward flow directly from the optical flow. We assume that the optical flow has already been computed and we need to estimate the inverse mapping. This mapping is not bijective due to the presence of occlusions and disocclusions, therefore it is not possible to estimate the inverse function in the whole domain. Values in these regions has to be guessed from the available information. We propose an accurate algorithm to calculate the backward flow uniquely from the optical flow, using a simple relation. Occlusions are filled by selecting the maximum motion and disocclusions are filled with two different strategies: a min-fill strategy, which fills each disoccluded region with the minimum value around the region; and a restricted min-fill approach that selects the minimum value in a close neighborhood. In the experimental results, we show the accuracy of the method and compare the results using these two strategies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

[EN] In this report we study a number of fluid optic flow sequences in the context of the FLUID Specific Targeted Research Project - Contract No 513633 founded by the EEC. The main goal of this report is to analyse the behaviour of classical computer vision optic flow techniques when we deal with fluid sequences. We use the optic flow sequences provided by other partners of the FLUID project.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Perceptual learning is a training induced improvement in performance. Mechanisms underlying the perceptual learning of depth discrimination in dynamic random dot stereograms were examined by assessing stereothresholds as a function of decorrelation. The inflection point of the decorrelation function was defined as the level of decorrelation corresponding to 1.4 times the threshold when decorrelation is 0%. In general, stereothresholds increased with increasing decorrelation. Following training, stereothresholds and standard errors of measurement decreased systematically for all tested decorrelation values. Post training decorrelation functions were reduced by a multiplicative constant (approximately 5), exhibiting changes in stereothresholds without changes in the inflection points. Disparity energy model simulations indicate that a post-training reduction in neuronal noise can sufficiently account for the perceptual learning effects. In two subjects, learning effects were retained over a period of six months, which may have application for training stereo deficient subjects.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Diabetes mellitus is spreading throughout the world and diabetic individuals have been shown to often assess their food intake inaccurately; therefore, it is a matter of urgency to develop automated diet assessment tools. The recent availability of mobile phones with enhanced capabilities, together with the advances in computer vision, have permitted the development of image analysis apps for the automated assessment of meals. GoCARB is a mobile phone-based system designed to support individuals with type 1 diabetes during daily carbohydrate estimation. In a typical scenario, the user places a reference card next to the dish and acquires two images using a mobile phone. A series of computer vision modules detect the plate and automatically segment and recognize the different food items, while their 3D shape is reconstructed. Finally, the carbohydrate content is calculated by combining the volume of each food item with the nutritional information provided by the USDA Nutrient Database for Standard Reference. Objective: The main objective of this study is to assess the accuracy of the GoCARB prototype when used by individuals with type 1 diabetes and to compare it to their own performance in carbohydrate counting. In addition, the user experience and usability of the system is evaluated by questionnaires. Methods: The study was conducted at the Bern University Hospital, “Inselspital” (Bern, Switzerland) and involved 19 adult volunteers with type 1 diabetes, each participating once. Each study day, a total of six meals of broad diversity were taken from the hospital’s restaurant and presented to the participants. The food items were weighed on a standard balance and the true amount of carbohydrate was calculated from the USDA nutrient database. Participants were asked to count the carbohydrate content of each meal independently and then by using GoCARB. At the end of each session, a questionnaire was completed to assess the user’s experience with GoCARB. Results: The mean absolute error was 27.89 (SD 38.20) grams of carbohydrate for the estimation of participants, whereas the corresponding value for the GoCARB system was 12.28 (SD 9.56) grams of carbohydrate, which was a significantly better performance ( P=.001). In 75.4% (86/114) of the meals, the GoCARB automatic segmentation was successful and 85.1% (291/342) of individual food items were successfully recognized. Most participants found GoCARB easy to use. Conclusions: This study indicates that the system is able to estimate, on average, the carbohydrate content of meals with higher accuracy than individuals with type 1 diabetes can. The participants thought the app was useful and easy to use. GoCARB seems to be a well-accepted supportive mHealth tool for the assessment of served-on-a-plate meals.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El principal objetivo de esta tesis es dotar a los vehículos aéreos no tripulados (UAVs, por sus siglas en inglés) de una fuente de información adicional basada en visión. Esta fuente de información proviene de cámaras ubicadas a bordo de los vehículos o en el suelo. Con ella se busca que los UAVs realicen tareas de aterrizaje o inspección guiados por visión, especialmente en aquellas situaciones en las que no haya disponibilidad de estimar la posición del vehículo con base en GPS, cuando las estimaciones de GPS no tengan la suficiente precisión requerida por las tareas a realizar, o cuando restricciones de carga de pago impidan añadir sensores a bordo de los vehículos. Esta tesis trata con tres de las principales áreas de la visión por computador: seguimiento visual y estimación visual de la pose (posición y orientación), que a su vez constituyen la base de la tercera, denominada control servo visual, que en nuestra aplicación se enfoca en el empleo de información visual para controlar los UAVs. Al respecto, esta tesis se ocupa de presentar propuestas novedosas que permitan solucionar problemas relativos al seguimiento de objetos mediante cámaras ubicadas a bordo de los UAVs, se ocupa de la estimación de la pose de los UAVs basada en información visual obtenida por cámaras ubicadas en el suelo o a bordo, y también se ocupa de la aplicación de las técnicas propuestas para solucionar diferentes problemas, como aquellos concernientes al seguimiento visual para tareas de reabastecimiento autónomo en vuelo o al aterrizaje basado en visión, entre otros. Las diversas técnicas de visión por computador presentadas en esta tesis se proponen con el fin de solucionar dificultades que suelen presentarse cuando se realizan tareas basadas en visión con UAVs, como las relativas a la obtención, en tiempo real, de estimaciones robustas, o como problemas generados por vibraciones. Los algoritmos propuestos en esta tesis han sido probados con información de imágenes reales obtenidas realizando pruebas on-line y off-line. Diversos mecanismos de evaluación han sido empleados con el propósito de analizar el desempeño de los algoritmos propuestos, entre los que se incluyen datos simulados, imágenes de vuelos reales, estimaciones precisas de posición empleando el sistema VICON y comparaciones con algoritmos del estado del arte. Los resultados obtenidos indican que los algoritmos de visión por computador propuestos tienen un desempeño que es comparable e incluso mejor al de algoritmos que se encuentran en el estado del arte. Los algoritmos propuestos permiten la obtención de estimaciones robustas en tiempo real, lo cual permite su uso en tareas de control visual. El desempeño de estos algoritmos es apropiado para las exigencias de las distintas aplicaciones examinadas: reabastecimiento autónomo en vuelo, aterrizaje y estimación del estado del UAV. Abstract The main objective of this thesis is to provide Unmanned Aerial Vehicles (UAVs) with an additional vision-based source of information extracted by cameras located either on-board or on the ground, in order to allow UAVs to develop visually guided tasks, such as landing or inspection, especially in situations where GPS information is not available, where GPS-based position estimation is not accurate enough for the task to develop, or where payload restrictions do not allow the incorporation of additional sensors on-board. This thesis covers three of the main computer vision areas: visual tracking and visual pose estimation, which are the bases the third one called visual servoing, which, in this work, focuses on using visual information to control UAVs. In this sense, the thesis focuses on presenting novel solutions for solving the tracking problem of objects when using cameras on-board UAVs, on estimating the pose of the UAVs based on the visual information collected by cameras located either on the ground or on-board, and also focuses on applying these proposed techniques for solving different problems, such as visual tracking for aerial refuelling or vision-based landing, among others. The different computer vision techniques presented in this thesis are proposed to solve some of the frequently problems found when addressing vision-based tasks in UAVs, such as obtaining robust vision-based estimations at real-time frame rates, and problems caused by vibrations, or 3D motion. All the proposed algorithms have been tested with real-image data in on-line and off-line tests. Different evaluation mechanisms have been used to analyze the performance of the proposed algorithms, such as simulated data, images from real-flight tests, publicly available datasets, manually generated ground truth data, accurate position estimations using a VICON system and a robotic cell, and comparison with state of the art algorithms. Results show that the proposed computer vision algorithms obtain performances that are comparable to, or even better than, state of the art algorithms, obtaining robust estimations at real-time frame rates. This proves that the proposed techniques are fast enough for vision-based control tasks. Therefore, the performance of the proposed vision algorithms has shown to be of a standard appropriate to the different explored applications: aerial refuelling and landing, and state estimation. It is noteworthy that they have low computational overheads for vision systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El principal objetivo de este trabajo es proporcionar una solución en tiempo real basada en visión estéreo o monocular precisa y robusta para que un vehículo aéreo no tripulado (UAV) sea autónomo en varios tipos de aplicaciones UAV, especialmente en entornos abarrotados sin señal GPS. Este trabajo principalmente consiste en tres temas de investigación de UAV basados en técnicas de visión por computador: (I) visual tracking, proporciona soluciones efectivas para localizar visualmente objetos de interés estáticos o en movimiento durante el tiempo que dura el vuelo del UAV mediante una aproximación adaptativa online y una estrategia de múltiple resolución, de este modo superamos los problemas generados por las diferentes situaciones desafiantes, tales como cambios significativos de aspecto, iluminación del entorno variante, fondo del tracking embarullado, oclusión parcial o total de objetos, variaciones rápidas de posición y vibraciones mecánicas a bordo. La solución ha sido utilizada en aterrizajes autónomos, inspección de plataformas mar adentro o tracking de aviones en pleno vuelo para su detección y evasión; (II) odometría visual: proporciona una solución eficiente al UAV para estimar la posición con 6 grados de libertad (6D) usando únicamente la entrada de una cámara estéreo a bordo del UAV. Un método Semi-Global Blocking Matching (SGBM) eficiente basado en una estrategia grueso-a-fino ha sido implementada para una rápida y profunda estimación del plano. Además, la solución toma provecho eficazmente de la información 2D y 3D para estimar la posición 6D, resolviendo de esta manera la limitación de un punto de referencia fijo en la cámara estéreo. Una robusta aproximación volumétrica de mapping basada en el framework Octomap ha sido utilizada para reconstruir entornos cerrados y al aire libre bastante abarrotados en 3D con memoria y errores correlacionados espacialmente o temporalmente; (III) visual control, ofrece soluciones de control prácticas para la navegación de un UAV usando Fuzzy Logic Controller (FLC) con la estimación visual. Y el framework de Cross-Entropy Optimization (CEO) ha sido usado para optimizar el factor de escala y la función de pertenencia en FLC. Todas las soluciones basadas en visión en este trabajo han sido probadas en test reales. Y los conjuntos de datos de imágenes reales grabados en estos test o disponibles para la comunidad pública han sido utilizados para evaluar el rendimiento de estas soluciones basadas en visión con ground truth. Además, las soluciones de visión presentadas han sido comparadas con algoritmos de visión del estado del arte. Los test reales y los resultados de evaluación muestran que las soluciones basadas en visión proporcionadas han obtenido rendimientos en tiempo real precisos y robustos, o han alcanzado un mejor rendimiento que aquellos algoritmos del estado del arte. La estimación basada en visión ha ganado un rol muy importante en controlar un UAV típico para alcanzar autonomía en aplicaciones UAV. ABSTRACT The main objective of this dissertation is providing real-time accurate robust monocular or stereo vision-based solution for Unmanned Aerial Vehicle (UAV) to achieve the autonomy in various types of UAV applications, especially in GPS-denied dynamic cluttered environments. This dissertation mainly consists of three UAV research topics based on computer vision technique: (I) visual tracking, it supplys effective solutions to visually locate interesting static or moving object over time during UAV flight with on-line adaptivity approach and multiple-resolution strategy, thereby overcoming the problems generated by the different challenging situations, such as significant appearance change, variant surrounding illumination, cluttered tracking background, partial or full object occlusion, rapid pose variation and onboard mechanical vibration. The solutions have been utilized in autonomous landing, offshore floating platform inspection and midair aircraft tracking for sense-and-avoid; (II) visual odometry: it provides the efficient solution for UAV to estimate the 6 Degree-of-freedom (6D) pose using only the input of stereo camera onboard UAV. An efficient Semi-Global Blocking Matching (SGBM) method based on a coarse-to-fine strategy has been implemented for fast depth map estimation. In addition, the solution effectively takes advantage of both 2D and 3D information to estimate the 6D pose, thereby solving the limitation of a fixed small baseline in the stereo camera. A robust volumetric occupancy mapping approach based on the Octomap framework has been utilized to reconstruct indoor and outdoor large-scale cluttered environments in 3D with less temporally or spatially correlated measurement errors and memory; (III) visual control, it offers practical control solutions to navigate UAV using Fuzzy Logic Controller (FLC) with the visual estimation. And the Cross-Entropy Optimization (CEO) framework has been used to optimize the scaling factor and the membership function in FLC. All the vision-based solutions in this dissertation have been tested in real tests. And the real image datasets recorded from these tests or available from public community have been utilized to evaluate the performance of these vision-based solutions with ground truth. Additionally, the presented vision solutions have compared with the state-of-art visual algorithms. Real tests and evaluation results show that the provided vision-based solutions have obtained real-time accurate robust performances, or gained better performance than those state-of-art visual algorithms. The vision-based estimation has played a critically important role for controlling a typical UAV to achieve autonomy in the UAV application.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work describes a neural network based architecture that represents and estimates object motion in videos. This architecture addresses multiple computer vision tasks such as image segmentation, object representation or characterization, motion analysis and tracking. The use of a neural network architecture allows for the simultaneous estimation of global and local motion and the representation of deformable objects. This architecture also avoids the problem of finding corresponding features while tracking moving objects. Due to the parallel nature of neural networks, the architecture has been implemented on GPUs that allows the system to meet a set of requirements such as: time constraints management, robustness, high processing speed and re-configurability. Experiments are presented that demonstrate the validity of our architecture to solve problems of mobile agents tracking and motion analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Plane model extraction from three-dimensional point clouds is a necessary step in many different applications such as planar object reconstruction, indoor mapping and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel method-based on RANSAC called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud using the knowledge extracted from a scene (or an object) in order to reconstruct it accurately. This method comprises two steps: first, it clusters the data into planar faces that preserve some constraints defined by knowledge related to the object (e.g., the angles between faces); and second, the models of the planes are estimated based on these data using a novel multi-constraint RANSAC. We performed experiments in the clustering and RANSAC stages, which showed that the proposed method performed better than state-of-the-art methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La recherche d'informations s'intéresse, entre autres, à répondre à des questions comme: est-ce qu'un document est pertinent à une requête ? Est-ce que deux requêtes ou deux documents sont similaires ? Comment la similarité entre deux requêtes ou documents peut être utilisée pour améliorer l'estimation de la pertinence ? Pour donner réponse à ces questions, il est nécessaire d'associer chaque document et requête à des représentations interprétables par ordinateur. Une fois ces représentations estimées, la similarité peut correspondre, par exemple, à une distance ou une divergence qui opère dans l'espace de représentation. On admet généralement que la qualité d'une représentation a un impact direct sur l'erreur d'estimation par rapport à la vraie pertinence, jugée par un humain. Estimer de bonnes représentations des documents et des requêtes a longtemps été un problème central de la recherche d'informations. Le but de cette thèse est de proposer des nouvelles méthodes pour estimer les représentations des documents et des requêtes, la relation de pertinence entre eux et ainsi modestement avancer l'état de l'art du domaine. Nous présentons quatre articles publiés dans des conférences internationales et un article publié dans un forum d'évaluation. Les deux premiers articles concernent des méthodes qui créent l'espace de représentation selon une connaissance à priori sur les caractéristiques qui sont importantes pour la tâche à accomplir. Ceux-ci nous amènent à présenter un nouveau modèle de recherche d'informations qui diffère des modèles existants sur le plan théorique et de l'efficacité expérimentale. Les deux derniers articles marquent un changement fondamental dans l'approche de construction des représentations. Ils bénéficient notamment de l'intérêt de recherche dont les techniques d'apprentissage profond par réseaux de neurones, ou deep learning, ont fait récemment l'objet. Ces modèles d'apprentissage élicitent automatiquement les caractéristiques importantes pour la tâche demandée à partir d'une quantité importante de données. Nous nous intéressons à la modélisation des relations sémantiques entre documents et requêtes ainsi qu'entre deux ou plusieurs requêtes. Ces derniers articles marquent les premières applications de l'apprentissage de représentations par réseaux de neurones à la recherche d'informations. Les modèles proposés ont aussi produit une performance améliorée sur des collections de test standard. Nos travaux nous mènent à la conclusion générale suivante: la performance en recherche d'informations pourrait drastiquement être améliorée en se basant sur les approches d'apprentissage de représentations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La recherche d'informations s'intéresse, entre autres, à répondre à des questions comme: est-ce qu'un document est pertinent à une requête ? Est-ce que deux requêtes ou deux documents sont similaires ? Comment la similarité entre deux requêtes ou documents peut être utilisée pour améliorer l'estimation de la pertinence ? Pour donner réponse à ces questions, il est nécessaire d'associer chaque document et requête à des représentations interprétables par ordinateur. Une fois ces représentations estimées, la similarité peut correspondre, par exemple, à une distance ou une divergence qui opère dans l'espace de représentation. On admet généralement que la qualité d'une représentation a un impact direct sur l'erreur d'estimation par rapport à la vraie pertinence, jugée par un humain. Estimer de bonnes représentations des documents et des requêtes a longtemps été un problème central de la recherche d'informations. Le but de cette thèse est de proposer des nouvelles méthodes pour estimer les représentations des documents et des requêtes, la relation de pertinence entre eux et ainsi modestement avancer l'état de l'art du domaine. Nous présentons quatre articles publiés dans des conférences internationales et un article publié dans un forum d'évaluation. Les deux premiers articles concernent des méthodes qui créent l'espace de représentation selon une connaissance à priori sur les caractéristiques qui sont importantes pour la tâche à accomplir. Ceux-ci nous amènent à présenter un nouveau modèle de recherche d'informations qui diffère des modèles existants sur le plan théorique et de l'efficacité expérimentale. Les deux derniers articles marquent un changement fondamental dans l'approche de construction des représentations. Ils bénéficient notamment de l'intérêt de recherche dont les techniques d'apprentissage profond par réseaux de neurones, ou deep learning, ont fait récemment l'objet. Ces modèles d'apprentissage élicitent automatiquement les caractéristiques importantes pour la tâche demandée à partir d'une quantité importante de données. Nous nous intéressons à la modélisation des relations sémantiques entre documents et requêtes ainsi qu'entre deux ou plusieurs requêtes. Ces derniers articles marquent les premières applications de l'apprentissage de représentations par réseaux de neurones à la recherche d'informations. Les modèles proposés ont aussi produit une performance améliorée sur des collections de test standard. Nos travaux nous mènent à la conclusion générale suivante: la performance en recherche d'informations pourrait drastiquement être améliorée en se basant sur les approches d'apprentissage de représentations.