878 results for Depth Estimation, Deep Learning, Disparity Estimation, Computer Vision, Stereo Vision
Abstract:
In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It combines monocular and stereo analysis strategies to increase the reliability of the detections so that it can boost the performance of any traffic sign recognition scheme. First, an adaptive color- and appearance-based detection is applied at the single-camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC-based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion.
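The plane-fitting step described in this abstract can be illustrated with a minimal sketch. It assumes the 3D points are already available as an N x 3 NumPy array; the function name `fit_plane_ransac`, the iteration count, and the inlier threshold are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, threshold=0.02, seed=0):
    """Fit a plane normal . x + d = 0 to 3D points with a basic RANSAC loop.

    All parameter values are illustrative assumptions, not the paper's.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Sample three distinct points and build the candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        normal = normal / norm
        # Point-to-plane distance for every point in the cloud.
        dist = np.abs((points - p0) @ normal)
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no valid plane hypothesis found")
    # Refine on the inliers: the plane normal is the direction of least variance.
    centroid = points[best_inliers].mean(axis=0)
    _, _, vt = np.linalg.svd(points[best_inliers] - centroid)
    normal = vt[-1]
    d = -normal @ centroid
    return normal, d, best_inliers
```

The final SVD refit is a common refinement, added here as an assumption; the abstract only states that a RANSAC-based approach is used.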
Abstract:
This project deals with one of the most challenging areas of artificial intelligence: facial recognition. Something as simple for people as recognizing a familiar face translates into complex algorithms and thousands of data items processed in a matter of seconds. The project begins with a study of the state of the art in face recognition techniques, from the most widely used and tested, such as PCA and LDA, to experimental techniques that use thermal images instead of conventional visible-light images. Next, an application was implemented in C++ that recognizes people stored in its database by reading images directly from a webcam. The application uses OpenCV, one of the most widespread libraries for image processing and computer vision. Visual Studio 2010, which has a free version for students, was chosen as the IDE. PCA was chosen as the technique to implement because it is a basic technique in face recognition and also serves as the basis for much more complex solutions. The mathematical foundations of the technique were studied to understand how it processes the information and which data it relies on to perform recognition. Finally, a testing algorithm was implemented to measure the reliability of the application with several databases of facial images, so that the strengths and weaknesses of PCA can be assessed.
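As a rough illustration of PCA-based recognition, the sketch below projects flattened face images onto the leading principal components and labels a probe image by its nearest training neighbour in that space. The project itself was written in C++ with OpenCV; this NumPy version, including the function names `train_pca` and `recognize`, is an assumption made for illustration only.

```python
import numpy as np

def train_pca(faces, n_components):
    """faces: (n_samples, n_pixels) array of flattened grayscale images."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are the principal axes of the centered data ("eigenfaces").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of every training face in the reduced space.
    weights = centered @ components.T
    return mean, components, weights

def recognize(face, mean, components, weights, labels):
    """Return the label of the nearest training face in PCA space."""
    w = (face - mean) @ components.T
    distances = np.linalg.norm(weights - w, axis=1)
    return labels[int(np.argmin(distances))]
```

A practical system would also threshold the nearest distance to reject unknown faces; that step is omitted here.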
Abstract:
This thesis discusses correction methods that compensate for variations in lighting conditions in colour image and video applications. These variations often cause computer vision algorithms that use colour features to describe objects to fail. Three research questions are formulated that define the framework of the thesis. The first question addresses the similarities in photometric behaviour between images of dissimilar adjacent surfaces. Based on an analysis of the image formation model in dynamic situations, this thesis proposes a model that predicts the colour variations of a region of an image from the variations of the surrounding regions. This model is called the Quotient Relational Model of Regions. It is valid when the light sources illuminate all of the surfaces included in the model, and when these surfaces are close to each other, have similar orientations, and are primarily Lambertian. Under certain circumstances, the photometric response of one region can be related to the others through a linear combination. No previous work proposing such a relational model was found in the scientific literature. The second question examines whether those similarities can be used to correct unknown photometric variations in an unknown region from known adjacent regions. A method called Linear Correction Mapping is proposed, which provides an affirmative answer under the circumstances previously characterised. A training stage is required to determine the parameters of the model. The method, which initially works for a single camera, is extended to multi-camera architectures with non-overlapping fields of view. For this, only a few image samples of the same object captured by all of the cameras are required. Furthermore, the method accounts for both lighting variations and changes in the camera exposure settings. Every image correction method fails when the image of the object to be corrected is overexposed or when its signal-to-noise ratio is very low. Thus, the third question asks whether the acquisition process can be controlled to obtain an optimal exposure under uncontrolled lighting conditions. A Camera Exposure Control method is proposed that maintains a suitable exposure provided that the lighting variations fall within the dynamic range of the camera. Each of the proposed methods was evaluated individually. The experimental methodology consisted of first selecting scenarios that cover representative situations for which the methods are theoretically valid. Linear Correction Mapping was validated in three object re-identification applications (vehicles, faces and persons) that use the colour distribution of the objects as features. Camera Exposure Control was tested in an outdoor parking scenario. In addition, several performance indicators were defined to objectively compare the results with other relevant state-of-the-art correction and auto-exposure methods. The evaluation showed that the proposed methods outperform the compared ones in most situations. Based on the results obtained, the answers to the research questions are affirmative in limited circumstances: the hypotheses regarding prediction, the correction based on it, and auto-exposure are feasible in the situations identified in the thesis, although they cannot be guaranteed in general. Finally, the work raises new questions and scientific challenges, which are highlighted as future research.
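The idea that the photometric response of one region can be expressed as a linear combination of the responses of neighbouring regions can be illustrated with an ordinary least-squares fit. This is only a generic sketch under assumed inputs (mean intensities per region and per frame); it is not the thesis's Quotient Relational Model of Regions or its Linear Correction Mapping, and the names `fit_linear_relation` and `predict_target` are hypothetical.

```python
import numpy as np

def fit_linear_relation(neighbor_responses, target_responses):
    """Least-squares fit of target = neighbors @ weights + offset.

    neighbor_responses: (n_frames, n_neighbors) mean intensities (assumed input).
    target_responses:   (n_frames,) mean intensity of the region to predict.
    """
    # Append a constant column so the fit includes an offset term.
    a = np.column_stack([neighbor_responses, np.ones(len(neighbor_responses))])
    coeffs, *_ = np.linalg.lstsq(a, target_responses, rcond=None)
    return coeffs

def predict_target(neighbor_responses, coeffs):
    """Predict the target region response from its neighbours."""
    a = np.column_stack([neighbor_responses, np.ones(len(neighbor_responses))])
    return a @ coeffs
```

The fit plays the role of a training stage: once the coefficients are known, the response of the target region under new lighting can be predicted from the observed neighbours.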
Abstract:
This document is a guide to the development of an application for Android mobile devices. The application combines computer vision techniques to calibrate the device camera and to locate an element in space based on the parameters computed during calibration. The design of the application covers the decisions about how the application receives its inputs, which patterns are used for calibration and localization, and how the final results are shown to the user. It also includes a flow chart that describes how information travels between the different modules. The implementation chapter begins with the configuration of an environment for developing Android applications with native components, then walks through the application code step by step, including comments on the additional files needed to build it, and finally details the files dedicated to the user interface. The experiments chapter briefly explains how to interpret the results and then shows a series of images taken from the application with the pattern at different locations. A video is also included with the submission. The results and conclusions chapter contains observations on the development of the TFG, opinions about its usefulness, and possible improvements.
Abstract:
A novel GPU-based nonparametric moving object detection strategy for computer vision tools requiring real-time processing is proposed. An alternative and efficient Bayesian classifier to combine nonparametric background and foreground models allows increasing correct detections while avoiding false detections. Additionally, an efficient region of interest analysis significantly reduces the computational cost of the detections.
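A nonparametric background model combined with a Bayesian decision can be sketched on the CPU as follows. This is a simplified illustration, not the paper's GPU method: it assumes grayscale frames, a Gaussian kernel density estimate over recent background samples, and, unlike the paper, a uniform foreground likelihood instead of a nonparametric foreground model. The function names, the kernel width, and the prior are hypothetical.

```python
import numpy as np

def background_likelihood(frame, samples, sigma=10.0):
    """Kernel density estimate of each pixel value given past background samples.

    frame: (H, W) grayscale image; samples: (N, H, W) recent background frames.
    """
    diff = frame[None, :, :].astype(float) - samples.astype(float)
    kernels = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)

def classify_foreground(frame, samples, prior_fg=0.1, sigma=10.0):
    """Bayes rule per pixel, assuming a uniform foreground likelihood over 256 levels."""
    p_bg = background_likelihood(frame, samples, sigma)
    p_fg = 1.0 / 256.0  # simplifying assumption, not the paper's foreground model
    posterior_fg = prior_fg * p_fg / (prior_fg * p_fg + (1.0 - prior_fg) * p_bg)
    return posterior_fg > 0.5
```

Every pixel is processed independently, which is what makes this family of methods a natural fit for GPU execution.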
Abstract:
The use of Information and Communication Technologies (ICT) in learning environments allows maximum interaction between teachers and students. Virtual Learning Environments are computer programs that support learning by facilitating communication between users. Open-source software makes it possible to create one's own modular online learning environment and put it into service quickly. This paper proposes the use of a Learning Management System (LMS) as a continuing education tool.
Abstract:
The use of models produced through stereo computer vision principles to validate modern oceanographic theories has recently emerged. Space-time (4-D) models of the ocean surface may be generated by stacking a series of 3-D reconstructions independently generated for each time instant or, in a more robust manner, by simultaneously processing several snapshots coherently in a true "4-D reconstruction." However, the accuracy of these computer-vision-generated models is subject to the estimates of the camera parameters, which may be corrupted by natural factors such as wind and vibrations. Therefore, removing the unpredictable errors of the camera parameters is necessary for an accurate reconstruction. In this paper, we propose a novel algorithm that jointly performs a 4-D reconstruction and corrects the camera parameter errors introduced by external factors. The technique is founded upon variational optimization methods to benefit from their numerous advantages: continuity of the estimated surface in space and time, robustness, and accuracy. The performance of the proposed algorithm is tested using synthetic data produced through computer graphics techniques, with which the camera parameter errors arising from natural factors can be simulated.
Abstract:
In this paper, we consider the problem of autonomous navigation of multirotor platforms in GPS-denied environments. The focus of this work is on safe navigation based on imperfect odometry measurements, such as on-board optical flow measurements. The multirotor platform is modeled as a flying object with specific kinematic constraints that must be taken into account in order to obtain successful results. A navigation controller is proposed featuring a set of configurable parameters that allow, for instance, one configuration for fast trajectory following and another that softens the control laws and makes the vehicle navigation slower and more precise whenever necessary. The proposed controller has been successfully implemented on two different multirotor platforms with similar sensing capabilities, showing the openness and tolerance of the approach. This research is aligned with the Computer Vision Group's objective of applying multirotor vehicles to civilian service applications. The presented work was implemented to compete in the International Micro Air Vehicle Conference and Flight Competition IMAV 2012, gaining two awards: the Special Award on "Best Automatic Performance - IMAV 2012" and the second overall prize in the participating category "Indoor Flight Dynamics - Rotary Wing MAV". Most of the code related to the present work is available as two open-source projects hosted on GitHub.
Abstract:
In this paper we tackle the problem of landing a helicopter autonomously on a ship deck, using an on-board colour camera as the main sensor. To create a test-bed, we first simulate the movement of a ship landing platform at sea, randomly and with sufficient realism, for different sea states and different ships. We use a commercial parallel robot to reproduce this movement. With this in place, we developed an accurate and robust computer vision system to measure the pose of the helipad with respect to the on-board camera. To deal with noise and possible failures of the computer vision system, a state estimator was created. With all of this, we are now able to develop and test a controller that closes the loop and completes the autonomous landing task.
Abstract:
Many computer vision and human-computer interaction applications developed in recent years need to evaluate complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of this kind of function often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It improves upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique suits any application requiring the evaluation of continuous functions. We have measured its quality and efficiency in detail on several functions, in particular the Gaussian function, because it is extensively used in many areas of computer vision and cybernetics and is expensive to evaluate.
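The basic idea of replacing an expensive function with a piecewise linear table can be sketched as follows. This example assumes a uniform partition, which is simpler than the nearly optimal designs the paper proposes, and it runs on the CPU with NumPy rather than in GPU texture units. The helper names `build_table` and `evaluate` are hypothetical.

```python
import numpy as np

def gaussian(x):
    """Unnormalized Gaussian exp(-x^2 / 2), used here as the target function."""
    return np.exp(-0.5 * x ** 2)

def build_table(f, lo, hi, n_intervals):
    """Sample f at the endpoints of n_intervals uniform subintervals of [lo, hi]."""
    knots = np.linspace(lo, hi, n_intervals + 1)
    return knots, f(knots)

def evaluate(x, knots, values):
    """Piecewise linear interpolation between the tabulated samples."""
    return np.interp(x, knots, values)
```

For linear interpolation on subintervals of width h, the classical bound on the error is max|f''| * h^2 / 8. For this Gaussian, |f''| never exceeds 1, so 64 subintervals on [-4, 4] (h = 0.125) keep the error below about 0.002, and doubling the number of subintervals divides the bound by four.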
Abstract:
This project covers the design and implementation of a system that detects anomalies at the entrances of controlled environments. To this end, it uses recent computer vision techniques and issues visual and audible alerts through a hardware system that receives signals from the computer it is connected to. One or more people who commit an infraction at the entrance of an establishment monitored by video systems are marked and photographed, and the images are stored in the corresponding folders. The system is collaborative: the cameras involved communicate with one another through data structures in order to exchange information. In addition, a wireless connection from a mobile device provides a global view of the environment from anywhere in the world. The application is developed in MATLAB, which allows image signal processing suited to this project. The user is also given a graphical interface for simple interaction, which avoids having to change parameters in the program's internal structure when the environment or the type of data acquisition changes. The chosen language eases execution on different operating systems, including Windows or iOS, which provides flexibility.
Abstract:
Postprint
Abstract:
Paper presented at the IX Workshop de Agentes Físicos (WAF'2008), Vigo, 11-12 September 2008.
Abstract:
Paper presented at the X Workshop of Physical Agents, Cáceres, 10-11 September 2009.
Abstract:
In this paper, we propose two Bayesian methods for detecting and grouping junctions. Our junction detection method evolves from the Kona approach and is based on a competitive greedy procedure inspired by the region competition method. Junction grouping is then accomplished by finding connecting paths between pairs of junctions. Path searching is performed by applying a recently proposed Bayesian A* algorithm. Both methods are efficient and robust, and they are tested with synthetic and real images.
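To give a feel for path searching between two points, here is a sketch of the standard A* algorithm on a 4-connected grid. It is not the Bayesian A* variant the paper applies: the unit step cost, the Manhattan-distance heuristic, and the grid representation (0 for free cells, 1 for blocked cells) are assumptions made for illustration.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (blocked) cells.

    Returns the list of (row, col) cells from start to goal, or None.
    """
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale heap entry, a cheaper route was found later
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_cost = g + 1
                if new_cost < cost.get((nr, nc), float("inf")):
                    cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_heap, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None
```

In a junction-grouping setting the step cost would presumably come from image evidence along the path rather than a constant, but that detail is not given in the abstract.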