35 results for Computer Vision Android
Abstract:
In this paper we propose an innovative approach to the problem of traffic sign detection using a computer vision algorithm under real-time operation constraints, establishing intelligent strategies to simplify the algorithm's complexity as much as possible and to speed up the process. Firstly, a set of candidates is generated by a color segmentation stage, followed by a region analysis strategy in which the spatial characteristics of previously detected objects are taken into account. Finally, temporal coherence is introduced by means of a tracking scheme, performed using a Kalman filter for each potential candidate. Given the time constraints, efficiency is achieved in two ways: on the one hand, a multi-resolution strategy is adopted for segmentation, where global operations are applied only to low-resolution images, increasing the resolution to the maximum only when a potential road sign is being tracked. On the other hand, we take advantage of the expected spacing between traffic signs: the tracking of objects of interest allows us to generate inhibition areas, i.e., areas where no new traffic signs are expected to appear because a sign already exists in the neighborhood. The proposed solution has been tested with real sequences in both urban areas and highways, and proved to achieve high computational efficiency, especially as a result of the multi-resolution approach.
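A minimal sketch of this kind of pipeline is shown below: HSV color segmentation to generate candidates (here assuming red-rimmed signs) and one constant-velocity cv2.KalmanFilter per tracked candidate. The thresholds, minimum area, and noise covariances are illustrative assumptions, not the paper's values.

```python
# Illustrative sketch (not the authors' implementation): color segmentation
# plus one Kalman filter per traffic-sign candidate.
import cv2
import numpy as np

def segment_red_candidates(bgr, min_area=100):
    """Return bounding boxes of red-ish blobs; per the multi-resolution idea,
    this can be run on a downscaled copy of the frame."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two hue ranges are combined.
    mask = cv2.inRange(hsv, (0, 100, 60), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 100, 60), (180, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]

def make_tracker(x, y):
    """Constant-velocity Kalman filter over (x, y, vx, vy) for one candidate."""
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                    [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2   # assumed noise
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32)
    kf.statePost = np.array([[x], [y], [0], [0]], np.float32)
    return kf
```

The predicted positions of active trackers would define the inhibition areas mentioned above, where segmentation can be skipped.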
Abstract:
In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It combines monocular and stereo analysis strategies to increase the reliability of the detections, so that it can boost the performance of any traffic sign recognition scheme. Firstly, an adaptive color- and appearance-based detection is applied at the single-camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy: the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC-based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage, which also allows for the generation of a predicted 3D traffic sign model; this model is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion.
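The RANSAC plane-fitting step lends itself to a compact sketch. The following is an illustrative numpy implementation of robustly fitting a plane to a sparse cloud of 3D points; the iteration count and inlier threshold are assumed values, not the paper's.

```python
# Illustrative sketch (not the authors' code): RANSAC plane fitting on the
# sparse 3D points traced back from SURF feature matches.
import numpy as np

def ransac_plane(points, n_iters=200, inlier_thresh=0.05, seed=0):
    """points: (N, 3). Fit a plane n.x + d = 0 (unit normal n) robust to outliers."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:                       # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ sample[0]
        dist = np.abs(points @ n + d)         # point-to-plane distances
        inliers = dist < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model, best_inliers
```

A final least-squares refit on the inlier set is a common follow-up once the consensus set is found.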
Abstract:
We propose a new method to automatically refine a facial disparity map obtained with standard cameras under conventional illumination conditions, using a smart combination of traditional computer vision and 3D graphics techniques. Our system takes as input two stereo images acquired with standard (calibrated) cameras and uses dense disparity estimation strategies to obtain a coarse initial disparity map, and SIFT to detect and match several feature points in the subject's face. We then use these points as anchors to modify the disparity in the facial area by building a Delaunay triangulation of their convex hull and interpolating their disparity values inside each triangle. We thus obtain a refined disparity map providing a much more accurate representation of the subject's facial features. This refined facial disparity map may be easily transformed, through the camera calibration parameters, into a depth map to be used, also automatically, to improve the facial mesh of a 3D avatar to match the subject's real features.
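A hedged sketch of the refinement step, using scipy's Delaunay triangulation with piecewise-linear interpolation (linear inside each triangle, matching the per-triangle interpolation described above); the function and argument names are illustrative.

```python
# Illustrative sketch (not the authors' code): refine disparity inside the face
# by triangulating SIFT anchor points and interpolating their disparities.
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

def refine_facial_disparity(disparity, anchors_xy, anchor_disp):
    """disparity: (H, W) coarse map; anchors_xy: (M, 2) matched SIFT points;
    anchor_disp: (M,) disparities at the anchors. Overwrites the disparity
    inside the anchors' convex hull with a piecewise-linear interpolation."""
    tri = Delaunay(anchors_xy)
    interp = LinearNDInterpolator(tri, anchor_disp)
    h, w = disparity.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    values = interp(xs, ys)                  # NaN outside the convex hull
    refined = disparity.astype(np.float64).copy()
    inside = ~np.isnan(values)
    refined[inside] = values[inside]
    return refined
```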
Abstract:
This project deals with one of the most problematic areas of artificial intelligence: facial recognition. Something as simple for people as recognizing a familiar face translates into complex algorithms and thousands of pieces of data processed in a matter of seconds. The project begins with a study of the state of the art of face recognition techniques, from the most widely used and tested, such as PCA and LDA, to experimental techniques that use thermal images instead of classic visible-light images. Next, an application was implemented in C++ that is able to recognize people stored in its database by reading images directly from a webcam. The application was built with OpenCV, one of the most widespread libraries for image processing and computer vision; Visual Studio 2010, which has a free student version, was chosen as the IDE. The technique chosen to implement the application is PCA, because it is a basic technique in face recognition and also serves as a basis for much more complex solutions. The mathematical foundations of the technique were studied to understand how it processes information and which data it relies on to perform the recognition. Finally, a testing algorithm was implemented to measure the reliability of the application against several databases of facial images. In this way, the strengths and weaknesses of PCA can be assessed.
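The PCA ("eigenfaces") pipeline the application implements can be sketched compactly. The project itself is written in C++ with OpenCV, but the following hypothetical numpy version shows the same computation: center the training faces, take the top principal axes, and classify by nearest neighbour in the projected space.

```python
# Illustrative numpy sketch of PCA-based face recognition (eigenfaces);
# images are flattened into row vectors.
import numpy as np

def train_pca(faces, k):
    """faces: (n_samples, n_pixels) float array. Returns mean and top-k eigenfaces."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data gives the principal axes without forming the
    # huge n_pixels x n_pixels covariance matrix explicitly.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                      # eigenfaces as rows

def project(face, mean, eigenfaces):
    return eigenfaces @ (face - mean)

def recognize(face, mean, eigenfaces, gallery_coords, labels):
    """Nearest neighbour in PCA space; the distance can be thresholded to
    reject unknown faces. gallery_coords are the projected training faces."""
    q = project(face, mean, eigenfaces)
    dists = np.linalg.norm(gallery_coords - q, axis=1)
    return labels[np.argmin(dists)], dists.min()
```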
Abstract:
This thesis discusses correction methods that compensate for variations in lighting conditions in colour image and video applications. These variations often make computer vision algorithms that use colour features to describe objects fail. Three research questions are formulated that define the framework of the thesis. The first question addresses the similarities in photometric behaviour between images of adjacent surfaces. Based on an analysis of the image formation model in dynamic situations, this thesis proposes a model that predicts the colour variations of a region of an image from the variations of the surrounding regions. The proposed model is called the Quotient Relational Model of Regions. This model is valid when the light sources illuminate all of the surfaces included in the model; when these surfaces are placed close to each other and have similar orientations; and when they are primarily Lambertian. Under these circumstances, the photometric response of a region can be related to the others through a linear combination. No previous work proposing this kind of relational model was found in the scientific literature. The second question goes a step further and examines whether those similarities can be used to correct unknown photometric variations in an unknown region from known adjacent regions. A method called Linear Correction Mapping is proposed that provides an affirmative answer under the circumstances characterised above. A training stage is required to determine the parameters of the model. The method, which initially works for a single camera, is extended to cover multi-camera architectures with non-overlapping fields of view; to this end, only a few image samples of the same object acquired by all of the cameras are required. Furthermore, the correction mapping covers both lighting variations and changes in the camera exposure settings. Every image correction method fails when the image of the object to be corrected is overexposed or when its signal-to-noise ratio is very low. Thus, the third question asks whether the acquisition process can be controlled to obtain an optimal exposure under uncontrolled lighting conditions. A method called Camera Exposure Control is proposed that is capable of maintaining a suitable exposure provided that the lighting variations fall within the dynamic range of the camera. Each of the proposed methods was evaluated individually. The experimental methodology consisted of first selecting scenarios that cover representative situations for which the methods are theoretically valid. Linear Correction Mapping was validated in three object re-identification applications (vehicles, faces and persons) based on the objects' colour distributions, while Camera Exposure Control was tested in an outdoor parking scenario. In addition, several performance indicators were defined to objectively compare the results with other relevant state-of-the-art correction and auto-exposure methods. The results of the evaluation demonstrated that the proposed methods outperform the compared ones in most situations. Based on the obtained results, the answers to the research questions above are affirmative in limited circumstances: the hypotheses concerning prediction, the correction based on it, and auto exposure are feasible in the situations identified throughout the thesis, although they cannot be guaranteed in general. Furthermore, the presented work raises new questions and scientific challenges, which are highlighted as future research work.
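The core of Linear Correction Mapping, a linear relation between the photometric responses of adjacent regions with parameters learned in a training stage, can be sketched as a least-squares fit. The shapes and names below are assumptions for illustration, not the thesis code.

```python
# Hedged sketch of the Linear Correction Mapping idea: learn, per colour
# channel, a linear map from the responses of known adjacent regions to the
# response of the target region, then use it to predict/correct that region.
import numpy as np

def train_lcm(adjacent_responses, target_responses):
    """adjacent_responses: (n_frames, n_regions) mean intensities of the known
    regions over the training frames; target_responses: (n_frames,) mean
    intensity of the region to be corrected. Returns least-squares weights."""
    A = np.column_stack([adjacent_responses,
                         np.ones(len(adjacent_responses))])  # affine term
    w, *_ = np.linalg.lstsq(A, target_responses, rcond=None)
    return w

def predict_target(adjacent_now, w):
    """Predict the target region's current response from the adjacent regions."""
    return np.append(adjacent_now, 1.0) @ w
```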
Abstract:
A novel GPU-based nonparametric moving object detection strategy is proposed for computer vision tools requiring real-time processing. An alternative and efficient Bayesian classifier that combines nonparametric background and foreground models increases correct detections while avoiding false detections. Additionally, an efficient region-of-interest analysis significantly reduces the computational cost of the detections.
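A CPU-bound sketch of the underlying idea (the paper's contribution is an efficient GPU formulation): per-pixel kernel density estimates for the background and foreground models, combined through Bayes' rule. The bandwidth and prior are illustrative assumptions.

```python
# Minimal sketch of nonparametric Bayesian foreground classification for one
# pixel; a real detector maintains per-pixel sample sets for the whole frame.
import numpy as np

def kde_likelihood(value, samples, bandwidth=10.0):
    """Gaussian-kernel density estimate of `value` given past `samples`."""
    z = (value - samples) / bandwidth
    return np.exp(-0.5 * z ** 2).mean() / (bandwidth * np.sqrt(2 * np.pi))

def classify_pixel(value, bg_samples, fg_samples, prior_fg=0.1):
    """Posterior foreground probability from the two nonparametric models."""
    p_bg = kde_likelihood(value, bg_samples)
    p_fg = kde_likelihood(value, fg_samples)
    post = prior_fg * p_fg / (prior_fg * p_fg + (1 - prior_fg) * p_bg + 1e-12)
    return post > 0.5
```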
Abstract:
Due to the ever-increasing transportation of people and goods, automatic traffic surveillance is becoming a key issue both for providing safety to road users and for improving traffic control in an efficient way. In this paper, we propose a new system that, exploiting the capabilities that both computer vision and machine learning offer, is able to detect and track different types of real incidents on a highway. Specifically, it can accurately detect not only stopped vehicles, but also drivers and passengers leaving the stopped vehicle, and other pedestrians present in the roadway. Additionally, a theoretical approach for detecting vehicles that may leave the road in an unexpected way is also presented. The system works in real time and has been optimized for outdoor operation, making it appropriate for deployment in a real-world environment such as a highway. First experimental results on a dataset created with videos provided by two Spanish highway operators demonstrate the effectiveness of the proposed system and its robustness against noise and low-quality videos.
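As a rough illustration of the stopped-vehicle part only (the actual system is far more complete and robust), a background-subtraction sketch can flag blobs whose centroids stay in place over many frames. All thresholds below are assumptions, and a real system must also keep stationary objects from being absorbed into the background model.

```python
# Crude sketch (not the paper's system): flag foreground blobs that remain
# near the same position for min_frames consecutive frames.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

def stationary_candidates(frame, history, max_dist=10, min_frames=50):
    """history: caller-kept list of past centroid lists (one per frame)."""
    mask = subtractor.apply(frame)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
    centroids = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    history.append(centroids)
    if len(history) < min_frames:
        return []
    recent = history[-min_frames:]
    # A centroid is "stationary" if every recent frame had a nearby centroid.
    return [c for c in centroids
            if all(any(np.hypot(c[0] - p[0], c[1] - p[1]) < max_dist
                       for p in frame_c) for frame_c in recent)]
```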
Abstract:
In the last decade, multi-sensor data fusion has become a broadly demanded discipline for achieving advanced solutions that can be applied in many real-world situations, either civil or military. In Defence, accurate detection of all target objects is fundamental to maintaining situational awareness, locating threats on the battlefield, and identifying and protecting one's own forces strategically. Civil applications, such as traffic monitoring, have similar requirements in terms of object detection and reliable identification of incidents in order to ensure the safety of road users. With an appropriate data fusion technique, we can give these systems the power to automatically exploit all relevant information from multiple sources to meet, for instance, mission needs or to assess daily supervision operations. This paper focuses on its application to active vehicle monitoring in a particular area of high-density traffic, and on how it is redirecting the research activities being carried out in the computer vision, signal processing and machine learning fields to improve the effectiveness of detection and tracking in ground surveillance scenarios in general. Specifically, our system fuses data at the feature level, extracted from a video camera and a laser scanner. In addition, a stochastic tracking stage, which introduces particle filters into the model to deal with uncertainty due to occlusions and to improve the previous detection output, is presented in this paper. It is shown that this computer vision tracker helps to detect objects even under poor visual information. Finally, in the same way that humans are able to analyze both temporal and spatial relations among items in the scene to assign them a meaning, once the target objects have been correctly detected and tracked, it is desirable that machines can provide a trustworthy description of what is happening in the scene under surveillance. Accomplishing such an ambitious task requires a machine-learning-based hierarchical architecture able to extract and analyse behaviours at different abstraction levels. A real experimental testbed, a closed circuit where real traffic situations can be simulated, has been implemented for the evaluation of the proposed modular system. First results have shown the strength of the proposed system.
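The stochastic tracking component can be illustrated with a minimal particle filter; one predict-update-resample cycle is sketched below, where the motion and likelihood models are simple placeholder assumptions. Under occlusion (no measurement), only the prediction step runs, which is what lets the track survive poor visual information.

```python
# Minimal particle-filter sketch (parameters and the Gaussian likelihood are
# illustrative assumptions, not the paper's model).
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement,
                         motion_std=2.0, meas_std=5.0):
    """particles: (N, 2) positions; weights: (N,) normalized weights.
    measurement: (2,) detection, or None when the target is occluded."""
    particles = particles + rng.normal(0.0, motion_std, particles.shape)  # predict
    if measurement is not None:                                           # update
        d2 = ((particles - np.asarray(measurement)) ** 2).sum(axis=1)
        weights = weights * np.exp(-0.5 * d2 / meas_std ** 2) + 1e-300
        weights = weights / weights.sum()
        # Resample when the effective sample size collapses.
        if 1.0 / (weights ** 2).sum() < len(weights) / 2:
            idx = rng.choice(len(particles), size=len(particles), p=weights)
            particles = particles[idx]
            weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```

The state estimate is the weighted mean of the particles, and the spread of the cloud gives a measure of the tracking uncertainty.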
Abstract:
In recent years, the computer vision community has shown great interest in depth-based applications, thanks to the performance and flexibility of the new generation of RGB-D imagery. In this paper, we present an efficient background subtraction algorithm based on the fusion of multiple region-based classifiers that processes depth and color data provided by RGB-D cameras. Foreground objects are detected by combining a region-based foreground prediction (based on depth data) with different background models (based on a Mixture of Gaussians algorithm) providing color and depth descriptions of the scene at pixel and region level. The information given by these modules is fused in a mixture-of-experts fashion to improve the foreground detection accuracy. The main contributions of the paper are the region-based models of both background and foreground, built from the depth and color data. The results obtained using different database sequences demonstrate that the proposed approach leads to a higher detection accuracy with respect to existing state-of-the-art techniques.
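A much-simplified sketch of the fusion idea (not the paper's region-based models): one Mixture-of-Gaussians background model per modality, with the foreground evidence fused as a weighted mixture of experts. The equal weights and the threshold are assumptions.

```python
# Illustrative sketch: MoG background subtraction on the colour and depth
# streams, fused as a weighted average of foreground evidence.
import cv2
import numpy as np

color_bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
depth_bg = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def fused_foreground(bgr, depth_u8, w_color=0.5, w_depth=0.5, thresh=0.5):
    """depth_u8: depth map scaled to 8 bits. Returns a fused binary mask.
    Note: MOG2 marks shadows as 127, so they contribute ~0.5 after scaling."""
    fg_c = color_bg.apply(bgr).astype(np.float32) / 255.0
    fg_d = depth_bg.apply(depth_u8).astype(np.float32) / 255.0
    fused = w_color * fg_c + w_depth * fg_d
    return (fused > thresh).astype(np.uint8) * 255
```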
Abstract:
Low-cost RGB-D cameras such as the Microsoft Kinect or the Asus Xtion Pro are completely changing the computer vision world, as they are being successfully used in several applications and research areas. Depth data are particularly attractive and suitable for applications based on moving object detection through foreground/background segmentation approaches; however, the RGB-D applications proposed in the literature generally employ state-of-the-art foreground/background segmentation techniques based on depth information alone, without taking the color information into account. The novel approach that we propose is based on a combination of classifiers that improves background subtraction accuracy with respect to state-of-the-art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average that adaptively modifies the support of each classifier in the ensemble by considering the foreground detections in the previous frames and the depth and color edges. In this way, it is possible to reduce false detections due to critical issues that cannot be tackled by the individual classifiers, such as shadows and illumination changes, color and depth camouflage, moved background objects, and noisy depth measurements. Moreover, we provide, to the best of our knowledge, the first publicly available RGB-D benchmark dataset with hand-labeled ground truth of several challenging scenarios to test background/foreground segmentation algorithms.
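The weighted-average combination can be sketched as follows; the specific weight-update rule here is an assumption for illustration (the paper derives its own), but it captures the two adaptation cues named above: edge information and agreement with previous foreground detections.

```python
# Hedged sketch of an adaptively weighted colour/depth classifier ensemble;
# the update rule below is illustrative, not the published one.
import numpy as np

def fuse_adaptive(p_color, p_depth, w_color, prev_fg,
                  color_edges, depth_edges, lr=0.05):
    """All inputs are per-pixel maps in [0, 1]; w_color is the colour weight
    (the depth weight is 1 - w_color). Returns the fused mask and new weights."""
    fused = w_color * p_color + (1 - w_color) * p_depth
    fg = fused > 0.5
    # Near depth edges depth is noisy, so shift support towards colour (and
    # vice versa); agreement with the previous frame also reinforces colour.
    agree = np.where((p_color > 0.5) == prev_fg, 1.0, -1.0)
    target = np.clip(0.5 + 0.5 * (depth_edges - color_edges) + 0.1 * agree,
                     0.0, 1.0)
    w_color = np.clip(w_color + lr * (target - w_color), 0.05, 0.95)
    return fg, w_color
```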
Abstract:
Validating modern oceanographic theories using models produced through stereo computer vision principles has recently emerged. Space-time (4-D) models of the ocean surface may be generated by stacking a series of 3-D reconstructions independently generated for each time instant or, in a more robust manner, by simultaneously processing several snapshots coherently in a true "4-D reconstruction." However, the accuracy of these computer-vision-generated models depends on the estimated camera parameters, which may be corrupted under the influence of natural factors such as wind and vibrations. Therefore, removing the unpredictable errors of the camera parameters is necessary for an accurate reconstruction. In this paper, we propose a novel algorithm that can jointly perform a 4-D reconstruction and correct the camera parameter errors introduced by external factors. The technique is founded upon variational optimization methods to benefit from their numerous advantages: continuity of the estimated surface in space and time, robustness, and accuracy. The performance of the proposed algorithm is tested using synthetic data produced through computer graphics techniques, based on which the errors of the camera parameters arising from natural factors can be simulated.
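Schematically, the kind of joint variational energy involved can be written as below; the symbols, data term, and regularizers are illustrative, not the paper's exact functional. Z(x, y, t) is the space-time surface, the corrections of camera c's parameters at time t are written as delta-theta, and X = (x, y, Z(x, y, t)).

```latex
% Schematic joint energy (illustrative sketch, not the paper's functional):
% a photo-consistency data term plus spatial and temporal smoothness, minimised
% jointly over the surface Z and the camera-parameter corrections \delta\theta.
E(Z,\delta\theta)
  = \sum_{t}\int_{\Omega}
      \rho\!\big( I_{1,t}(\pi_1(X;\,\theta_1+\delta\theta_{1,t}))
                - I_{2,t}(\pi_2(X;\,\theta_2+\delta\theta_{2,t})) \big)\, d\Omega
  \;+\; \alpha \sum_{t}\int_{\Omega} \lVert \nabla Z(\cdot,t) \rVert^{2}\, d\Omega
  \;+\; \beta \int_{\Omega}\!\int_{T} \big(\partial_t Z\big)^{2}\, dt\, d\Omega
```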
Abstract:
In this paper, we consider the problem of autonomous navigation of multirotor platforms in GPS-denied environments. The focus of this work is on safe navigation based on imperfect odometry measurements, such as on-board optical flow measurements. The multirotor platform is modeled as a flying object with specific kinematic constraints that must be taken into account in order to obtain successful results. A navigation controller is proposed featuring a set of configurable parameters that allow, for instance, one configuration for fast trajectory following and another that softens the control laws, making the vehicle's navigation more precise and slower whenever necessary. The proposed controller has been successfully implemented in two different multirotor platforms with similar sensing capabilities, demonstrating the portability and flexibility of the approach. This research is focused on the Computer Vision Group's objective of applying multirotor vehicles to civilian service applications. The presented work was implemented to compete in the International Micro Air Vehicle Conference and Flight Competition IMAV 2012, winning two awards: the Special Award on "Best Automatic Performance - IMAV 2012" and the second overall prize in the participating category "Indoor Flight Dynamics - Rotary Wing MAV". Most of the code related to the present work is available as two open-source projects hosted on GitHub.
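A hedged sketch of what such a configurable controller can look like; the gains, the speed limit, and the single "aggressiveness" parametrisation are assumptions for illustration, not the published control laws.

```python
# Illustrative sketch: a P-style velocity controller whose gains and speed
# limit scale with one configuration parameter, so the same code yields a
# fast-following setup or a soft, precise one.
import numpy as np

class TrajectoryController:
    def __init__(self, aggressiveness=1.0):
        self.kp = 0.8 * aggressiveness       # position-error gain (assumed)
        self.v_max = 1.5 * aggressiveness    # speed limit in m/s (assumed)

    def command(self, pos, target):
        """Velocity setpoint towards the target, saturated to respect the
        platform's kinematic constraints."""
        v = self.kp * (np.asarray(target, float) - np.asarray(pos, float))
        speed = np.linalg.norm(v)
        if speed > self.v_max:
            v = v * (self.v_max / speed)
        return v

fast = TrajectoryController(aggressiveness=1.5)   # fast trajectory following
soft = TrajectoryController(aggressiveness=0.5)   # slow, precise navigation
```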
Abstract:
In this paper we tackle the problem of landing a helicopter autonomously on a ship deck, using an on-board colour camera as the main sensor. To create a test-bed, we first simulate, randomly and realistically enough, the movement of a ship's landing platform at sea, for different sea states and different ships, and reproduce this movement with a commercial parallel robot. We then developed an accurate and robust computer vision system to measure the pose of the helipad with respect to the on-board camera. To deal with noise and possible failures of the computer vision system, a state estimator was created. With all of this, we are now able to develop and test a controller that closes the loop and completes the autonomous landing task.
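The pose-measurement step maps naturally onto a perspective-n-point solve; below is an illustrative Python/OpenCV sketch (not the paper's system) that recovers the helipad pose from four detected marker corners of assumed, known geometry. In practice a state estimator (e.g., a Kalman filter) would smooth these raw poses, as the abstract describes.

```python
# Illustrative sketch: helipad pose from detected corners via cv2.solvePnP.
import cv2
import numpy as np

# Assumed geometry: the four outer corners of a 1 m x 1 m helipad marker,
# in metres, expressed in the helipad frame (hypothetical values).
HELIPAD_CORNERS_3D = np.array([[-0.5, -0.5, 0.0], [0.5, -0.5, 0.0],
                               [0.5, 0.5, 0.0], [-0.5, 0.5, 0.0]], np.float32)

def helipad_pose(corners_2d, camera_matrix, dist_coeffs):
    """corners_2d: (4, 2) detected image corners, same order as the 3D model.
    Returns the rotation matrix and translation of the helipad in the camera
    frame, or None if the solver fails."""
    ok, rvec, tvec = cv2.solvePnP(HELIPAD_CORNERS_3D,
                                  corners_2d.astype(np.float32),
                                  camera_matrix, dist_coeffs,
                                  flags=cv2.SOLVEPNP_IPPE)  # planar-target solver
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```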
Abstract:
The family of Boosting algorithms represents a type of classification and regression approach that has proven very effective in Computer Vision problems, such as the detection, tracking and recognition of faces, people, deformable objects and actions. The first and most popular algorithm, AdaBoost, was introduced in the context of binary classification. Since then, many works have extended it to more general domains: multi-class, multi-label, cost-sensitive, etc. Our interest is centred on extending AdaBoost to two problems in the multi-class field, considering it a first step for upcoming generalizations. In this dissertation we propose two Boosting algorithms for multi-class classification based on new generalizations of the concept of margin. The first of them, PIBoost, is conceived to tackle the multi-class problem by solving many binary sub-problems. We use a vectorial codification to represent class labels and a multi-class exponential loss function to evaluate classifier responses. This representation produces a set of margin values that provide a range of penalties for failures and rewards for successes. The stagewise optimization of this model introduces an asymmetric Boosting procedure whose costs depend on the number of classes separated by each weak learner. In this way the Boosting procedure takes class imbalances into account when building the ensemble. The resulting algorithm is a well-grounded method that canonically extends the original AdaBoost. The second algorithm proposed, BAdaCost, is conceived for multi-class problems endowed with a cost matrix. Motivated by the few cost-sensitive extensions of AdaBoost to the multi-class field, we propose a new margin that, in turn, yields a new loss function appropriate for evaluating costs. Since BAdaCost generalizes the SAMME, Cost-Sensitive AdaBoost and PIBoost algorithms, we consider it a canonical extension of AdaBoost to this kind of problem. We additionally suggest a simple procedure to compute cost matrices that improve the performance of Boosting on standard and imbalanced problems. A set of experiments demonstrates the effectiveness of both methods against other relevant Boosting algorithms in their respective areas, using benchmark data sets from the Machine Learning community, first to minimize classification errors and then to minimize costs. In addition, we successfully applied BAdaCost to a segmentation task, a particular problem in the presence of imbalanced data. We conclude by justifying the future prospects of the presented framework, in terms of both its applicability and its theoretical flexibility.
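For reference, the standard vectorial codification and multi-class exponential loss can be written as below; this is the SAMME-style construction, shown for illustration only. PIBoost's margin vectors and BAdaCost's cost-sensitive margin generalise it.

```latex
% SAMME-style vectorial codification of a label l among K classes (shown for
% illustration; the thesis's own margins generalise this construction).
y = (y_1, \dots, y_K), \qquad
y_k = \begin{cases} 1 & k = l \\[2pt] -\dfrac{1}{K-1} & k \neq l \end{cases}

% Multi-class exponential loss of a sum-to-zero response f(x) \in \mathbb{R}^K:
L\big(y, f(x)\big) = \exp\!\Big( -\tfrac{1}{K}\, y \cdot f(x) \Big)
% The margin y \cdot f(x) is positive for correct responses (a reward) and
% negative for failures (a penalty), matching the description above.
```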
Abstract:
Many computer vision and human-computer interaction applications developed in recent years require evaluating complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of this kind of function often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial representations. Following this idea, we propose a novel, efficient, and practical technique to evaluate complex and continuous functions using a nearly optimal design of two types of piecewise linear approximations in the case of a large budget of evaluation subintervals. To this end, we develop a thorough error analysis that yields asymptotically tight bounds to accurately quantify the approximation performance of both representations. It provides an improvement upon previous error estimates and allows the user to control the trade-off between the approximation error and the number of evaluation subintervals. To guarantee real-time operation, the method is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), where it outperforms previous alternative approaches by exploiting the fixed-function interpolation routines present in their texture units. The proposed technique is a perfect match for any application requiring the evaluation of continuous functions; we have measured its quality and efficiency in detail on several functions, and in particular on the Gaussian function, because it is extensively used in many areas of computer vision and cybernetics and is expensive to evaluate.
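A minimal sketch of the piecewise-linear idea with uniform breakpoints (the paper's contribution is the nearly optimal subinterval design and the tight error bounds, neither of which is reproduced here); linear interpolation between stored samples is exactly the operation GPU texture units provide in fixed-function hardware.

```python
# Illustrative sketch: tabulate a function at uniform breakpoints and evaluate
# it by piecewise-linear interpolation, then measure the approximation error.
import numpy as np

def build_pwl_table(f, a, b, n_intervals):
    """Sample f at n_intervals + 1 uniform breakpoints over [a, b]."""
    xs = np.linspace(a, b, n_intervals + 1)
    return xs, f(xs)

def pwl_eval(x, xs, ys):
    """Piecewise-linear evaluation (what a GPU texture fetch does in hardware)."""
    return np.interp(x, xs, ys)

gauss = lambda x: np.exp(-0.5 * x ** 2)
xs, ys = build_pwl_table(gauss, -4.0, 4.0, 1024)
x = np.linspace(-4.0, 4.0, 100001)
max_err = np.abs(pwl_eval(x, xs, ys) - gauss(x)).max()  # worst-case error
```

Increasing the number of subintervals trades memory for accuracy; the paper's analysis quantifies exactly this trade-off for its two approximation types.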