892 resultados para SIFT,Computer Vision,Python,Object Recognition,Feature Detection,Descriptor Computation
Resumo:
Diet-related chronic diseases severely affect personal and global health. However, managing or treating these diseases currently requires long training and high personal involvement to succeed. Computer vision systems could assist with the assessment of diet by detecting and recognizing different foods and their portions in images. We propose novel methods for detecting a dish in an image and segmenting its contents with and without user interaction. All methods were evaluated on a database of over 1600 manually annotated images. The dish detection scored an average of 99% accuracy with a .2s/image run time, while the automatic and semi-automatic dish segmentation methods reached average accuracies of 88% and 91% respectively, with an average run time of .5s/image, outperforming competing solutions.
Resumo:
Diet management is a key factor for the prevention and treatment of diet-related chronic diseases. Computer vision systems aim to provide automated food intake assessment using meal images. We propose a method for the recognition of already segmented food items in meal images. The method uses a 6-layer deep convolutional neural network to classify food image patches. For each food item, overlapping patches are extracted and classified and the class with the majority of votes is assigned to it. Experiments on a manually annotated dataset with 573 food items justified the choice of the involved components and proved the effectiveness of the proposed system yielding an overall accuracy of 84.9%.
Resumo:
Adult monkeys (Macaca mulatta) with lesions of the hippocampal formation, perirhinal cortex, areas TH/TF, as well as controls were tested on tasks of object, spatial and contextual recognition memory. ^ Using a visual paired-comparison (VPC) task, all experimental groups showed a lack of object recognition relative to controls, although this impairment emerged at 10 sec with perirhinal lesions, 30 sec with areas TH/TF lesions and 60 sec with hippocampal lesions. In contrast, only perirhinal lesions impaired performance on delayed nonmatching-to-sample (DNMS), another task of object recognition memory. All groups were tested on DNMS with distraction (dDNMS) to examine whether the use of active cognitive strategies during the delay period could enable good performance on DNMS in spite of impaired recognition memory (revealed by the VPC task). Distractors affected performance of animals with perirhinal lesions at the 10-sec delay (the only delay in which their DNMS performance was above chance). They did not affect performance of animals with areas TH/TF lesions. Hippocampectomized animals were impaired at the 600-sec delay (the only delay at which prevention of active strategies would likely affect their behavior). ^ While lesions of areas TH/TF impaired spatial location memory and object-in-place memory, hippocampal lesions impaired only object-in-place memory. The pattern of results for perirhinal cortex lesions on the different task conditions indicated that this cortical area is not critical for spatial memory. ^ Finally, all three lesions impaired contextual recognition memory processes. The pattern of impairment appeared to result from the formation of only a global representation of the object and background, and suggests that all three areas are recruited for associating information across sources. ^ These results support the view that (1) the perirhinal cortex maintains storage of information about object and the context in which it is learned for a brief period of time, (2) areas TH/TF maintain information about spatial location and form associations between objects and their spatial relationship (a process that likely requires additional time) and (3) the hippocampal formation mediates associations between objects, their spatial relationship and the general context in which these associations are formed (an integrative function that requires additional time). ^
Resumo:
The important technological advances experienced along the last years have resulted in an important demand for new and efficient computer vision applications. On the one hand, the increasing use of video editing software has given rise to a necessity for faster and more efficient editing tools that, in a first step, perform a temporal segmentation in shots. On the other hand, the number of electronic devices with integrated cameras has grown enormously. These devices require new, fast, and efficient computer vision applications that include moving object detection strategies. In this dissertation, we propose a temporal segmentation strategy and several moving object detection strategies, which are suitable for the last generation of computer vision applications requiring both low computational cost and high quality results. First, a novel real-time high-quality shot detection strategy is proposed. While abrupt transitions are detected through a very fast pixel-based analysis, gradual transitions are obtained from an efficient edge-based analysis. Both analyses are reinforced with a motion analysis that allows to detect and discard false detections. This analysis is carried out exclusively over a reduced amount of candidate transitions, thus maintaining the computational requirements. On the other hand, a moving object detection strategy, which is based on the popular Mixture of Gaussians method, is proposed. This strategy, taking into account the recent history of each image pixel, adapts dynamically the amount of Gaussians that are required to model its variations. As a result, we improve significantly the computational efficiency with respect to other similar methods and, additionally, we reduce the influence of the used parameters in the results. Alternatively, in order to improve the quality of the results in complex scenarios containing dynamic backgrounds, we propose different non-parametric based moving object detection strategies that model both background and foreground. To obtain high quality results regardless of the characteristics of the analyzed sequence we dynamically estimate the most adequate bandwidth matrices for the kernels that are used in the background and foreground modeling. Moreover, the application of a particle filter allows to update the spatial information and provides a priori knowledge about the areas to analyze in the following images, enabling an important reduction in the computational requirements and improving the segmentation results. Additionally, we propose the use of an innovative combination of chromaticity and gradients that allows to reduce the influence of shadows and reflects in the detections.
Resumo:
Many existing engineering works model the statistical characteristics of the entities under study as normal distributions. These models are eventually used for decision making, requiring in practice the definition of the classification region corresponding to the desired confidence level. Surprisingly enough, however, a great amount of computer vision works using multidimensional normal models leave unspecified or fail to establish correct confidence regions due to misconceptions on the features of Gaussian functions or to wrong analogies with the unidimensional case. The resulting regions incur in deviations that can be unacceptable in high-dimensional models. Here we provide a comprehensive derivation of the optimal confidence regions for multivariate normal distributions of arbitrary dimensionality. To this end, firstly we derive the condition for region optimality of general continuous multidimensional distributions, and then we apply it to the widespread case of the normal probability density function. The obtained results are used to analyze the confidence error incurred by previous works related to vision research, showing that deviations caused by wrong regions may turn into unacceptable as dimensionality increases. To support the theoretical analysis, a quantitative example in the context of moving object detection by means of background modeling is given.
Resumo:
In this paper we propose an innovative approach to tackle the problem of traffic sign detection using a computer vision algorithm and taking into account real-time operation constraints, trying to establish intelligent strategies to simplify as much as possible the algorithm complexity and to speed up the process. Firstly, a set of candidates is generated according to a color segmentation stage, followed by a region analysis strategy, where spatial characteristic of previously detected objects are taken into account. Finally, temporal coherence is introduced by means of a tracking scheme, performed using a Kalman filter for each potential candidate. Taking into consideration time constraints, efficiency is achieved two-fold: on the one side, a multi-resolution strategy is adopted for segmentation, where global operation will be applied only to low-resolution images, increasing the resolution to the maximum only when a potential road sign is being tracked. On the other side, we take advantage of the expected spacing between traffic signs. Namely, the tracking of objects of interest allows to generate inhibition areas, which are those ones where no new traffic signs are expected to appear due to the existence of a TS in the neighborhood. The proposed solution has been tested with real sequences in both urban areas and highways, and proved to achieve higher computational efficiency, especially as a result of the multi-resolution approach.
Resumo:
We propose a new method to automatically refine a facial disparity map obtained with standard cameras and under conventional illumination conditions by using a smart combination of traditional computer vision and 3D graphics techniques. Our system inputs two stereo images acquired with standard (calibrated) cameras and uses dense disparity estimation strategies to obtain a coarse initial disparity map, and SIFT to detect and match several feature points in the subjects face. We then use these points as anchors to modify the disparity in the facial area by building a Delaunay triangulation of their convex hull and interpolating their disparity values inside each triangle. We thus obtain a refined disparity map providing a much more accurate representation of the the subjects facial features. This refined facial disparity map may be easily transformed, through the camera calibration parameters, into a depth map to be used, also automatically, to improve the facial mesh of a 3D avatar to match the subjects real human features.
Resumo:
Due to ever increasing transportation of people and goods, automatic traffic surveillance is becoming a key issue for both providing safety to road users and improving traffic control in an efficient way. In this paper, we propose a new system that, exploiting the capabilities that both computer vision and machine learning offer, is able to detect and track different types of real incidents on a highway. Specifically, it is able to accurately detect not only stopped vehicles, but also drivers and passengers leaving the stopped vehicle, and other pedestrians present in the roadway. Additionally, a theoretical approach for detecting vehicles which may leave the road in an unexpected way is also presented. The system works in real-time and it has been optimized for working outdoor, being thus appropriate for its deployment in a real-world environment like a highway. First experimental results on a dataset created with videos provided by two Spanish highway operators demonstrate the effectiveness of the proposed system and its robustness against noise and low-quality videos.
Resumo:
In the recent years, the computer vision community has shown great interest on depth-based applications thanks to the performance and flexibility of the new generation of RGB-D imagery. In this paper, we present an efficient background subtraction algorithm based on the fusion of multiple region-based classifiers that processes depth and color data provided by RGB-D cameras. Foreground objects are detected by combining a region-based foreground prediction (based on depth data) with different background models (based on a Mixture of Gaussian algorithm) providing color and depth descriptions of the scene at pixel and region level. The information given by these modules is fused in a mixture of experts fashion to improve the foreground detection accuracy. The main contributions of the paper are the region-based models of both background and foreground, built from the depth and color data. The obtained results using different database sequences demonstrate that the proposed approach leads to a higher detection accuracy with respect to existing state-of-the-art techniques.
Resumo:
PAMELA (Phased Array Monitoring for Enhanced Life Assessment) SHMTM System is an integrated embedded ultrasonic guided waves based system consisting of several electronic devices and one system manager controller. The data collected by all PAMELA devices in the system must be transmitted to the controller, who will be responsible for carrying out the advanced signal processing to obtain SHM maps. PAMELA devices consist of hardware based on a Virtex 5 FPGA with a PowerPC 440 running an embedded Linux distribution. Therefore, PAMELA devices, in addition to the capability of performing tests and transmitting the collected data to the controller, have the capability of perform local data processing or pre-processing (reduction, normalization, pattern recognition, feature extraction, etc.). Local data processing decreases the data traffic over the network and allows CPU load of the external computer to be reduced. Even it is possible that PAMELA devices are running autonomously performing scheduled tests, and only communicates with the controller in case of detection of structural damages or when programmed. Each PAMELA device integrates a software management application (SMA) that allows to the developer downloading his own algorithm code and adding the new data processing algorithm to the device. The development of the SMA is done in a virtual machine with an Ubuntu Linux distribution including all necessary software tools to perform the entire cycle of development. Eclipse IDE (Integrated Development Environment) is used to develop the SMA project and to write the code of each data processing algorithm. This paper presents the developed software architecture and describes the necessary steps to add new data processing algorithms to SMA in order to increase the processing capabilities of PAMELA devices.An example of basic damage index estimation using delay and sum algorithm is provided.
Resumo:
La familia de algoritmos de Boosting son un tipo de técnicas de clasificación y regresión que han demostrado ser muy eficaces en problemas de Visión Computacional. Tal es el caso de los problemas de detección, de seguimiento o bien de reconocimiento de caras, personas, objetos deformables y acciones. El primer y más popular algoritmo de Boosting, AdaBoost, fue concebido para problemas binarios. Desde entonces, muchas han sido las propuestas que han aparecido con objeto de trasladarlo a otros dominios más generales: multiclase, multilabel, con costes, etc. Nuestro interés se centra en extender AdaBoost al terreno de la clasificación multiclase, considerándolo como un primer paso para posteriores ampliaciones. En la presente tesis proponemos dos algoritmos de Boosting para problemas multiclase basados en nuevas derivaciones del concepto margen. El primero de ellos, PIBoost, está concebido para abordar el problema descomponiéndolo en subproblemas binarios. Por un lado, usamos una codificación vectorial para representar etiquetas y, por otro, utilizamos la función de pérdida exponencial multiclase para evaluar las respuestas. Esta codificación produce un conjunto de valores margen que conllevan un rango de penalizaciones en caso de fallo y recompensas en caso de acierto. La optimización iterativa del modelo genera un proceso de Boosting asimétrico cuyos costes dependen del número de etiquetas separadas por cada clasificador débil. De este modo nuestro algoritmo de Boosting tiene en cuenta el desbalanceo debido a las clases a la hora de construir el clasificador. El resultado es un método bien fundamentado que extiende de manera canónica al AdaBoost original. El segundo algoritmo propuesto, BAdaCost, está concebido para problemas multiclase dotados de una matriz de costes. Motivados por los escasos trabajos dedicados a generalizar AdaBoost al terreno multiclase con costes, hemos propuesto un nuevo concepto de margen que, a su vez, permite derivar una función de pérdida adecuada para evaluar costes. Consideramos nuestro algoritmo como la extensión más canónica de AdaBoost para este tipo de problemas, ya que generaliza a los algoritmos SAMME, Cost-Sensitive AdaBoost y PIBoost. Por otro lado, sugerimos un simple procedimiento para calcular matrices de coste adecuadas para mejorar el rendimiento de Boosting a la hora de abordar problemas estándar y problemas con datos desbalanceados. Una serie de experimentos nos sirven para demostrar la efectividad de ambos métodos frente a otros conocidos algoritmos de Boosting multiclase en sus respectivas áreas. En dichos experimentos se usan bases de datos de referencia en el área de Machine Learning, en primer lugar para minimizar errores y en segundo lugar para minimizar costes. Además, hemos podido aplicar BAdaCost con éxito a un proceso de segmentación, un caso particular de problema con datos desbalanceados. Concluimos justificando el horizonte de futuro que encierra el marco de trabajo que presentamos, tanto por su aplicabilidad como por su flexibilidad teórica. Abstract The family of Boosting algorithms represents a type of classification and regression approach that has shown to be very effective in Computer Vision problems. Such is the case of detection, tracking and recognition of faces, people, deformable objects and actions. The first and most popular algorithm, AdaBoost, was introduced in the context of binary classification. Since then, many works have been proposed to extend it to the more general multi-class, multi-label, costsensitive, etc... domains. Our interest is centered in extending AdaBoost to two problems in the multi-class field, considering it a first step for upcoming generalizations. In this dissertation we propose two Boosting algorithms for multi-class classification based on new generalizations of the concept of margin. The first of them, PIBoost, is conceived to tackle the multi-class problem by solving many binary sub-problems. We use a vectorial codification to represent class labels and a multi-class exponential loss function to evaluate classifier responses. This representation produces a set of margin values that provide a range of penalties for failures and rewards for successes. The stagewise optimization of this model introduces an asymmetric Boosting procedure whose costs depend on the number of classes separated by each weak-learner. In this way the Boosting procedure takes into account class imbalances when building the ensemble. The resulting algorithm is a well grounded method that canonically extends the original AdaBoost. The second algorithm proposed, BAdaCost, is conceived for multi-class problems endowed with a cost matrix. Motivated by the few cost-sensitive extensions of AdaBoost to the multi-class field, we propose a new margin that, in turn, yields a new loss function appropriate for evaluating costs. Since BAdaCost generalizes SAMME, Cost-Sensitive AdaBoost and PIBoost algorithms, we consider our algorithm as a canonical extension of AdaBoost to this kind of problems. We additionally suggest a simple procedure to compute cost matrices that improve the performance of Boosting in standard and unbalanced problems. A set of experiments is carried out to demonstrate the effectiveness of both methods against other relevant Boosting algorithms in their respective areas. In the experiments we resort to benchmark data sets used in the Machine Learning community, firstly for minimizing classification errors and secondly for minimizing costs. In addition, we successfully applied BAdaCost to a segmentation task, a particular problem in presence of imbalanced data. We conclude the thesis justifying the horizon of future improvements encompassed in our framework, due to its applicability and theoretical flexibility.
Resumo:
El principal objetivo de este trabajo es proporcionar una solución en tiempo real basada en visión estéreo o monocular precisa y robusta para que un vehículo aéreo no tripulado (UAV) sea autónomo en varios tipos de aplicaciones UAV, especialmente en entornos abarrotados sin señal GPS. Este trabajo principalmente consiste en tres temas de investigación de UAV basados en técnicas de visión por computador: (I) visual tracking, proporciona soluciones efectivas para localizar visualmente objetos de interés estáticos o en movimiento durante el tiempo que dura el vuelo del UAV mediante una aproximación adaptativa online y una estrategia de múltiple resolución, de este modo superamos los problemas generados por las diferentes situaciones desafiantes, tales como cambios significativos de aspecto, iluminación del entorno variante, fondo del tracking embarullado, oclusión parcial o total de objetos, variaciones rápidas de posición y vibraciones mecánicas a bordo. La solución ha sido utilizada en aterrizajes autónomos, inspección de plataformas mar adentro o tracking de aviones en pleno vuelo para su detección y evasión; (II) odometría visual: proporciona una solución eficiente al UAV para estimar la posición con 6 grados de libertad (6D) usando únicamente la entrada de una cámara estéreo a bordo del UAV. Un método Semi-Global Blocking Matching (SGBM) eficiente basado en una estrategia grueso-a-fino ha sido implementada para una rápida y profunda estimación del plano. Además, la solución toma provecho eficazmente de la información 2D y 3D para estimar la posición 6D, resolviendo de esta manera la limitación de un punto de referencia fijo en la cámara estéreo. Una robusta aproximación volumétrica de mapping basada en el framework Octomap ha sido utilizada para reconstruir entornos cerrados y al aire libre bastante abarrotados en 3D con memoria y errores correlacionados espacialmente o temporalmente; (III) visual control, ofrece soluciones de control prácticas para la navegación de un UAV usando Fuzzy Logic Controller (FLC) con la estimación visual. Y el framework de Cross-Entropy Optimization (CEO) ha sido usado para optimizar el factor de escala y la función de pertenencia en FLC. Todas las soluciones basadas en visión en este trabajo han sido probadas en test reales. Y los conjuntos de datos de imágenes reales grabados en estos test o disponibles para la comunidad pública han sido utilizados para evaluar el rendimiento de estas soluciones basadas en visión con ground truth. Además, las soluciones de visión presentadas han sido comparadas con algoritmos de visión del estado del arte. Los test reales y los resultados de evaluación muestran que las soluciones basadas en visión proporcionadas han obtenido rendimientos en tiempo real precisos y robustos, o han alcanzado un mejor rendimiento que aquellos algoritmos del estado del arte. La estimación basada en visión ha ganado un rol muy importante en controlar un UAV típico para alcanzar autonomía en aplicaciones UAV. ABSTRACT The main objective of this dissertation is providing real-time accurate robust monocular or stereo vision-based solution for Unmanned Aerial Vehicle (UAV) to achieve the autonomy in various types of UAV applications, especially in GPS-denied dynamic cluttered environments. This dissertation mainly consists of three UAV research topics based on computer vision technique: (I) visual tracking, it supplys effective solutions to visually locate interesting static or moving object over time during UAV flight with on-line adaptivity approach and multiple-resolution strategy, thereby overcoming the problems generated by the different challenging situations, such as significant appearance change, variant surrounding illumination, cluttered tracking background, partial or full object occlusion, rapid pose variation and onboard mechanical vibration. The solutions have been utilized in autonomous landing, offshore floating platform inspection and midair aircraft tracking for sense-and-avoid; (II) visual odometry: it provides the efficient solution for UAV to estimate the 6 Degree-of-freedom (6D) pose using only the input of stereo camera onboard UAV. An efficient Semi-Global Blocking Matching (SGBM) method based on a coarse-to-fine strategy has been implemented for fast depth map estimation. In addition, the solution effectively takes advantage of both 2D and 3D information to estimate the 6D pose, thereby solving the limitation of a fixed small baseline in the stereo camera. A robust volumetric occupancy mapping approach based on the Octomap framework has been utilized to reconstruct indoor and outdoor large-scale cluttered environments in 3D with less temporally or spatially correlated measurement errors and memory; (III) visual control, it offers practical control solutions to navigate UAV using Fuzzy Logic Controller (FLC) with the visual estimation. And the Cross-Entropy Optimization (CEO) framework has been used to optimize the scaling factor and the membership function in FLC. All the vision-based solutions in this dissertation have been tested in real tests. And the real image datasets recorded from these tests or available from public community have been utilized to evaluate the performance of these vision-based solutions with ground truth. Additionally, the presented vision solutions have compared with the state-of-art visual algorithms. Real tests and evaluation results show that the provided vision-based solutions have obtained real-time accurate robust performances, or gained better performance than those state-of-art visual algorithms. The vision-based estimation has played a critically important role for controlling a typical UAV to achieve autonomy in the UAV application.
Resumo:
Vision extracts useful information from images. Reconstructing the three-dimensional structure of our environment and recognizing the objects that populate it are among the most important functions of our visual system. Computer vision researchers study the computational principles of vision and aim at designing algorithms that reproduce these functions. Vision is difficult: the same scene may give rise to very different images depending on illumination and viewpoint. Typically, an astronomical number of hypotheses exist that in principle have to be analyzed to infer a correct scene description. Moreover, image information might be extracted at different levels of spatial and logical resolution dependent on the image processing task. Knowledge of the world allows the visual system to limit the amount of ambiguity and to greatly simplify visual computations. We discuss how simple properties of the world are captured by the Gestalt rules of grouping, how the visual system may learn and organize models of objects for recognition, and how one may control the complexity of the description that the visual system computes.
Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex.
Resumo:
The stages of integration leading from local feature analysis to object recognition were explored in human visual cortex by using the technique of functional magnetic resonance imaging. Here we report evidence for object-related activation. Such activation was located at the lateral-posterior aspect of the occipital lobe, just abutting the posterior aspect of the motion-sensitive area MT/V5, in a region termed the lateral occipital complex (LO). LO showed preferential activation to images of objects, compared to a wide range of texture patterns. This activation was not caused by a global difference in the Fourier spatial frequency content of objects versus texture images, since object images produced enhanced LO activation compared to textures matched in power spectra but randomized in phase. The preferential activation to objects also could not be explained by different patterns of eye movements: similar levels of activation were observed when subjects fixated on the objects and when they scanned the objects with their eyes. Additional manipulations such as spatial frequency filtering and a 4-fold change in visual size did not affect LO activation. These results suggest that the enhanced responses to objects were not a manifestation of low-level visual processing. A striking demonstration that activity in LO is uniquely correlated to object detectability was produced by the "Lincoln" illusion, in which blurring of objects digitized into large blocks paradoxically increases their recognizability. Such blurring led to significant enhancement of LO activation. Despite the preferential activation to objects, LO did not seem to be involved in the final, "semantic," stages of the recognition process. Thus, objects varying widely in their recognizability (e.g., famous faces, common objects, and unfamiliar three-dimensional abstract sculptures) activated it to a similar degree. These results are thus evidence for an intermediate link in the chain of processing stages leading to object recognition in human visual cortex.
Resumo:
Comunicación presentada en el X Workshop of Physical Agents, Cáceres, 10-11 septiembre 2009.