889 results for computer vision, facial expression recognition, swig, red5, actionscript, ruby on rails, html5
Abstract:
Diet management is a key factor in the prevention and treatment of diet-related chronic diseases. Computer vision systems aim to provide automated food intake assessment from meal images. We propose a method for recognizing already segmented food items in meal images. The method uses a 6-layer deep convolutional neural network to classify food image patches. For each food item, overlapping patches are extracted and classified, and the class with the majority of votes is assigned to the item. Experiments on a manually annotated dataset of 573 food items justified the choice of the components involved and demonstrated the effectiveness of the proposed system, which achieved an overall accuracy of 84.9%.
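The per-item voting step can be sketched as follows. This is a minimal illustration under a few assumptions. The segmented item is given as a boolean mask. A patch is used only if it lies entirely inside the mask. `classify_patch` stands in for the paper's 6-layer CNN. The patch size, stride, and function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterator, List, Sequence

Patch = List[List[int]]


def extract_patches(image: Sequence[Sequence[int]], mask: Sequence[Sequence[bool]],
                    size: int, stride: int) -> Iterator[Patch]:
    """Yield overlapping size x size patches lying entirely inside the item's mask."""
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            rows = range(top, top + size)
            cols = range(left, left + size)
            if all(mask[r][c] for r in rows for c in cols):
                yield [list(image[r][left:left + size]) for r in rows]


def classify_item(image: Sequence[Sequence[int]], mask: Sequence[Sequence[bool]],
                  classify_patch: Callable[[Patch], str],
                  size: int = 2, stride: int = 1) -> str:
    """Classify every patch of one food item and return the label with the most votes."""
    votes = Counter(classify_patch(p) for p in extract_patches(image, mask, size, stride))
    if not votes:
        raise ValueError("no patch fits inside the segmented item")
    label, _count = votes.most_common(1)[0]
    return label
```

`Counter.most_common` breaks ties in favour of the label encountered first. That is one arbitrary but deterministic choice; the abstract does not say how ties are handled.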
Abstract:
Image segmentation is an important field in computer vision and one of its most active research areas, with applications in image understanding, object detection, face recognition, video surveillance, and medical image processing. Image segmentation is a challenging problem in general. It is especially hard in the biological and medical imaging fields, where imaging techniques usually produce cluttered and noisy images and near-perfect accuracy is often required. In this thesis we first review and compare some standard techniques widely used for medical image segmentation. These techniques use pixel-wise classifiers and introduce weak pairwise regularization, which is insufficient in many cases. We study their difficulty in capturing high-level structural information about the objects to be segmented. This deficiency leads to many erroneous detections, ragged boundaries, incorrect topological configurations, and wrong shapes. To deal with these problems, we propose a new regularization method that learns shape and topological information from training data in a nonparametric way using high-order potentials. High-order potentials are becoming increasingly popular in computer vision. However, the exact representation of a general high-order potential defined over many variables is computationally infeasible. We use a compact representation of the potentials based on a finite set of patterns learned from training data that, in turn, depends on the observations. Thanks to this representation, high-order potentials can be converted into pairwise potentials with some added auxiliary variables and minimized with tree-reweighted message passing (TRW) and belief propagation (BP) techniques.
Both synthetic and real experiments confirm that our model fixes the errors of weaker approaches. Even with high-level regularization, perfect accuracy is still unattainable, and human editing of the segmentation results is necessary. Manual editing is tedious and cumbersome, and tools that assist the user are greatly appreciated. These tools need to be precise, but also fast enough to be used in real time. Active contours are a good solution. They are well suited to precise boundary detection and, instead of finding a global solution, they fine-tune previously existing results. However, they require an implicit representation to handle topological changes of the contour. This leads to partial differential equations (PDEs) that are computationally costly to solve and may present numerical stability issues. We present a morphological approach to contour evolution based on a new morphological curvature operator valid for surfaces of any dimension. We approximate the numerical solution of the contour evolution PDE by successively applying a set of morphological operators defined on a binary level set. These operators are very fast, do not suffer from numerical stability issues, and do not degrade the level-set function, so there is no need to reinitialize it. Moreover, they are much easier to implement than their PDE counterparts, since they do not require sophisticated numerical algorithms. From a theoretical point of view, we delve into the connections between differential and morphological operators, and introduce novel results in this area. We validate the approach by providing morphological implementations of geodesic active contours, active contours without edges, and turbopixels. In the experiments conducted, the morphological implementations converge to solutions equivalent to those achieved by traditional numerical solutions, but with significant gains in simplicity, speed, and stability.
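The following is a minimal 2D sketch of the idea of a morphological curvature operator, assuming a binary level set stored as a grid of 0/1 values. It composes a sup-inf operator (SI) and an inf-sup operator (IS) over four 3-pixel line segments, the kind of construction used in the morphological-snakes literature. Several choices are our own simplifications, not the thesis's implementation: the segment set, the zero padding at the border, and the function names.

```python
from typing import List

Grid = List[List[int]]

# Four 3-pixel line segments through the centre pixel (row, column offsets):
# horizontal, vertical, main diagonal, anti-diagonal.
SEGMENTS = [
    [(0, -1), (0, 0), (0, 1)],
    [(-1, 0), (0, 0), (1, 0)],
    [(-1, -1), (0, 0), (1, 1)],
    [(-1, 1), (0, 0), (1, -1)],
]


def _value(u: Grid, i: int, j: int) -> int:
    """Pixel value, treating everything outside the grid as background (0)."""
    if 0 <= i < len(u) and 0 <= j < len(u[0]):
        return u[i][j]
    return 0


def sup_inf(u: Grid) -> Grid:
    """SI: max over directions of the min along each segment.

    A pixel survives only if some full segment through it is foreground.
    """
    return [[max(min(_value(u, i + di, j + dj) for di, dj in seg) for seg in SEGMENTS)
             for j in range(len(u[0]))] for i in range(len(u))]


def inf_sup(u: Grid) -> Grid:
    """IS: min over directions of the max along each segment (the dual of SI)."""
    return [[min(max(_value(u, i + di, j + dj) for di, dj in seg) for seg in SEGMENTS)
             for j in range(len(u[0]))] for i in range(len(u))]


def curvature_step(u: Grid) -> Grid:
    """One smoothing step on a binary level set: IS first, then SI."""
    return sup_inf(inf_sup(u))
```

Applied to an isolated foreground pixel, one step removes it. Applied to a 3x3 square, it removes the four corners and keeps a plus shape, so high-curvature regions shrink first, as under curvature flow. Alternating SI∘IS with IS∘SI across iterations is one option for avoiding a systematic bias toward shrinking. Treat it as a design choice, not a detail stated in this abstract.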
Abstract:
We propose a new method to automatically refine a facial disparity map obtained with standard cameras under conventional illumination conditions, using a combination of traditional computer vision and 3D graphics techniques. Our system takes as input two stereo images acquired with standard (calibrated) cameras. It uses dense disparity estimation to obtain a coarse initial disparity map, and SIFT to detect and match several feature points on the subject's face. We then use these points as anchors to modify the disparity in the facial area by building a Delaunay triangulation of their convex hull and interpolating their disparity values inside each triangle. The resulting refined disparity map represents the subject's facial features much more accurately. Using the camera calibration parameters, this refined facial disparity map can easily be transformed into a depth map. The depth map can then be used, also automatically, to improve the facial mesh of a 3D avatar so that it matches the subject's real features.
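The interpolation step can be sketched as piecewise-linear interpolation with barycentric coordinates. The sketch rests on these assumptions:
- The triangles are already available as index triples into the anchor list (for example, the `simplices` attribute of `scipy.spatial.Delaunay`).
- Anchor coordinates are (column, row) pixel positions.
- Pixels outside every triangle keep their coarse disparity.

`refine_disparity` and `barycentric` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) = (column, row)


def barycentric(p: Point, a: Point, b: Point, c: Point) -> Optional[Tuple[float, float, float]]:
    """Barycentric weights of p in triangle abc, or None if p is outside or abc is degenerate."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        return None
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    w_c = 1.0 - w_a - w_b
    if min(w_a, w_b, w_c) < -1e-9:  # small tolerance so edge pixels count as inside
        return None
    return w_a, w_b, w_c


def refine_disparity(disparity: List[List[float]], anchors: Sequence[Point],
                     anchor_disp: Sequence[float],
                     triangles: Sequence[Tuple[int, int, int]]) -> List[List[float]]:
    """Replace disparity inside each triangle by linear interpolation of its vertices' values."""
    out = [row[:] for row in disparity]  # leave the coarse map untouched
    height, width = len(out), len(out[0])
    for ia, ib, ic in triangles:
        a, b, c = anchors[ia], anchors[ib], anchors[ic]
        x0, x1 = math.floor(min(a[0], b[0], c[0])), math.ceil(max(a[0], b[0], c[0]))
        y0, y1 = math.floor(min(a[1], b[1], c[1])), math.ceil(max(a[1], b[1], c[1]))
        # Scan only the triangle's bounding box, clipped to the image.
        for y in range(max(y0, 0), min(y1, height - 1) + 1):
            for x in range(max(x0, 0), min(x1, width - 1) + 1):
                w = barycentric((x, y), a, b, c)
                if w is not None:
                    out[y][x] = (w[0] * anchor_disp[ia] + w[1] * anchor_disp[ib]
                                 + w[2] * anchor_disp[ic])
    return out
```

Pixels on a shared edge are written by both neighbouring triangles. They receive the same value either way, because linear interpolation along an edge depends only on that edge's two vertices.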
Abstract:
The aim of this Master's Thesis is the analysis, design, and development of a robust and reliable Human-Computer Interaction interface based on visual hand-gesture recognition. The implemented functions simulate a classical hardware interaction device, the mouse, by recognizing a specific vocabulary of hand gestures in color video sequences. For this purpose, a prototype hand-gesture recognition system has been designed and implemented, consisting of three stages: detection, tracking, and recognition. The system is based on machine learning methods and pattern recognition techniques, integrated with other image processing approaches to achieve high recognition accuracy at low computational cost. Regarding pattern recognition, several algorithms and strategies applicable to color images and video sequences have been designed and implemented. These algorithms extract spatial and spatio-temporal features from static and dynamic hand gestures in order to identify them robustly and reliably. Finally, a visual database containing the gesture vocabulary needed to interact with the computer has been created.
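The last link in the mouse simulation turns each frame's recognized gesture and tracked hand position into a mouse event, and it might look like the sketch below. The gesture vocabulary, the linear camera-to-screen mapping, and all names are hypothetical. The abstract does not list the thesis's actual vocabulary or mapping.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical gesture vocabulary (not taken from the thesis).
GESTURE_TO_ACTION = {
    "open_palm": "move",
    "fist": "left_click",
    "two_fingers": "right_click",
}


@dataclass(frozen=True)
class MouseEvent:
    action: str
    position: Tuple[int, int]


def to_screen(hand_xy: Tuple[float, float], frame_size: Tuple[int, int],
              screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Linearly map the tracked hand centre from camera pixels to screen pixels."""
    (fw, fh), (sw, sh) = frame_size, screen_size
    x = min(max(hand_xy[0], 0.0), fw - 1)  # clamp to the camera frame
    y = min(max(hand_xy[1], 0.0), fh - 1)
    return round(x * (sw - 1) / (fw - 1)), round(y * (sh - 1) / (fh - 1))


def gesture_to_event(gesture: Optional[str], hand_xy: Tuple[float, float],
                     frame_size: Tuple[int, int],
                     screen_size: Tuple[int, int]) -> Optional[MouseEvent]:
    """Combine recognition (gesture) and tracking (hand_xy) output into one mouse event."""
    if gesture is None:  # detection stage found no hand in this frame
        return None
    action = GESTURE_TO_ACTION.get(gesture)
    if action is None:  # gesture outside the vocabulary
        return None
    return MouseEvent(action, to_screen(hand_xy, frame_size, screen_size))
```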
Abstract:
Vision extracts useful information from images. Reconstructing the three-dimensional structure of our environment and recognizing the objects that populate it are among the most important functions of our visual system. Computer vision researchers study the computational principles of vision and aim to design algorithms that reproduce these functions. Vision is difficult: the same scene may give rise to very different images depending on illumination and viewpoint. Typically, an astronomical number of hypotheses would, in principle, have to be analyzed to infer a correct scene description. Moreover, image information may be extracted at different levels of spatial and logical resolution, depending on the image processing task. Knowledge of the world allows the visual system to limit ambiguity and greatly simplify visual computations. We discuss how simple properties of the world are captured by the Gestalt rules of grouping, how the visual system may learn and organize models of objects for recognition, and how one may control the complexity of the description that the visual system computes.
Abstract:
This paper addresses the recognition of temporal expressions and the resolution of their temporal reference. We present the units we used to tackle these tasks in a restricted domain. We work with Spanish newspaper articles, so all the references we use are in Spanish. To identify and recognize temporal expressions we rely on a temporal-expression grammar. To resolve them we rely on a dictionary containing the information needed to perform date arithmetic on the recognized expressions. In the evaluation of our proposal we obtained successful results for the examples studied.
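A toy version of the grammar-plus-dictionary approach can be sketched with a regular expression for a few Spanish expressions and a dictionary of day offsets, resolved against the article's publication date. Three things here are assumptions for illustration: the expressions covered, the offset dictionary, and the use of the publication date as the reference. The paper's grammar and dictionary are not described in the abstract and are presumably richer.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical dictionary: fixed expressions -> day offset from the reference date.
FIXED_OFFSETS = {"anteayer": -2, "ayer": -1, "hoy": 0, "mañana": 1, "pasado mañana": 2}
NUMBER_WORDS = {"un": 1, "dos": 2, "tres": 3}

# Tiny "grammar": fixed words, "hace N día(s)" (N days ago), "dentro de N día(s)" (in N days).
PATTERN = re.compile(
    r"\b(?P<fixed>pasado mañana|anteayer|ayer|hoy|mañana)\b"
    r"|\bhace (?P<ago>\d+|un|dos|tres) días?\b"
    r"|\bdentro de (?P<ahead>\d+|un|dos|tres) días?\b",
    re.IGNORECASE,
)


def _count(token: str) -> int:
    return int(token) if token.isdigit() else NUMBER_WORDS[token.lower()]


def resolve(text: str, reference: date) -> List[Tuple[str, date]]:
    """Find temporal expressions in text and resolve each to an absolute date."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group("fixed"):
            offset = FIXED_OFFSETS[m.group("fixed").lower()]
        elif m.group("ago"):
            offset = -_count(m.group("ago"))
        else:
            offset = _count(m.group("ahead"))
        results.append((m.group(0), reference + timedelta(days=offset)))
    return results
```

Note that *mañana* is ambiguous in Spanish ("tomorrow" vs. "morning"). A real grammar would need context to disambiguate it.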
Abstract:
The need to digitise music scores has led to the development of Optical Music Recognition (OMR) tools. Unfortunately, the performance of these systems is still far from acceptable. Users are therefore forced to take part in the process in order to correct the mistakes made during recognition. However, these corrections are applied to the system's output, so the interventions are not exploited to improve recognition performance. This work sets out a scenario in which human and machine interact to complete the OMR task accurately with the least possible effort from the user.
Abstract:
Because faces and bodies share some abstract perceptual features, we hypothesised that similar recognition processes might be used for both. We investigated whether caricature effects similar to those found in facial identity and expression recognition could be found in the recognition of individual bodies and socially meaningful body positions. Participants were trained to name four body positions (anger, fear, disgust, sadness) and four individuals (in a neutral position). We then tested their recognition of extremely caricatured, moderately caricatured, anticaricatured, and undistorted images of each stimulus. Consistent with caricature effects found in face recognition, moderately caricatured representations of individuals' bodies were recognised more accurately than undistorted and extremely caricatured representations. No significant difference was found between participants' recognition of extremely caricatured, moderately caricatured, or undistorted body-position line drawings. All anticaricatured representations were named significantly less accurately than the veridical stimuli. Similar mental representations may be used for both bodies and faces.
Abstract:
A major impediment to developing real-time computer vision systems has been the computational power and level of skill required to process video streams in real time. As a result, many researchers have either analysed video streams off-line or used expensive dedicated hardware acceleration. Recent software and hardware developments have greatly eased the burden of developing real-time image analysis. They have enabled portable systems built on cheap PC hardware and on software that exploits the Multimedia Extension (MMX) instruction set of the Intel Pentium processor. This paper describes the implementation of a computationally efficient computer vision system for recognizing hand gestures, which uses efficient coding and MMX acceleration to achieve real-time performance on low-cost hardware.
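The abstract does not describe which routines were MMX-accelerated. As an illustration only, the sketch below emulates one common MMX idiom in plain Python. It computes the per-byte absolute difference as two unsigned saturating subtractions (the PSUBUSB instruction) combined with OR, applied to eight 8-bit pixels packed into one 64-bit word. Frame differencing as a motion cue is our example, not a detail from the paper. Real MMX code would use assembly or compiler intrinsics rather than Python loops.

```python
from typing import List, Sequence


def pack8(pixels: Sequence[int]) -> int:
    """Pack eight 8-bit pixels into one 64-bit word, pixel 0 in the lowest byte."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def unpack8(word: int) -> List[int]:
    """Inverse of pack8."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def psubusb(a: int, b: int) -> int:
    """Emulate MMX PSUBUSB: per-byte unsigned subtraction a - b, saturating at 0."""
    result = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        result |= max(x - y, 0) << (8 * i)
    return result


def abs_diff8(a: int, b: int) -> int:
    """Per-byte |a - b|: one of the two saturating differences is always 0, so OR merges them."""
    return psubusb(a, b) | psubusb(b, a)


def frame_difference(prev: Sequence[int], curr: Sequence[int]) -> List[int]:
    """Absolute difference of two grayscale rows, processed eight pixels per packed word."""
    if len(prev) != len(curr) or len(prev) % 8:
        raise ValueError("rows must have equal length, a multiple of 8")
    out: List[int] = []
    for k in range(0, len(prev), 8):
        out.extend(unpack8(abs_diff8(pack8(prev[k:k + 8]), pack8(curr[k:k + 8]))))
    return out
```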
Abstract:
Probabilistic robotics, most often applied to the problem of simultaneous localisation and mapping (SLAM), requires observations of the environment to be accompanied by measures of uncertainty. This paper describes how uncertainty can be characterised for a vision system that locates coloured landmarks in a typical laboratory environment. The paper describes a model of the uncertainty in segmentation, in the internal camera model, and in the mounting of the camera on the robot. It explains the implementation of the system on a laboratory robot and provides experimental results that show the coherence of the uncertainty model.
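One standard way to trace how pixel-level uncertainty reaches a landmark observation is first-order (linearised) propagation through the camera model. The sketch below makes several assumptions, none of which come from the paper's model:
- an ideal pinhole camera with focal length `focal_px` and principal-point column `cx`;
- a camera mounted on the robot with yaw offset `mount_yaw`;
- independent errors on the segmented landmark column `u` and on the mount angle;
- a bearing-only measurement.

The names are also our own.

```python
import math
from typing import Tuple


def bearing_with_variance(u: float, cx: float, focal_px: float, mount_yaw: float,
                          var_u: float, var_yaw: float) -> Tuple[float, float]:
    """Landmark bearing in the robot frame (rad) and its first-order variance."""
    du = u - cx
    bearing = math.atan2(du, focal_px) + mount_yaw  # atan((u - cx) / f) plus mount offset
    jacobian = focal_px / (focal_px ** 2 + du ** 2)  # d(bearing)/du
    # Independent sources: var = J^2 * var_u + var_yaw.
    variance = jacobian ** 2 * var_u + var_yaw
    return bearing, variance
```

The Jacobian shrinks as the landmark moves toward the image edge. A one-pixel segmentation error therefore costs less bearing accuracy off-axis than on the optical axis, while the mount-angle variance adds a constant floor.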
Abstract:
Verbal working memory and emotional self-regulation are impaired in Bipolar Disorder (BD). Our aim was to investigate the effect of Lamotrigine (LTG), which is effective in the clinical management of BD, on the neural circuits subserving working memory and emotional processing. Functional Magnetic Resonance Imaging data from 12 stable BD patients were used to detect LTG-induced changes. These changes were measured as differences in brain activity between the drug-free and post-LTG-monotherapy conditions during a verbal working memory task (N-back sequential letter task) and an angry facial affect recognition task. For both tasks, LTG monotherapy compared with baseline was associated with increased activation, mostly within the prefrontal cortex and cingulate gyrus, in regions normally engaged in verbal working memory and emotional processing. Therefore, LTG monotherapy in BD patients may enhance cortical function within neural circuits involved in memory and emotional self-regulation. © 2007 Elsevier B.V. and ECNP.
Abstract:
We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.
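The abstract does not say how known objects are registered. As an illustration of a registration step only, the sketch below computes the least-squares rigid transform between two point sets with known correspondences, which is a Procrustes/Kabsch fit. Three simplifications apply: it works in 2D, it assumes the correspondences are given, and the name `rigid_fit_2d` is ours. The described system works on 3D point clouds with its own inference algorithms.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rigid_fit_2d(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two corresponding point pairs")
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy  # centre both point sets
        s_cos += x * u + y * v
        s_sin += x * v - y * u
    theta = math.atan2(s_sin, s_cos)  # maximises sum_i q_i . R(theta) p_i
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)  # t = centroid(dst) - R * centroid(src)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)
```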
Abstract:
We address the problem of 3D-assisted 2D face recognition in scenarios where the input image is subject to degradations or exhibits intra-personal variations not captured by the 3D model. The proposed solution involves a novel approach to learning a subspace spanned by the perturbations caused by the missing modes of variation and by image degradations, using 3D face data reconstructed from 2D images rather than 3D capture. This is accomplished by modelling the difference in the texture maps of the 3D-aligned input and reference images. A training set of these texture maps then defines a perturbation space that can be represented using PCA bases. Assuming that the image perturbation subspace is orthogonal to the 3D face model space, these additive components can be recovered from an unseen input image, resulting in an improved fit of the 3D face model. The linearity of the model leads to efficient fitting. Experiments show that our method achieves very competitive face recognition performance on the Multi-PIE and AR databases. We also present baseline face recognition results on a new data set exhibiting combined pose and illumination variations as well as occlusion.
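Under the orthogonality assumption stated above, recovering the additive perturbation reduces to an orthogonal projection. With an orthonormal perturbation basis B, the perturbation contained in a texture vector x is B B^T x, and subtracting it leaves the component in the face-model space. The sketch below assumes the basis is already orthonormal (for example, the leading PCA eigenvectors of the training texture differences) and that perturbations are zero-mean. It is not the authors' full fitting procedure, which also fits the 3D model.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def remove_perturbation(texture: Sequence[float],
                        basis: Sequence[Sequence[float]]) -> Vector:
    """Subtract the projection of texture onto span(basis), i.e. x - B B^T x.

    Assumes the basis vectors are orthonormal and orthogonal to the face-model space,
    so the projection contains only the perturbation component.
    """
    cleaned = list(texture)
    for b in basis:
        coeff = dot(b, texture)  # perturbation coefficient along b
        for i, bi in enumerate(b):
            cleaned[i] -= coeff * bi
    return cleaned
```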