851 results for computer vision face recognition detection voice recognition sistemi biometrici iOS


Relevance:

100.00%

Publisher:

Abstract:

To cope with changing performance demands, autonomous transport vehicles are deployed to make in-house material flow applications more flexible. The so-called cellular transport system consists of a multitude of small-scale transport vehicles that should be able to form a swarm. The vehicles therefore need to detect each other, exchange information with each other, and sense their environment. When information acquired by other transport entities is made available to a vehicle, it can make better-informed decisions on navigation and collision avoidance. This paper contributes to the collective use of sensor data in a swarm of cellular transport vehicles.

Relevance:

100.00%

Publisher:

Abstract:

In this paper we study the problem of blind deconvolution. Our analysis is based on the algorithm of Chan and Wong [2], which popularized the use of sparse gradient priors via total variation. We use this algorithm because many methods in the literature are essentially adaptations of this framework. The algorithm is an iterative alternating energy minimization in which, at each step, either the sharp image or the blur function is reconstructed. Recent work by Levin et al. [14] showed that any algorithm that tries to minimize that same energy would fail, because the desired solution has a higher energy than the no-blur solution, in which the sharp image is the blurry input and the blur is a Dirac delta. Experimentally, however, one can observe that Chan and Wong's algorithm converges to the desired solution even when initialized with the no-blur one. We provide both analysis and experiments to resolve this paradox, and we find that both claims are right. The key to understanding how this is possible lies in the details of Chan and Wong's implementation and in how seemingly harmless choices have dramatic effects. Our analysis reveals that the delayed scaling (normalization) in the iterative step of the blur kernel is fundamental to the convergence of the algorithm. It results in a procedure that eludes the no-blur solution, even though that solution is a global minimum of the original energy. We introduce an adaptation of this algorithm and show that, in spite of its extreme simplicity, it is very robust and achieves a performance comparable to the state of the art.
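The delayed normalization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that the kernel update is an unconstrained gradient step and that the constraints (non-negativity, unit sum) are enforced only afterwards. The function name, step size and gradient values are hypothetical and are not taken from the paper.

```python
def update_kernel(kernel, gradient, step):
    """One blur-kernel update with delayed normalization (illustrative only)."""
    # 1. Unconstrained gradient descent step on the kernel entries.
    stepped = [k - step * g for k, g in zip(kernel, gradient)]
    # 2. Only now enforce the constraints: clip negatives, then rescale to sum 1.
    clipped = [max(v, 0.0) for v in stepped]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("kernel vanished after clipping")
    return [v / total for v in clipped]


# Start from the no-blur kernel (a discrete Dirac delta) with made-up gradients.
delta = [0.0, 1.0, 0.0]
new_kernel = update_kernel(delta, gradient=[-0.5, 1.0, 0.25], step=0.2)
```

In this toy run the kernel moves away from the delta while staying non-negative and summing to one. Whether that happens on real data depends on the actual energy gradient, which the sketch does not compute.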

Relevance:

100.00%

Publisher:

Abstract:

In this work we devise two novel algorithms for blind deconvolution based on a family of logarithmic image priors. In contrast to recent approaches, we consider a minimalistic formulation of the blind deconvolution problem with only two energy terms: a least-squares term for the data fidelity and an image prior based on a lower-bounded logarithm of the norm of the image gradients. We show that this energy formulation is sufficient to achieve the state of the art in blind deconvolution with a good margin over previous methods. Much of the performance is due to the chosen prior. On the one hand, this prior is very effective in favoring sparsity of the image gradients. On the other hand, it is nonconvex, so solutions that can deal effectively with local minima of the energy become necessary. We devise two iterative minimization algorithms that solve convex problems at each iteration: one obtained via the primal-dual approach and one via majorization-minimization. While the former is computationally efficient, the latter achieves state-of-the-art performance on a public dataset.
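Majorization-minimization can yield convex subproblems here because the logarithm is concave: its tangent line at the current estimate is an upper bound, and that bound is linear in the gradient magnitude. The sketch below checks this numerically. It assumes a prior of the form log(eps + t), where t is a gradient magnitude and eps provides the lower bound; this exact form and the sample values are assumptions for illustration, not the paper's definition.

```python
import math


def log_prior(t, eps):
    """Assumed prior on a gradient magnitude t >= 0; never below log(eps)."""
    return math.log(eps + t)


def majorizer(t, t0, eps):
    """Tangent line of log_prior at t0; an upper bound because log is concave."""
    return log_prior(t0, eps) + (t - t0) / (eps + t0)


eps = 0.1
t0 = 0.5  # current estimate of the gradient magnitude
samples = [0.0, 0.1, 0.5, 1.0, 4.0]
# Gap between the surrogate and the true prior: non-negative, zero at t0.
gaps = [majorizer(t, t0, eps) - log_prior(t, eps) for t in samples]
```

The slope 1 / (eps + t0) acts as a per-pixel weight. It is large where the current gradient is small, which is how a surrogate of this kind pushes small gradients further towards zero.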

Relevance:

100.00%

Publisher:

Abstract:

Diet-related chronic diseases severely affect personal and global health. However, managing or treating these diseases currently requires long training and high personal involvement to succeed. Computer vision systems could assist with the assessment of diet by detecting and recognizing different foods and their portions in images. We propose novel methods for detecting a dish in an image and segmenting its contents with and without user interaction. All methods were evaluated on a database of over 1600 manually annotated images. Dish detection reached an average accuracy of 99% with a run time of 0.2 s/image, while the automatic and semi-automatic dish segmentation methods reached average accuracies of 88% and 91% respectively, with an average run time of 0.5 s/image, outperforming competing solutions.

Relevance:

100.00%

Publisher:

Abstract:

In this paper we propose an approach to traffic sign detection based on a computer vision algorithm that takes real-time operation constraints into account, using strategies that reduce algorithmic complexity as far as possible and speed up processing. First, a set of candidates is generated by a color segmentation stage, followed by a region analysis strategy that takes into account the spatial characteristics of previously detected objects. Finally, temporal coherence is introduced by means of a tracking scheme that runs a Kalman filter for each potential candidate. Given the time constraints, efficiency is achieved in two ways. On the one hand, a multi-resolution strategy is adopted for segmentation: global operations are applied only to low-resolution images, and the resolution is increased to the maximum only while a potential road sign is being tracked. On the other hand, we take advantage of the expected spacing between traffic signs: tracking the objects of interest makes it possible to generate inhibition areas, that is, regions where no new traffic signs are expected to appear because a traffic sign already exists in the neighborhood. The proposed solution has been tested on real sequences from both urban areas and highways and achieved higher computational efficiency, especially as a result of the multi-resolution approach.
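The per-candidate Kalman filter mentioned in the abstract above can be sketched in its simplest form. This sketch assumes a scalar, constant-position model for one image coordinate of a candidate. A real tracker would more likely track position and velocity in two dimensions, and the noise variances and measurements below are invented for illustration.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new measurement (e.g. the detected x-coordinate of a candidate)
    q, r: process and measurement noise variances (assumed values)
    """
    # Predict: the state is assumed constant, so only the uncertainty grows.
    p_pred = p + q
    # Update: blend prediction and measurement according to the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x + gain * (z - x)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# Track a candidate whose measured x-coordinate hovers around 120 pixels.
x, p = 0.0, 100.0  # vague initial guess with large variance
for z in [120.4, 119.8, 120.1, 120.3, 119.9]:
    x, p = kalman_step(x, p, z, q=0.01, r=1.0)
```

After a few measurements the estimate settles near the measured position and its variance shrinks, which is the temporal coherence a tracker can use to reject one-off false candidates.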

Relevance:

100.00%

Publisher:

Abstract:

A novel GPU-based nonparametric moving object detection strategy is proposed for computer vision tools that require real-time processing. An efficient alternative Bayesian classifier that combines nonparametric background and foreground models increases correct detections while avoiding false detections. Additionally, an efficient region-of-interest analysis significantly reduces the computational cost of the detections.
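A minimal CPU sketch of the idea in the abstract above: each pixel keeps recent background samples, a Gaussian kernel density estimate gives the background likelihood, and a Bayes rule compares it with a foreground likelihood. The bandwidth, the prior and the uniform fallback for an empty foreground model are assumptions for illustration. The paper's actual classifier and its GPU implementation are not reproduced here.

```python
import math


def kde(value, samples, bandwidth):
    """Gaussian kernel density estimate of `value` from past samples."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(
        norm * math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples
    )
    return total / len(samples)


def is_foreground(value, bg_samples, fg_samples, bandwidth=10.0, prior_fg=0.3):
    """Bayes decision between nonparametric background and foreground models."""
    p_bg = kde(value, bg_samples, bandwidth) * (1.0 - prior_fg)
    if fg_samples:
        p_fg = kde(value, fg_samples, bandwidth) * prior_fg
    else:
        # No foreground samples yet: assume a uniform density over 0..255.
        p_fg = (1.0 / 256.0) * prior_fg
    return p_fg > p_bg


background = [100, 102, 98, 101, 99]  # recent grey levels of one pixel
near = is_foreground(100, background, [])
far = is_foreground(200, background, [])
far_with_model = is_foreground(200, background, [198, 205])
```

A grey level close to the stored samples is labelled background, while a very different one is labelled foreground, with or without an explicit foreground model.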

Relevance:

100.00%

Publisher:

Abstract:

Due to the ever-increasing transportation of people and goods, automatic traffic surveillance is becoming a key issue both for providing safety to road users and for improving traffic control efficiently. In this paper, we propose a new system that exploits the capabilities of both computer vision and machine learning to detect and track different types of real incidents on a highway. Specifically, it accurately detects not only stopped vehicles, but also drivers and passengers leaving a stopped vehicle and other pedestrians present on the roadway. Additionally, a theoretical approach for detecting vehicles that may leave the road unexpectedly is presented. The system works in real time and has been optimized for outdoor operation, which makes it suitable for deployment in a real-world environment such as a highway. First experimental results on a dataset created from videos provided by two Spanish highway operators demonstrate the effectiveness of the proposed system and its robustness against noise and low-quality video.

Relevance:

100.00%

Publisher:

Abstract:

In the last decade, multi-sensor data fusion has become a widely demanded discipline for achieving advanced solutions applicable to many real-world situations, both civil and military. In defence, accurate detection of all target objects is fundamental to maintaining situational awareness, locating threats on the battlefield, and identifying and strategically protecting own forces. Civil applications such as traffic monitoring have similar requirements in terms of object detection and reliable identification of incidents in order to ensure the safety of road users. With an appropriate data fusion technique, these systems can automatically exploit all relevant information from multiple sources, for instance to meet mission needs or to support daily supervision operations. This paper focuses on the application of data fusion to active vehicle monitoring in a particular area of high-density traffic, and on how it is redirecting research in the computer vision, signal processing and machine learning fields towards improving the effectiveness of detection and tracking in ground surveillance scenarios in general. Specifically, our system fuses, at the feature level, data extracted from a video camera and a laser scanner. In addition, a stochastic tracking approach is presented that introduces particle filters into the model to deal with the uncertainty caused by occlusions and to improve the previous detection output. This computer vision tracker has been shown to help detect objects even when visual information is poor. Finally, in the same way that humans analyse both temporal and spatial relations among items in a scene to give them a meaning, once the target objects have been correctly detected and tracked, machines should be able to provide a trustworthy description of what is happening in the scene under surveillance. Accomplishing such an ambitious task requires a machine learning-based hierarchical architecture able to extract and analyse behaviours at different abstraction levels. A real experimental testbed has been implemented to evaluate the proposed modular system: a closed circuit where real traffic situations can be simulated. First results have shown the strength of the proposed system.

Relevance:

100.00%

Publisher:

Abstract:

One of the most ambitious objectives for the Computer Vision and Pattern Recognition research community is for machines to achieve capacities similar to those of the human visual and cognitive system, and thus, for example in video surveillance settings, to automatically provide a trustworthy description of what is happening in the scene. A number of well-established architectural frameworks for scenario understanding, aimed at applications working in a variety of environments, can be found in the literature. In this thesis, a highly descriptive methodology for the development of scene understanding applications is presented. It consists of a set of formal guidelines that let machines extract and analyse, at different levels of abstraction and by means of independent processing modules that interact with each other, the information needed to understand a broad set of real-world surveillance scenarios. The work starts from a requirements analysis and identifies the current challenges for this kind of system, which in themselves constitute the objectives of the thesis. Taking into account the challenges that arise at both low and high levels, we contribute a highly descriptive knowledge-based data model for the analysis of situations in which people and vehicles are the main actors, leaving the door open to adaptation to other smart domains. Recommendations for achieving high-level behaviour understanding are also provided. Furthermore, feedback mechanisms at different levels are proposed so that a system can better understand its environment and the surrounding logical context and adapt to changes in it, thereby reducing uncertainty and noise and increasing robustness and precision in the face of low-level or high-level errors. As a result, a hierarchical cognitive reference architecture is proposed that integrates the perception, interpretation, attention and learning capabilities needed to overcome the main challenges identified in this area of research, making it possible to develop more robust, flexible and smart surveillance systems that cope with the requirements of a variety of environments. Once the crucial issues that should be treated explicitly in the design of such systems have been formulated and discussed, experimental results on several data samples (mainly video sequences) show the effectiveness of the proposed framework compared with others proposed in the past. Two case studies were implemented to test the capabilities of the framework. The first shows how the framework can be used to create an intelligent monitoring system for outdoor parking areas that detects vehicles and analyses free parking spaces. The second demonstrates the flexibility of the framework in coping with the requirements of a completely different environment, a smart home where the automatic analysis of activities of daily living is the focus of the study. Finally, general conclusions and lines of future work to further enhance the capabilities of the proposed framework are presented.

Relevance:

100.00%

Publisher:

Abstract:

Nowadays, countless applications use digital images. Face recognition, for example, is used to detect and tag people in photographs or videos and for security control, and smart cities rely on images for speed control on roads and highways and for cameras at traffic lights that detect drivers ignoring a red light. Digital images are also used in medicine, for example X-rays and scans. All these applications depend on the quality of the image obtained. A good camera is expensive, and the image obtained also depends on external factors such as light. For these applications to work properly, image enhancement is as important as, for example, a good face detection algorithm. Image enhancement can also be used in non-professional or consumer photography, for pictures taken in poor lighting conditions or simply to improve the contrast of an image. Some smartphone applications allow users to apply filters or change the brightness, colour or contrast of their pictures. This project compares four different image enhancement techniques. After one of these techniques is applied, an image makes better use of the whole available dynamic range. Some of the algorithms are designed for grey-scale images and others for colour images. Matlab is used to develop the algorithms and present the final results. The algorithms are the Successive Means Quantization Transform (SMQT), histogram equalization using the built-in Matlab function, histogram equalization using a function implemented in this project, and the V transform. In conclusion, we find that histogram equalization is the simplest algorithm of all, that it has a wide variability of grey levels, and that it is not suitable for colour images. The V transform is a good option for colour images; it is linear and requires low computational power. The SMQT algorithm is non-linear and insensitive to gain and bias, and it can extract the structure of the data.
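Of the techniques compared above, histogram equalization is the easiest to show in a few lines. The sketch below is the textbook grey-scale version, written in Python for illustration. It is not the project's Matlab code, and the function name and sample pixels are hypothetical.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer grey levels in [0, levels)."""
    # Histogram of grey levels.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution (as counts).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # constant image: nothing to stretch
    # Map each level so that the output spans the full range 0..levels-1.
    scale = (levels - 1) / (n - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[v] for v in pixels]


low_contrast = [100, 100, 101, 102, 102, 102, 103, 103]
stretched = equalize(low_contrast)
```

The input occupies only four adjacent grey levels, while the output runs from 0 to 255, which is the better use of the available dynamic range that the abstract refers to.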

Relevance:

100.00%

Publisher:

Abstract:

Speech is the main communication tool available to human beings; it not only allows them to express their thoughts and feelings but also distinguishes them as individuals. Analysis of the speech signal is essential for many applications, such as speech synthesis and recognition, coding, detection of pathologies, and speaker identification and recognition. Commercial and freely distributed tools for this task are available on the market. The aim of this Final Degree Project is to gather several speech signal analysis algorithms into a single tool operated through a graphical environment. The algorithms are used by the research group Aplicaciones MultiMedia y Acústica at the Universidad Politécnica de Madrid in its research work and in training workshops offered to undergraduate students of the Escuela Técnica Superior de Ingeniería y Sistemas de Telecomunicación. Applying the algorithms has so far been difficult because they were developed over several years, by different people and in different programming environments. The existing programs have been adapted to produce a single MATLAB tool that provides voice detection, voiced/unvoiced detection, extraction and manual review of the fundamental frequency of voiced sounds, and extraction and manual review of the formants of voiced sounds. In all cases the user can adjust the analysis parameters, and the functionality of the existing algorithms has been maintained and, in some cases, extended. The analysis results can be handled directly in the application or saved to a file. Finally, a user's manual has been written and a standalone application has been generated that can be installed and run even if the user does not have MATLAB or the appropriate version of it.
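The abstract above does not say how the tool extracts the fundamental frequency. One common approach is autocorrelation peak picking, sketched below under that assumption. The function name, the search range of 50 to 400 Hz and the synthetic test tone are all hypothetical.

```python
import math


def estimate_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Picks the lag with the largest autocorrelation inside the lag range
    that corresponds to [f_min, f_max]. The frame must be longer than
    sample_rate / f_min samples.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic "voiced" frame: a pure 100 Hz tone, 0.1 s at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
```

Real speech is noisier than a pure tone, which is presumably why the tool pairs automatic extraction with manual review.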

Relevance:

100.00%

Publisher:

Abstract:

Deformable template models are first applied to track the inner wall of coronary arteries in intravascular ultrasound sequences, mainly to assist angioplasty surgery. A circular template is used to initialize an elliptical deformable model that tracks wall deformation while a balloon placed at the tip of the catheter is inflated. We define a new energy function for driving the behavior of the template and test its robustness on both real and synthetic images. Finally, we introduce a framework for learning and recognizing spatio-temporal geometric constraints based on Principal Component Analysis (eigenconstraints).

Relevance:

100.00%

Publisher:

Abstract:

Feature vectors can be anything from simple surface normals to more complex feature descriptors. Feature extraction is important for solving various computer vision problems, such as registration, object recognition and scene understanding. Most of these techniques cannot be computed online because of their complexity and the context in which they are applied, so computing these features in real time for many points in the scene is not feasible. In this work, a hardware-based implementation of 3D feature extraction and 3D object recognition is proposed to accelerate these methods and, with them, the entire pipeline of RGB-D-based computer vision systems in which such features are typically used. Using a GPU as a general-purpose processor can achieve considerable speed-ups compared with a CPU implementation. Here, the GPU is used to accelerate the computation of a 3D descriptor based on 3D semi-local surface patches of partial views, which allows descriptors to be computed at several points of a scene in real time. The benefits of the accelerated descriptor have been demonstrated in object recognition tasks. Source code will be made publicly available as a contribution to the open-source Point Cloud Library.
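The simplest feature the abstract above mentions is the surface normal. The sketch below computes it for a single point from two neighbouring points using a cross product. This is a CPU illustration under that simplifying assumption. Practical pipelines usually fit a plane to a larger neighbourhood, and the GPU descriptor described in the abstract is not reproduced here.

```python
import math


def surface_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points."""
    # Two edge vectors that share the vertex p0.
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    # Their cross product is perpendicular to both edges.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear; the normal is undefined")
    return [c / length for c in n]


# Three points on the horizontal plane z = 1.
normal = surface_normal((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

Each point's normal depends only on its own neighbourhood, so the work is independent per point, which is the kind of computation that maps well onto a GPU.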