10 resultados para VISUAL DETECTION

em Universidad Politécnica de Madrid


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Automatic visual object counting and video surveillance have important applications for home and business environments, such as security and management of access points. However, in order to obtain a satisfactory performance these technologies need professional and expensive hardware, complex installations and setups, and the supervision of qualified workers. In this paper, an efficient visual detection and tracking framework is proposed for the tasks of object counting and surveillance, which meets the requirements of the consumer electronics: off-the-shelf equipment, easy installation and configuration, and unsupervised working conditions. This is accomplished by a novel Bayesian tracking model that can manage multimodal distributions without explicitly computing the association between tracked objects and detections. In addition, it is robust to erroneous, distorted and missing detections. The proposed algorithm is compared with a recent work, also focused on consumer electronics, proving its superior performance.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

As it is known, there are five types of neurons in the mammalian retinal layer allowing the detection of several important characteristics of the visual image impinging onto the visual system, namely, photoreceptors, horizontal cells, amacrine, bipolar and ganglion cells. And it is a well known fact too, that the amacrine neuron architecture allows a first detection for objects motion, being the most important retinal cell to this function. We have already studied and simulated the Dowling retina model and we have verified that many complex processes in visual detection is performed with the basis of the amacrine cell synaptic connections. This work will show how this structure may be employed for motion detection

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the spinal cord of the anesthetized cat, spontaneous cord dorsum potentials (CDPs) appear synchronously along the lumbo-sacral segments. These CDPs have different shapes and magnitudes. Previous work has indicated that some CDPs appear to be specially associated with the activation of spinal pathways that lead to primary afferent depolarization and presynaptic inhibition. Visual detection and classification of these CDPs provides relevant information on the functional organization of the neural networks involved in the control of sensory information and allows the characterization of the changes produced by acute nerve and spinal lesions. We now present a novel feature extraction approach for signal classification, applied to CDP detection. The method is based on an intuitive procedure. We first remove by convolution the noise from the CDPs recorded in each given spinal segment. Then, we assign a coefficient for each main local maximum of the signal using its amplitude and distance to the most important maximum of the signal. These coefficients will be the input for the subsequent classification algorithm. In particular, we employ gradient boosting classification trees. This combination of approaches allows a faster and more accurate discrimination of CDPs than is obtained by other methods.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper proposes a new method, oriented to image real-time processing, for identifying crop rows in maize fields in the images. The vision system is designed to be installed onboard a mobile agricultural vehicle, that is, submitted to gyros, vibrations, and undesired movements. The images are captured under image perspective, being affected by the above undesired effects. The image processing consists of two main processes: image segmentation and crop row detection. The first one applies a threshold to separate green plants or pixels (crops and weeds) from the rest (soil, stones, and others). It is based on a fuzzy clustering process, which allows obtaining the threshold to be applied during the normal operation process. The crop row detection applies a method based on image perspective projection that searches for maximum accumulation of segmented green pixels along straight alignments. They determine the expected crop lines in the images. The method is robust enough to work under the above-mentioned undesired effects. It is favorably compared against the well-tested Hough transformation for line detection.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

One of the main challenges for intelligent vehicles is the capability of detecting other vehicles in their environment, which constitute the main source of accidents. Specifically, many methods have been proposed in the literature for video-based vehicle detection. Most of them perform supervised classification using some appearance-related feature, in particular, symmetry has been extensively utilized. However, an in-depth analysis of the classification power of this feature is missing. As a first contribution of this paper, a thorough study of the classification performance of symmetry is presented within a Bayesian decision framework. This study reveals that the performance of symmetry-based classification is very limited. Therefore, as a second contribution, a new gradient-based descriptor is proposed for vehicle detection. This descriptor exploits the known rectangular structure of vehicle rears within a Histogram of Gradients (HOG)-based framework. Experiments show that the proposed descriptor outperforms largely symmetry as a feature for vehicle verification, achieving classification rates over 90%.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the participation of DAEDALUS at ImageCLEF 2011 Plant Identification task. The task is evaluated as a supervised classification problem over 71 tree species from the French Mediterranean area used as class labels, based on visual content from scan, scan-like and natural photo images. Our approach to this task is to build a classifier based on the detection of keypoints from the images extracted using Lowe’s Scale Invariant Feature Transform (SIFT) algorithm. Although our overall classification score is very low as compared to other participant groups, the main conclusion that can be drawn is that SIFT keypoints seem to work significantly better for photos than for the other image types, so our approach may be a feasible strategy for the classification of this kind of visual content.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, two techniques to control UAVs (Unmanned Aerial Vehicles), based on visual information are presented. The first one is based on the detection and tracking of planar structures from an on-board camera, while the second one is based on the detection and 3D reconstruction of the position of the UAV based on an external camera system. Both strategies are tested with a VTOL (Vertical take-off and landing) UAV, and results show good behavior of the visual systems (precision in the estimation and frame rate) when estimating the helicopter¿s position and using the extracted information to control the UAV.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sensing systems in living bodies offer a large variety of possible different configurations and philosophies able to be emulated in artificial sensing systems. Motion detection is one of the areas where different animals adopt different solutions and, in most of the cases, these solutions reflect a very sophisticated form. One of them, the mammalian visual system, presents several advantages with respect to the artificial ones. The main objective of this paper is to present a system, based on this biological structure, able to detect motion, its sense and its characteristics. The configuration adopted responds to the internal structure of the mammalian retina, where just five types of cells arranged in five layers are able to differentiate a large number of characteristics of the image impinging onto it. Its main advantage is that the detection of these properties is based purely on its hardware. A simple unit, based in a previous optical logic cell employed in optical computing, is the basis for emulating the different behaviors of the biological neurons. No software is present and, in this way, no possible interference from outside affects to the final behavior. This type of structure is able to work, once the internal configuration is implemented, without any further attention. Different possibilities are present in the architecture to be presented: detection of motion, of its direction and intensity. Moreover, some other characteristics, as symmetry may be obtained.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La medida de calidad de vídeo sigue siendo necesaria para definir los criterios que caracterizan una señal que cumpla los requisitos de visionado impuestos por el usuario. Las nuevas tecnologías, como el vídeo 3D estereoscópico o formatos más allá de la alta definición, imponen nuevos criterios que deben ser analizadas para obtener la mayor satisfacción posible del usuario. Entre los problemas detectados durante el desarrollo de esta tesis doctoral se han determinado fenómenos que afectan a distintas fases de la cadena de producción audiovisual y tipo de contenido variado. En primer lugar, el proceso de generación de contenidos debe encontrarse controlado mediante parámetros que eviten que se produzca el disconfort visual y, consecuentemente, fatiga visual, especialmente en lo relativo a contenidos de 3D estereoscópico, tanto de animación como de acción real. Por otro lado, la medida de calidad relativa a la fase de compresión de vídeo emplea métricas que en ocasiones no se encuentran adaptadas a la percepción del usuario. El empleo de modelos psicovisuales y diagramas de atención visual permitirían ponderar las áreas de la imagen de manera que se preste mayor importancia a los píxeles que el usuario enfocará con mayor probabilidad. Estos dos bloques se relacionan a través de la definición del término saliencia. Saliencia es la capacidad del sistema visual para caracterizar una imagen visualizada ponderando las áreas que más atractivas resultan al ojo humano. La saliencia en generación de contenidos estereoscópicos se refiere principalmente a la profundidad simulada mediante la ilusión óptica, medida en términos de distancia del objeto virtual al ojo humano. Sin embargo, en vídeo bidimensional, la saliencia no se basa en la profundidad, sino en otros elementos adicionales, como el movimiento, el nivel de detalle, la posición de los píxeles o la aparición de caras, que serán los factores básicos que compondrán el modelo de atención visual desarrollado. Con el objetivo de detectar las características de una secuencia de vídeo estereoscópico que, con mayor probabilidad, pueden generar disconfort visual, se consultó la extensa literatura relativa a este tema y se realizaron unas pruebas subjetivas preliminares con usuarios. De esta forma, se llegó a la conclusión de que se producía disconfort en los casos en que se producía un cambio abrupto en la distribución de profundidades simuladas de la imagen, aparte de otras degradaciones como la denominada “violación de ventana”. A través de nuevas pruebas subjetivas centradas en analizar estos efectos con diferentes distribuciones de profundidades, se trataron de concretar los parámetros que definían esta imagen. Los resultados de las pruebas demuestran que los cambios abruptos en imágenes se producen en entornos con movimientos y disparidades negativas elevadas que producen interferencias en los procesos de acomodación y vergencia del ojo humano, así como una necesidad en el aumento de los tiempos de enfoque del cristalino. En la mejora de las métricas de calidad a través de modelos que se adaptan al sistema visual humano, se realizaron también pruebas subjetivas que ayudaron a determinar la importancia de cada uno de los factores a la hora de enmascarar una determinada degradación. Los resultados demuestran una ligera mejora en los resultados obtenidos al aplicar máscaras de ponderación y atención visual, los cuales aproximan los parámetros de calidad objetiva a la respuesta del ojo humano. ABSTRACT Video quality assessment is still a necessary tool for defining the criteria to characterize a signal with the viewing requirements imposed by the final user. New technologies, such as 3D stereoscopic video and formats of HD and beyond HD oblige to develop new analysis of video features for obtaining the highest user’s satisfaction. Among the problems detected during the process of this doctoral thesis, it has been determined that some phenomena affect to different phases in the audiovisual production chain, apart from the type of content. On first instance, the generation of contents process should be enough controlled through parameters that avoid the occurrence of visual discomfort in observer’s eye, and consequently, visual fatigue. It is especially necessary controlling sequences of stereoscopic 3D, with both animation and live-action contents. On the other hand, video quality assessment, related to compression processes, should be improved because some objective metrics are adapted to user’s perception. The use of psychovisual models and visual attention diagrams allow the weighting of image regions of interest, giving more importance to the areas which the user will focus most probably. These two work fields are related together through the definition of the term saliency. Saliency is the capacity of human visual system for characterizing an image, highlighting the areas which result more attractive to the human eye. Saliency in generation of 3DTV contents refers mainly to the simulated depth of the optic illusion, i.e. the distance from the virtual object to the human eye. On the other hand, saliency is not based on virtual depth, but on other features, such as motion, level of detail, position of pixels in the frame or face detection, which are the basic features that are part of the developed visual attention model, as demonstrated with tests. Extensive literature involving visual comfort assessment was looked up, and the development of new preliminary subjective assessment with users was performed, in order to detect the features that increase the probability of discomfort to occur. With this methodology, the conclusions drawn confirmed that one common source of visual discomfort was when an abrupt change of disparity happened in video transitions, apart from other degradations, such as window violation. New quality assessment was performed to quantify the distribution of disparities over different sequences. The results confirmed that abrupt changes in negative parallax environment produce accommodation-vergence mismatches derived from the increasing time for human crystalline to focus the virtual objects. On the other side, for developing metrics that adapt to human visual system, additional subjective tests were developed to determine the importance of each factor, which masks a concrete distortion. Results demonstrated slight improvement after applying visual attention to objective metrics. This process of weighing pixels approximates the quality results to human eye’s response.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A more natural, intuitive, user-friendly, and less intrusive Human–Computer interface for controlling an application by executing hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches in every frame of a video sequence potential hand poses using a binary Support Vector Machine classifier and Local Binary Patterns as feature vectors. These detections are employed as input of a tracker to generate a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories, and compute a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor that constitutes one of the most important contributions of the paper, which is able to provide much richer spatio-temporal information than other existing approaches in the state of the art with a manageable computational cost. Excellent results have been obtained outperforming other approaches of the state of the art.