841 results for visual object detection
Abstract:
Introduction: In team sports, the ability to use peripheral vision is essential for tracking a number of players and the ball. Eye-tracking studies have shown that players either use fixations and saccades to process information on the pitch or use smooth pursuit eye movements (SPEM) to keep track of single objects (Schütz, Braun, & Gegenfurtner, 2011). However, it is assumed that peripheral vision is used best when gaze is stable, and it is unknown whether motion changes can be detected equally well during SPEM, especially because contrast sensitivity is reduced during SPEM (Schütz, Delipetkose, Braun, Kerzel, & Gegenfurtner, 2007). Therefore, peripheral motion change detection was examined by contrasting a fixation condition with a SPEM condition.
Methods: 13 participants (7 male, 6 female) were presented with a visual display consisting of 15 white squares and 1 red square. Participants were instructed to follow the red square with their eyes and to press a button as soon as a white square began to move. White square movements occurred either while the red square was stationary (fixation condition) or while it moved along a circular path at 6°/s (pursuit condition). The to-be-detected white square movements varied in eccentricity (4°, 8°, 16°) and speed (1°/s, 2°/s, 4°/s), while the movement time of the white squares was constant at 500 ms. A total of 180 events had to be detected. A Vicon-integrated eye-tracking system and a button press (1000 Hz) were used to control for eye movements and to measure detection rates and response times. Response times (ms) and missed detections (%) were the dependent variables and were analysed with a 2 (manipulation) x 3 (eccentricity) x 3 (speed) ANOVA with repeated measures on all factors.
Results: Significant response time effects were found for manipulation, F(1,12) = 224.31, p < .01, ηp² = .95, eccentricity, F(2,24) = 56.43, p < .01, ηp² = .83, and the interaction between the two factors, F(2,24) = 64.43, p < .01, ηp² = .84. Response times increased as a function of eccentricity for SPEM only and were overall higher than in the fixation condition. Missed-event effects were found for manipulation, F(1,12) = 37.14, p < .01, ηp² = .76, eccentricity, F(2,24) = 44.90, p < .01, ηp² = .79, the interaction between the two factors, F(2,24) = 39.52, p < .01, ηp² = .77, and the three-way interaction manipulation x eccentricity x speed, F(2,24) = 3.01, p = .03, ηp² = .20. While less than 2% of events were missed on average in the fixation condition as well as at 4° and 8° eccentricity in the SPEM condition, missed events increased for SPEM at 16° eccentricity, with significantly more missed events in the 4°/s speed condition (1°/s: M = 34.69, SD = 20.52; 2°/s: M = 33.34, SD = 19.40; 4°/s: M = 39.67, SD = 19.40).
Discussion: The results show that using SPEM impairs the ability to detect peripheral motion changes in the far periphery and that fixations help not only to detect these motion changes but also to respond to them faster. Given the high temporal constraints in team sports such as soccer or basketball, fast reactions are necessary for successful anticipation and decision making. Thus, it is advisable to anchor gaze at a specific location if peripheral changes (e.g., movements of other players) that require a motor response have to be detected. In contrast, SPEM should only be used if a single object, like the ball in cricket or baseball, is critical for a successful motor response.
References: Schütz, A. C., Braun, D. I., & Gegenfurtner, K. R. (2011). Eye movements and perception: A selective review. Journal of Vision, 11, 1-30. Schütz, A. C., Delipetkose, E., Braun, D. I., Kerzel, D., & Gegenfurtner, K. R. (2007). Temporal contrast sensitivity during smooth pursuit eye movements. Journal of Vision, 7, 1-15.
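To make the analysis concrete, here is a minimal sketch of the reported 2 (manipulation) x 3 (eccentricity) x 3 (speed) repeated-measures ANOVA in Python with statsmodels; the synthetic data, effect sizes, and column names are stand-ins, not the study's measurements.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic stand-in data: one mean response time per participant and cell.
rng = np.random.default_rng(0)
cells = [(s, m, e, v)
         for s in range(1, 14)                 # 13 participants
         for m in ("fixation", "pursuit")      # manipulation
         for e in (4, 8, 16)                   # eccentricity (deg)
         for v in (1, 2, 4)]                   # speed (deg/s)
df = pd.DataFrame(cells, columns=["subject", "manipulation", "eccentricity", "speed"])
# Fake eccentricity cost under pursuit only, plus noise.
df["rt"] = (300 + 10 * df["eccentricity"] * (df["manipulation"] == "pursuit")
            + rng.normal(0, 20, len(df)))

res = AnovaRM(df, depvar="rt", subject="subject",
              within=["manipulation", "eccentricity", "speed"]).fit()
print(res)  # F, df, and p for each main effect and interaction
```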
Abstract:
The current study investigated whether peripheral vision can be used to monitor multiple moving objects and to detect single-target changes. For this purpose, Experiment 1 first validated a modified MOT setup with a large projection and a constant-position centroid phase. Classical findings regarding the use of a virtual centroid to track multiple objects and the dependency of tracking accuracy on target speed were successfully replicated. The main experimental manipulations of the to-be-detected target changes were then introduced in Experiment 2. In addition to a button press used for the detection task, gaze behavior was assessed with an integrated eye-tracking system. The analysis of saccadic reaction times in relation to the motor response shows that peripheral vision is naturally used to detect motion and form changes in MOT, because the saccade to the target occurred after target-change offset. Furthermore, for changes of comparable task difficulty, motion changes are detected better by peripheral vision than form changes. Findings indicate that capabilities of the visual system (e.g., visual acuity) affect change detection rates and that covert-attention processes may be affected by vision-related aspects like spatial uncertainty. Moreover, it is argued that a centroid-MOT strategy might reduce saccade-related costs and that eye tracking is generally valuable for testing predictions derived from theories of MOT. Finally, implications for testing covert attention in applied settings are proposed.
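The saccade-timing criterion above (a change counts as peripherally detected when the first target-directed saccade begins only after target-change offset) can be sketched as a simple trial classifier; all variable names and values below are illustrative, not the study's data.

```python
import numpy as np

def peripheral_detection(first_saccade_onsets_ms, change_offset_ms):
    """Return one boolean per trial: True if the first saccade toward the
    changed target started after target-change offset, i.e. the change must
    have been picked up by peripheral vision."""
    onsets = np.asarray(first_saccade_onsets_ms, dtype=float)
    return onsets > change_offset_ms

# Example: changes last a constant 500 ms from change onset.
onsets = [420.0, 560.0, 610.0]              # first target-directed saccade per trial
print(peripheral_detection(onsets, 500.0))  # [False  True  True]
```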
Abstract:
This article presents a novel system and control strategy for visual following of a 3D moving object by an Unmanned Aerial Vehicle (UAV). The presented strategy is based only on the visual information given by an adaptive tracking method using color information, which, together with the dynamics of a camera fixed to a rotary-wing UAV, is used to develop an image-based visual servoing (IBVS) system. This system is focused on continuously following a 3D moving target object, keeping it at a fixed distance and centered in the image plane. The algorithm is validated in real flights in outdoor scenarios, showing the robustness of the proposed system against wind perturbations and illumination and weather changes, among others. The obtained results indicate that the proposed algorithm is suitable for complex control tasks, such as object following and pursuit or flying in formation, as well as for indoor navigation.
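A minimal sketch of the IBVS idea described here, assuming a generic colour tracker that returns a bounding box: the image-plane error drives yaw and climb commands, and the target's apparent size stands in for distance. Gains, interfaces, and sign conventions are assumptions, not the authors' implementation.

```python
import numpy as np

def ibvs_command(bbox, frame_w, frame_h, ref_area, k_xy=0.002, k_z=0.5):
    """bbox = (x, y, w, h) in pixels from a colour tracker. Returns
    (yaw_rate, climb_rate, forward_speed) commands for the UAV."""
    x, y, w, h = bbox
    cx, cy = x + w / 2.0, y + h / 2.0
    err_x = cx - frame_w / 2.0            # horizontal image error -> yaw
    err_y = cy - frame_h / 2.0            # vertical image error -> climb
    err_z = 1.0 - (w * h) / ref_area      # apparent-size error -> keep distance
    return -k_xy * err_x, -k_xy * err_y, k_z * err_z

# Target slightly right of and above centre, appearing too small (too far).
print(ibvs_command((300, 200, 80, 60), 640, 480, ref_area=5000.0))
```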
Abstract:
This article presents a visual servoing system for following a 3D moving object with a Micro Unmanned Aerial Vehicle (MUAV). The presented control strategy is based only on the visual information given by an adaptive tracking method using colour information. A visual fuzzy system has been developed for servoing the camera situated on a rotary-wing MUAV, which also considers the vehicle's own dynamics. This system is focused on continuously following an aerial moving target object, keeping it at a fixed safe distance and centred in the image plane. The algorithm is validated in real flights in outdoor scenarios, showing the robustness of the proposed system against wind perturbations and illumination and weather changes, among others. The obtained results indicate that the proposed algorithm is suitable for complex control tasks, such as object following and pursuit or flying in formation, as well as for indoor navigation.
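As a rough illustration of the visual fuzzy servoing idea, the following sketch maps the horizontal image error to a yaw command through triangular memberships and weighted-average (centroid) defuzzification; the membership shapes, rule outputs, and sign convention are assumptions, not the paper's controller.

```python
def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def fuzzy_yaw(err_px):
    """Fuzzy rule base: error is Negative / Zero / Positive -> yaw command."""
    mu = {
        "neg":  tri(err_px, -320, -160, 0),
        "zero": tri(err_px, -80, 0, 80),
        "pos":  tri(err_px, 0, 160, 320),
    }
    out = {"neg": 0.4, "zero": 0.0, "pos": -0.4}      # rad/s per rule (assumed)
    num = sum(mu[k] * out[k] for k in mu)
    den = sum(mu.values()) + 1e-9
    return num / den                                   # centroid defuzzification

print(fuzzy_yaw(120.0))   # yaw command for a target 120 px right of centre
```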
Abstract:
This paper proposes a new method, oriented to real-time image processing, for identifying crop rows in images of maize fields. The vision system is designed to be installed onboard a mobile agricultural vehicle and is therefore subject to gyrations, vibrations, and other undesired movements. The images are captured in perspective and are affected by these undesired effects. The image processing consists of two main stages: image segmentation and crop row detection. The first applies a threshold to separate green plant pixels (crops and weeds) from the rest (soil, stones, and others); it is based on a fuzzy clustering process, which provides the threshold to be applied during normal operation. The crop row detection applies a method based on image perspective projection that searches for the maximum accumulation of segmented green pixels along straight alignments; these alignments determine the expected crop rows in the images. The method is robust enough to work under the above-mentioned undesired effects, and it compares favorably against the well-tested Hough transform for line detection.
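A compact sketch of the two-stage pipeline, under simplifying assumptions: vegetation is segmented with an excess-green index and a fixed threshold (the paper derives its threshold via fuzzy clustering), and candidate row alignments are scored by how many segmented pixels they accumulate.

```python
import numpy as np

def segment_green(rgb, thr=20):
    """Excess-green index 2G - R - B, thresholded to a binary vegetation mask."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (2 * g - r - b) > thr

def best_row_line(mask, x0, slopes):
    """Among candidate lines through bottom-row point x0, return the slope
    whose straight alignment accumulates the most vegetation pixels."""
    h, w = mask.shape
    ys = np.arange(h)
    scores = []
    for m in slopes:
        xs = np.clip(x0 + (m * (h - 1 - ys)).astype(int), 0, w - 1)
        scores.append(mask[ys, xs].sum())
    return slopes[int(np.argmax(scores))]

img = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)  # stand-in frame
mask = segment_green(img)
print(best_row_line(mask, x0=80, slopes=np.linspace(-1.0, 1.0, 21)))
```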
Abstract:
Video quality assessment is still necessary to define the criteria that characterize a signal meeting the viewing requirements imposed by the user. New technologies, such as stereoscopic 3D video and formats beyond high definition, impose new criteria that must be analysed to obtain the highest possible user satisfaction. Among the problems examined in this doctoral thesis, phenomena were identified that affect different phases of the audiovisual production chain and varied types of content. First, the content generation process must be controlled through parameters that prevent visual discomfort and, consequently, visual fatigue, especially for stereoscopic 3D content, both animated and live action. Second, quality assessment in the video compression phase employs metrics that are sometimes not adapted to the user's perception. The use of psychovisual models and visual attention diagrams would allow image areas to be weighted so that greater importance is given to the pixels the user is most likely to focus on. These two areas are related through the definition of the term saliency: the capacity of the visual system to characterize a viewed image by weighting the areas that are most attractive to the human eye. In the generation of stereoscopic content, saliency refers mainly to the depth simulated by the optical illusion, measured as the distance from the virtual object to the human eye. In two-dimensional video, however, saliency is not based on depth but on additional features, such as motion, level of detail, pixel position, or the presence of faces, which are the basic factors of the visual attention model developed here. To identify the characteristics of a stereoscopic video sequence most likely to cause visual discomfort, the extensive literature on this topic was reviewed and preliminary subjective tests were performed with users. This led to the conclusion that discomfort occurred when there was an abrupt change in the distribution of simulated depths in the image, in addition to other degradations such as the so-called "window violation". New subjective tests focused on analysing these effects with different depth distributions were used to pin down the parameters that define such images. The results show that abrupt changes occur in scenes with motion and high negative disparities, which interfere with the accommodation and vergence processes of the human eye and increase the time the crystalline lens needs to focus. To improve quality metrics through models adapted to the human visual system, further subjective tests were carried out to determine the importance of each factor in masking a given degradation. The results show a slight improvement when weighting masks based on visual attention are applied, bringing objective quality scores closer to the response of the human eye.
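As a hedged illustration of the saliency-weighted quality idea, the sketch below weights per-pixel errors by a normalised attention map before computing a PSNR-style score; the weighting scheme is a generic example, not the thesis's exact metric.

```python
import numpy as np

def weighted_psnr(ref, dist, saliency, peak=255.0):
    """PSNR with per-pixel error weights given by a normalised saliency map."""
    w = saliency / (saliency.sum() + 1e-12)
    wmse = np.sum(w * (ref.astype(float) - dist.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / (wmse + 1e-12))

ref = np.random.randint(0, 256, (64, 64)).astype(float)
dist = ref + np.random.normal(0, 5, ref.shape)   # stand-in compression noise
sal = np.random.rand(64, 64)                     # stand-in attention map
print(weighted_psnr(ref, dist, sal))
```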
Abstract:
In optimal foraging theory, search time is a key variable defining the value of a prey type, but the sensory-perceptual processes that constrain the search for food have rarely been considered. Here we evaluate the flight behavior of bumblebees (Bombus terrestris) searching for artificial flowers of various sizes and colors. When flowers were large, search times correlated well with the color contrast of the targets against their green foliage-type background, as predicted by a model of color opponent coding using inputs from the bees' UV, blue, and green receptors. Targets that made poor color contrast with their backdrop, such as white, UV-reflecting ones, or red flowers, took longest to detect, even though brightness contrast with the background was pronounced. When searching for small targets, bees changed their strategy in several ways. They flew significantly slower and closer to the ground, thereby increasing the minimum detectable area subtended by an object on the ground. In addition, they used a different neuronal channel for flower detection: instead of color contrast, they used only the green receptor signal. We relate these findings to temporal and spatial limitations of the different neuronal channels involved in stimulus detection and recognition. Thus, foraging speed may not be limited only by factors such as prey density, flight energetics, and scramble competition. Our results show that understanding the behavioral ecology of foraging can gain substantially from knowledge about mechanisms of visual information processing.
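A sketch of a colour-opponent contrast computation in the spirit of the model cited above, using a hexagon-style opponent space over UV, blue, and green receptor excitations; the coefficients follow the common colour-hexagon formulation and are an assumption here, as is the simple green-receptor contrast used for small targets.

```python
import numpy as np

def hexagon_coords(e_uv, e_blue, e_green):
    """Map receptor excitations (0..1) to 2D opponent coordinates
    (hexagon-style formulation; coefficients assumed, not from this paper)."""
    x = (np.sqrt(3) / 2.0) * (e_green - e_uv)
    y = e_blue - 0.5 * (e_uv + e_green)
    return np.array([x, y])

def color_contrast(target, background):
    """Chromatic contrast: distance between target and background loci."""
    return float(np.linalg.norm(hexagon_coords(*target) - hexagon_coords(*background)))

def green_contrast(target, background):
    """Achromatic green-receptor signal bees rely on for small targets."""
    return abs(target[2] - background[2])

flower, foliage = (0.2, 0.7, 0.4), (0.5, 0.5, 0.5)   # (UV, blue, green), assumed
print(color_contrast(flower, foliage), green_contrast(flower, foliage))
```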
Abstract:
The purpose of the present study was to investigate by using positron emission tomography (PET) whether the cortical pathways that are involved in visual perception of spatial location and object identity are also differentially implicated in retrieval of these types of information from episodic long-term memory. Subjects studied a set of displays consisting of three unique representational line drawings arranged in different spatial configurations. Later, while undergoing PET scanning, subjects' memory for spatial location and identity of the objects in the displays was tested and compared to a perceptual baseline task involving the same displays. In comparison to the baseline task, each of the memory tasks activated both the dorsal and the ventral pathways in the right hemisphere but not to an equal extent. There was also activation of the right prefrontal cortex. When PET scans of the memory tasks were compared to each other, areas of activation were very circumscribed and restricted to the right hemisphere: For retrieval of object identity, the area was in the inferior temporal cortex in the region of the fusiform gyrus (area 37), whereas for retrieval of spatial location, it was in the inferior parietal lobule in the region of the supramarginal gyrus (area 40). Thus, our study shows that distinct neural pathways are activated during retrieval of information about spatial location and object identity from long-term memory.
Abstract:
In this paper, we present a novel coarse-to-fine visual localization approach: contextual visual localization. This approach relies on three elements: (i) a minimal-complexity classifier for performing fast coarse localization (submap classification); (ii) an optimized saliency detector which exploits the visual statistics of the submap; and (iii) a fast view-matching algorithm which filters initial matchings with a structural criterion. The latter algorithm yields the fine localization. Our experiments show that these elements have been successfully integrated to solve the global localization problem. Context, that is, the awareness of being in a particular submap, is defined by a supervised classifier tuned for a minimal set of features. Visual context is exploited both for tuning (optimizing) the saliency detection process and for selecting potential matching views in the visual database that are close enough to the query view.
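A minimal runnable sketch of the coarse-to-fine flow: a cheap global descriptor is classified to a submap (coarse step), then matched only against that submap's stored views (fine step). The submap-tuned saliency detection and the structural match filter are elided; descriptors and data are random stand-ins.

```python
import numpy as np

def classify_submap(descriptor, centroids):
    """(i) Coarse localization: nearest-centroid classification of a cheap
    global descriptor into a submap index (the 'context')."""
    return int(np.argmin(np.linalg.norm(centroids - descriptor, axis=1)))

def match_view(descriptor, view_database):
    """(iii) Fine localization: nearest stored view within the chosen submap
    (the structural-consistency filter of the paper is omitted here)."""
    return int(np.argmin(np.linalg.norm(view_database - descriptor, axis=1)))

rng = np.random.default_rng(0)
centroids = rng.random((3, 8))                        # one centroid per submap
databases = [rng.random((10, 8)) for _ in range(3)]   # stored view descriptors

query = rng.random(8)
submap = classify_submap(query, centroids)
view = match_view(query, databases[submap])
print(f"submap {submap}, closest view {view}")
```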
Abstract:
New low-cost sensors and free, open libraries for 3D image processing are enabling important advances in robot vision applications, such as three-dimensional object recognition, semantic mapping, robot navigation and localization, and human detection and/or gesture recognition for human-machine interaction. In this paper, a novel method for recognizing and tracking the fingers of a human hand is presented. This method is based on point clouds from range images captured by an RGBD sensor. It works in real time and does not require visual markers, camera calibration, or prior knowledge of the environment. Moreover, it works successfully even when multiple objects appear in the scene or the ambient lighting changes. Furthermore, the method was designed as a human interface for remotely controlling domestic or industrial devices. In this paper, the method was tested by operating a robotic hand: first, the human hand was recognized and the fingers were detected; second, the movement of the fingers was analysed and mapped so it could be imitated by a robotic hand.
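A hedged sketch of the first step, assuming an RGBD depth frame in millimetres: the hand is taken to be the connected region nearest the camera, so we threshold a depth band around the closest surface and keep the largest component. The thresholds and the finger-tracking stage are illustrative, not from the paper.

```python
import numpy as np
from scipy import ndimage

def segment_hand(depth_mm, band_mm=120):
    """Return a binary mask of the connected region nearest the camera."""
    valid = depth_mm > 0                       # 0 = missing depth reading
    near = depth_mm[valid].min()               # closest surface to the sensor
    mask = valid & (depth_mm < near + band_mm) # keep a thin depth band
    labels, n = ndimage.label(mask)            # connected components
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))  # largest component = hand

depth = np.full((120, 160), 2000, dtype=np.int32)  # stand-in frame: wall at 2 m
depth[40:80, 60:100] = 600                         # hand-like blob at 0.6 m
print(segment_hand(depth).sum(), "hand pixels")
```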
Abstract:
3D sensors provide valuable information for mobile robotic tasks like scene classification or object recognition, but they often produce noisy data that makes it impossible to apply classical keypoint detection and feature extraction techniques. Therefore, noise removal and downsampling have become essential steps in 3D data processing. In this work, we propose the use of a 3D filtering and downsampling technique based on a Growing Neural Gas (GNG) network. The GNG method is able to deal with outliers present in the input data and can represent 3D spaces, obtaining an induced Delaunay triangulation of the input space. Experiments show how state-of-the-art keypoint detectors improve their performance when the GNG output representation is used as input data. Descriptors extracted on the improved keypoints achieve better matching in robotics applications such as 3D scene registration.
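For intuition, here is a heavily simplified Growing Neural Gas sketch: the node set converges to a reduced, noise-robust representation of the input cloud. Parameters are textbook-style defaults, and standard details (e.g., removing isolated nodes, rewiring edges on insertion) are omitted; none of it reflects the paper's exact settings.

```python
import numpy as np

def gng(points, max_nodes=50, eps_b=0.05, eps_n=0.005,
        max_age=50, lam=100, alpha=0.5, d=0.995, seed=0):
    rng = np.random.default_rng(seed)
    nodes = [points[0].copy(), points[1].copy()]   # start with two nodes
    error = [0.0, 0.0]
    edges = {}                                     # (i, j) with i < j -> age

    for step in range(1, 20 * lam + 1):
        x = points[rng.integers(len(points))]      # random input sample
        dist = [np.linalg.norm(x - w) for w in nodes]
        s1, s2 = np.argsort(dist)[:2]              # winner and runner-up

        error[s1] += dist[s1] ** 2
        nodes[s1] += eps_b * (x - nodes[s1])       # move winner toward x
        for (i, j) in list(edges):                 # age edges at the winner,
            if s1 in (i, j):                       # nudge its neighbours
                edges[(i, j)] += 1
                other = j if i == s1 else i
                nodes[other] += eps_n * (x - nodes[other])
        edges[tuple(sorted((s1, s2)))] = 0         # refresh winner pair
        edges = {e: a for e, a in edges.items() if a <= max_age}

        if step % lam == 0 and len(nodes) < max_nodes:
            q = int(np.argmax(error))              # insert near worst node
            nbrs = [j if i == q else i for (i, j) in edges if q in (i, j)]
            if nbrs:
                f = max(nbrs, key=lambda k: error[k])
                nodes.append(0.5 * (nodes[q] + nodes[f]))
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])
        error = [e * d for e in error]             # global error decay
    return np.array(nodes)

cloud = np.random.default_rng(1).normal(size=(2000, 3)) * [1.0, 0.5, 0.1]
print(gng(cloud).shape)   # downsampled representation of the input cloud
```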
Abstract:
During grasping and intelligent robotic manipulation tasks, the camera position relative to the scene changes dramatically because the robot moves to adapt its path and correctly grasp objects; the camera is mounted on the robot's end effector. For this reason, in this type of environment, a visual recognition system must be implemented that recognizes objects and obtains their positions in the scene automatically and autonomously. Furthermore, in industrial environments, the objects manipulated by robots are often made of the same material and cannot be differentiated by features such as texture or color. In this work, first, a study and analysis of 3D recognition descriptors was completed for application in these environments. Second, a visual recognition system built on a specific distributed client-server architecture is proposed for recognizing industrial objects that lack such appearance features. Our system is designed to overcome recognition problems when objects can only be recognized by geometric shape and the simplicity of the shapes could create ambiguity. Finally, real tests are performed and illustrated to verify the satisfactory performance of the proposed system.
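A toy sketch of the distributed client-server split, with Python's multiprocessing.connection standing in for the transport: the client sends a (fake) global shape descriptor and the server returns the nearest stored model. The descriptors, model names, port, and protocol are all illustrative assumptions, not the paper's architecture.

```python
import threading
import numpy as np
from multiprocessing.connection import Listener, Client

MODELS = {"bracket": np.array([0.9, 0.1, 0.3]),   # pre-computed geometric
          "flange":  np.array([0.2, 0.8, 0.5])}   # shape descriptors (assumed)

listener = Listener(("localhost", 6000), authkey=b"demo")

def serve():
    """Server side: receive a descriptor, reply with the closest model name."""
    with listener.accept() as conn:
        desc = np.asarray(conn.recv())
        best = min(MODELS, key=lambda k: np.linalg.norm(MODELS[k] - desc))
        conn.send(best)
    listener.close()

threading.Thread(target=serve, daemon=True).start()

# Client side: descriptor extracted from the captured scene (stand-in values).
with Client(("localhost", 6000), authkey=b"demo") as conn:
    conn.send([0.85, 0.15, 0.25])
    print("recognized:", conn.recv())   # -> "bracket"
```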
Abstract:
National Highway Traffic Safety Administration, Washington, D.C.
Abstract:
National Highway Traffic Safety Administration, Office of Driver and Pedestrian Research, Washington, D.C.
Abstract:
Primates have X chromosome genes for cone photopigments with sensitivity maxima from 535 to 562 nm. Old World monkeys and apes (catarrhines) and the New World (platyrrhine) genus Alouatta have separate genes for 535-nm (medium wavelength; M) and 562-nm (long wavelength; L) pigments. These pigments, together with a 425-nm (short wavelength) pigment, permit trichromatic color vision. Other platyrrhines and prosimians have a single X chromosome gene but often with alleles for two or three M/L photopigments. Consequently, heterozygote females are trichromats, but males and homozygote females are dichromats. The criteria that affect the evolution of M/L alleles and maintain genetic polymorphism remain a puzzle, but selection for finding food may be important. We compare different types of color vision for detecting more than 100 plant species consumed by tamarins (Saguinus spp.) in Peru. There is evidence that both frequency-dependent selection on homozygotes and heterozygote advantage favor M/L polymorphism and that trichromatic color vision is most advantageous in dim light. Also, whereas the 562-nm allele is present in all species, the occurrence of 535- to 556-nm alleles varies between species. This variation probably arises because trichromatic color vision favors widely separated pigments and equal frequencies of 535/543- and 562-nm alleles, whereas in dichromats, long-wavelength pigment alleles are fitter.