Abstract:
Road accidents are a major public health problem: about 1.2 million people die every year around the globe, and this toll is forecast to increase if road safety is not addressed properly. In 2012, Portugal recorded 573 on-site fatalities in road accidents, which, along with Denmark, represented the largest decrease in the European Union relative to 2011. Beyond the impact of the fatalities themselves, the economic and social costs of road accidents were estimated at about 1.17% of the Portuguese gross domestic product in 2010. Visual Analytics combines data analysis techniques with interactive visualizations, facilitating knowledge discovery in large and complex data sets, while Geovisual Analytics facilitates the exploration of spatio-temporal data through maps of the variables and parameters under analysis. In Portugal, the identification of road accident accumulation zones, here named black spots, has been restricted to fixed annual windows. This work presents a dynamic approach based on Visual Analytics techniques that can identify the displacement of black spots over sliding windows of 12 months. Moreover, by varying the parameters of the formula usually used to detect black spots, it is possible to identify zones that are close to becoming black spots. Through the proposed visualizations, the study and identification of countermeasures to this social and economic problem can gain new ground, supporting and improving the decision-making process.
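The sliding-window idea lends itself to a compact implementation. Below is a minimal sketch in Python/pandas; the column names, segment grouping, severity weights, and threshold are illustrative assumptions, since the abstract does not give the exact formula used in the thesis.

```python
# Hypothetical sketch of sliding-window black spot detection.
# The severity weights and threshold are illustrative assumptions,
# not the formula used in the thesis.
import pandas as pd

SEVERITY_WEIGHTS = {"fatalities": 100, "serious_injuries": 10, "light_injuries": 3}

def severity_index(df: pd.DataFrame) -> pd.Series:
    """Weighted sum of casualties per road segment (assumed weights)."""
    return sum(df[col] * w for col, w in SEVERITY_WEIGHTS.items())

def black_spots(accidents: pd.DataFrame, start, threshold: float = 100.0) -> pd.DataFrame:
    """Flag segments whose severity index exceeds a threshold within a
    12-month window starting at `start`."""
    end = pd.Timestamp(start) + pd.DateOffset(months=12)
    window = accidents[(accidents["date"] >= start) & (accidents["date"] < end)]
    per_segment = window.groupby("segment_id")[list(SEVERITY_WEIGHTS)].sum()
    per_segment["severity"] = severity_index(per_segment)
    return per_segment[per_segment["severity"] >= threshold]

# Sliding the window month by month reveals how black spots appear,
# displace, and disappear over time:
# for start in pd.date_range("2008-01-01", "2012-01-01", freq="MS"):
#     spots = black_spots(df, start)
```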
Abstract:
This work uses feature-based computer vision algorithms to identify medicine boxes for the visually impaired. The system is aimed at people whose vision is compromised by disease, hindering the identification of the correct medicine to take. We use the camera available in popular devices such as computers, televisions, and phones to identify the correct medicine box from the image and to play audio describing the medication information, such as dosage, indications, and contraindications. We employ a feature-based object detection model to identify the features of the medicine boxes and play the audio as soon as those features are detected. Experiments carried out with 15 people show that 93% consider the system useful and very helpful for identifying medicines by their boxes. This technology can therefore help many people with visual impairments take the right medicine at the time prescribed in advance by the physician.
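The abstract does not name the exact feature algorithm; the following minimal sketch uses ORB matching in OpenCV to illustrate the detect-then-play-audio flow. The file name, distance cutoff, and match threshold are assumptions.

```python
# Minimal sketch of feature-based medicine box identification with OpenCV.
# The reference image path and thresholds are hypothetical placeholders.
import cv2

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

# Reference image of a medicine box, registered in advance.
ref = cv2.imread("medicine_box.png", cv2.IMREAD_GRAYSCALE)
ref_kp, ref_desc = orb.detectAndCompute(ref, None)

def identify(frame_gray, min_matches=30):
    """Return True when enough features match the registered box."""
    kp, desc = orb.detectAndCompute(frame_gray, None)
    if desc is None:
        return False
    matches = matcher.match(ref_desc, desc)
    good = [m for m in matches if m.distance < 50]  # keep close matches only
    return len(good) >= min_matches

# In the application loop, a positive identification would trigger audio
# playback of the dosage, indications, and contraindications.
```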
Abstract:
BACKGROUND: Central and peripheral vision are needed for object detection. Previous research has shown that visual target detection is affected by age, and light conditions also influence visual exploration. The aim of the study was to investigate the effects of age and different light conditions on visual exploration behavior and on driving performance during simulated driving. METHODS: A fixed-base simulator with a 180-degree field of view was used to simulate a motorway route under daylight and night conditions in 29 young subjects (25-40 years) and 27 older subjects (65-78 years). Drivers' eye fixations were analyzed and assigned to regions of interest (ROIs) such as street, road signs, car ahead, environment, rear-view mirror, left side mirror, right side mirror, oncoming car, parked car, and road repair. In addition, lane-keeping and driving speed were analyzed as measures of driving performance. RESULTS: Older drivers had longer fixations on the task-relevant ROIs but checked the mirrors less frequently than younger drivers. In both age groups, night driving led to fewer fixations on the mirrors. At the performance level, older drivers showed more variation in driving speed and lane-keeping behavior, which was especially prominent at night. In younger drivers, night driving had no impact on driving speed or lane-keeping behavior. CONCLUSIONS: Older drivers' visual exploration behavior is more focused on the task-relevant ROIs, especially at night, when their driving performance becomes more heterogeneous than that of younger drivers.
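The ROI assignment step reduces to mapping fixation coordinates onto labeled screen regions. A minimal sketch follows; the ROI rectangles and fixation format are illustrative assumptions, not values from the study.

```python
# Hypothetical sketch of assigning eye fixations to regions of interest.
from collections import Counter

# ROIs as (x_min, y_min, x_max, y_max) in screen coordinates (assumed).
ROIS = {
    "street":           (400, 300, 880, 720),
    "rear_view_mirror": (560, 0, 720, 80),
    "side_mirror_left": (0, 300, 120, 420),
}

def assign_roi(x, y):
    for name, (x0, y0, x1, y1) in ROIS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return "environment"  # anything outside the defined regions

fixations = [(612, 40, 0.35), (450, 500, 1.2)]  # (x, y, duration_s)
counts = Counter(assign_roi(x, y) for x, y, _ in fixations)
mean_duration = {
    roi: sum(d for x, y, d in fixations if assign_roi(x, y) == roi) / counts[roi]
    for roi in counts
}
```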
Abstract:
Hand detection in images has important applications in recognizing people's activities. This thesis builds on the PASCAL Visual Object Classes (VOC) system for hand detection. VOC, based on twenty common object classes, has become a popular benchmark for object detection, and VOC2007 was released together with a successful deformable parts model. A hand detection is counted as correct when the system outputs a bounding box that overlaps by at least 50% with a ground-truth hand bounding box in the image. The initial average precision of this detector is around 0.215, compared with a state of the art of 0.104; moreover, color and frequency features of the detected bounding boxes carry important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help achieve higher precision at low recall, even though the overall average precision is similar.
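The 50% overlap rule is the standard VOC intersection-over-union criterion. A minimal sketch (the corner-coordinate box format is an assumption):

```python
# VOC overlap criterion: a detection counts as correct when its
# intersection-over-union with a ground-truth box is at least 0.5.
def iou(box_a, box_b):
    """Boxes given as (x_min, y_min, x_max, y_max)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_hit(detection, ground_truths, threshold=0.5):
    return any(iou(detection, gt) >= threshold for gt in ground_truths)
```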
Abstract:
Report on the scientific sojourn at the Swiss Federal Institute of Technology Zurich, Switzerland, between September and December 2007. To make robots useful assistants in our everyday life, the ability to learn and recognize objects is of essential importance. However, object recognition in real scenes is one of the most challenging problems in computer vision, as many difficulties must be dealt with. Furthermore, mobile robotics adds a new challenge to the list: computational complexity. In a dynamic world, information about the objects in a scene can become obsolete before it is ready to be used if the detection algorithm is not fast enough. Two recent object recognition techniques have achieved notable results: the constellation approach proposed by Lowe and the bag-of-words approach proposed by Nistér and Stewénius. The Lowe constellation approach is the one currently used in the robot localization task of the COGNIRON project. This report is divided into two main sections. The first section briefly reviews the object recognition system currently in use, the Lowe approach, brings to light its drawbacks for object recognition in the context of indoor mobile robot navigation, and describes the proposed improvements to the algorithm. The second section reviews the alternative bag-of-words method, together with several experiments conducted to evaluate its performance on our own object databases, and proposes some modifications to the original algorithm to make it suitable for object detection in unsegmented images.
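For orientation, the bag-of-words pipeline builds a vocabulary by clustering local descriptors and then represents each image as a visual-word histogram. The following sketch uses flat k-means for brevity; Nistér and Stewénius instead use a hierarchically quantized vocabulary tree for speed. The vocabulary size and use of SIFT are illustrative assumptions.

```python
# Minimal sketch of the bag-of-words recognition pipeline.
import numpy as np
import cv2
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def descriptors(image_gray):
    _, desc = sift.detectAndCompute(image_gray, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_vocabulary(training_images, k=500):
    """Cluster all training descriptors into k visual words."""
    all_desc = np.vstack([descriptors(img) for img in training_images])
    return KMeans(n_clusters=k, n_init=4).fit(all_desc)

def histogram(image_gray, vocabulary):
    """Represent an image as a normalized visual-word histogram."""
    words = vocabulary.predict(descriptors(image_gray))
    hist, _ = np.histogram(words, bins=vocabulary.n_clusters,
                           range=(0, vocabulary.n_clusters))
    return hist / max(hist.sum(), 1)

# Recognition then compares histograms, e.g. by nearest neighbour over a
# database of object histograms.
```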
Abstract:
Past multisensory experiences can influence current unisensory processing and memory performance. Repeated images are better discriminated if initially presented as auditory-visual pairs rather than only visually. An experience's context thus plays a role in how well repetitions of certain aspects are later recognized. Here, we investigated which factors during the initial multisensory experience are essential for generating improved memory performance. Subjects discriminated repeated versus initial image presentations intermixed within a continuous recognition task. Half of the initial presentations were multisensory, and all repetitions were visual only. Experiment 1 examined whether purely episodic multisensory information suffices to enhance later discrimination performance by pairing visual objects with either tones or vibrations; this also allowed us to assess whether the effects can be elicited with different sensory pairings. Experiment 2 examined semantic context by manipulating the congruence between auditory and visual object stimuli within blocks of trials. Relative to images only encountered visually, accuracy in discriminating image repetitions was significantly impaired by auditory-visual, yet unaffected by somatosensory-visual, multisensory memory traces. By contrast, this accuracy was selectively enhanced for visual stimuli with semantically congruent multisensory pasts and unchanged for those with semantically incongruent multisensory pasts. The collective results reveal opposing effects of purely episodic versus semantic information from auditory-visual multisensory events. Nonetheless, both types of multisensory memory traces are accessible during the processing of incoming stimuli and indeed result in distinct visual object processing, leading to either impaired or enhanced performance relative to unisensory memory traces. We discuss these results as supporting a model of object-based multisensory interactions.
Abstract:
We perceive our environment through multiple sensory channels; nonetheless, research has traditionally focused on sensory processing within single modalities. Investigating how our brain integrates multisensory information is therefore of crucial importance for understanding how organisms cope with a constantly changing, dynamic environment. During my thesis I investigated how multisensory events impact our perception and brain responses, both when auditory-visual stimuli are presented simultaneously and when multisensory events at one point in time impact later unisensory processing. In "Looming signals reveal synergistic principles of multisensory integration" (Cappe, Thelen et al., 2012) we investigated the neuronal substrates involved in motion-in-depth detection under multisensory versus unisensory conditions. We showed that congruent auditory-visual looming (i.e. approaching) signals are preferentially integrated by the brain, and that early effects under these conditions are relevant for behavior, effectively speeding up responses to these combined stimulus presentations. In "Electrical neuroimaging of memory discrimination based on single-trial multisensory learning" (Thelen et al., 2012), we investigated the behavioral impact of single encounters with meaningless auditory-visual object pairings on subsequent visual object recognition. In addition to showing that these encounters lead to impaired recognition accuracy upon repeated visual presentations, we showed that the brain discriminates images according to their initial encounter context as early as ~100 ms post-stimulus onset. In "Single-trial multisensory memories affect later visual and auditory object recognition" (Thelen et al., in review) we addressed whether auditory object recognition is affected by single-trial multisensory memories, and whether recognition accuracy for sounds is affected by the initial encounter context in the same way as for visual objects. We found that this is indeed the case. Based on our behavioral findings, we propose that a common underlying brain network is differentially involved during the encoding and retrieval of images and sounds.
Abstract:
A recently developed technique, polarimetric radar interferometry, is applied to the problem of detecting buried objects embedded in surface clutter. An experiment with a fully polarimetric radar was carried out in an anechoic chamber using different frequency bands and baselines. The processed results show the ability of this technique to detect buried plastic mines and to measure their depth. The technique enables the detection of plastic mines even when their backscatter response is much lower than that of the surface clutter.
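For context, the depth measurement rests on the standard interferometric principle: the phase difference between the signals received at two baseline-separated antenna positions encodes the difference in path length to the scatterer, from which its vertical position can be inferred. A generic single-pass form follows; the symbols are textbook notation, not taken from the paper.

```latex
% Interferometric phase between two receive positions separated by a baseline:
%   \lambda     radar wavelength
%   r_1, r_2    path lengths from each antenna to the scatterer
\Delta\phi = \frac{2\pi}{\lambda}\,(r_1 - r_2)
```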
Abstract:
This thesis is about the detection of local image features. The research topic belongs to the wider area of object detection, a machine vision and pattern recognition problem in which an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but this thesis uses a different technique, leading to higher-quality image features that enable more precise localization. Instead of using interest point detection, the landmark positions are marked manually; the quality of the image features is therefore not limited by an interest point detection phase, and the learning of image features is simplified. The approach combines interest point detection and local description into a single detection phase. Computational efficiency of the descriptor is therefore important, ruling out many of the commonly used descriptors as too heavy. Multiresolution Gabor features are the main descriptor in this thesis, and improving their efficiency is a significant part of the work. Actual image features are formed from descriptors by a classifier, which can then recognize similar-looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in a one-class configuration, with only positive training samples and no explicit background class. The local image feature detection method has been tested on two freely available face detection databases and a proprietary license plate database, with very good localization performance in these experiments. Other applications of the same underlying techniques are also presented, including object categorization and fault detection.
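The two ingredients, a multiresolution Gabor descriptor and a one-class Gaussian mixture classifier trained on positive patches only, can be sketched as follows. Filter parameters, pooling by mean/std, and the component count are illustrative assumptions, not the thesis's actual configuration.

```python
# Sketch: multiresolution Gabor descriptor + one-class GMM classifier.
import numpy as np
import cv2
from sklearn.mixture import GaussianMixture

def gabor_descriptor(patch, wavelengths=(4, 8, 16), orientations=4):
    """Pool Gabor filter responses over several scales and orientations."""
    feats = []
    for lam in wavelengths:
        for i in range(orientations):
            theta = np.pi * i / orientations
            kernel = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5)
            response = cv2.filter2D(np.float32(patch), -1, kernel)
            feats.extend([response.mean(), response.std()])
    return np.array(feats)

def train(positive_patches, n_components=5):
    """Fit the positive-class model; no background class is needed."""
    X = np.stack([gabor_descriptor(p) for p in positive_patches])
    return GaussianMixture(n_components=n_components).fit(X)

def score(gmm, patch):
    """Log-likelihood under the positive model; threshold to detect."""
    return gmm.score_samples(gabor_descriptor(patch)[None, :])[0]
```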
Abstract:
Visual object tracking has recently been one of the most popular research topics in computer vision. In particular, hand tracking has attracted significant attention since it would enable many useful practical applications. However, hand tracking remains a very challenging problem that cannot be considered solved; the fundamental reason for this difficulty is that almost every aspect of hand appearance can change. This thesis focused on 2D hand tracking in high-speed camera videos. During the project, a toolbox containing nine different tracking methods was assembled. In the experiments, these methods were tested and compared against each other on both high-speed videos recorded during the project and publicly available normal-speed videos. The results revealed that tracking accuracy varied considerably depending on the video and the method, so no single method was clearly the best across all videos; however, three methods, CT, HT, and TLD, performed better than the others overall. Moreover, the results provide insights into the suitability of each method for different types and situations of hand tracking.
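All such trackers share the same init-then-update loop. The sketch below uses OpenCV's legacy TLD implementation (one of the three methods named above; CT and HT are not in OpenCV); the video path and initial bounding box are placeholders.

```python
# Generic single-object tracking loop; needs opencv-contrib-python.
import cv2

cap = cv2.VideoCapture("hand_video.avi")
ok, frame = cap.read()
bbox = (287, 23, 86, 120)            # (x, y, w, h) of the hand in frame 1

tracker = cv2.legacy.TrackerTLD_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:         # Esc quits
        break
```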
Abstract:
This thesis researches automatic traffic sign inventory and condition analysis using machine vision and pattern recognition methods. Automatic traffic sign inventory and condition analysis can make road maintenance more efficient, improve maintenance processes, and enable intelligent driving systems. Automatic traffic sign detection and classification has previously been researched from the viewpoint of self-driving vehicles, driver assistance systems, and the use of signs in mapping services. Machine vision based inventory of traffic signs consists of detection, classification, localization, and condition analysis of traffic signs. The performance of the produced machine vision system is estimated on three datasets, two of which were collected for this thesis. Based on the experiments, almost all traffic signs can be detected, classified, and located, and their condition analysed. In the future, the inventory system's performance has to be verified in challenging conditions and the system has to be pilot-tested.
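The four-stage pipeline can be pictured as the skeleton below. The stage interfaces and the record fields are assumptions for illustration; each callable would wrap the detector or classifier actually used in the thesis.

```python
# Hypothetical skeleton of the detection -> classification ->
# localization -> condition analysis inventory pipeline.
from dataclasses import dataclass

@dataclass
class SignRecord:
    sign_class: str   # e.g. "speed_limit_80"
    position: tuple   # estimated world coordinates
    condition: str    # e.g. "good", "faded", "damaged"

def inventory(frames, detector, classifier, localizer, condition_model):
    records = []
    for frame in frames:
        for bbox in detector(frame):                    # 1. detection
            sign_class = classifier(frame, bbox)        # 2. classification
            position = localizer(frame, bbox)           # 3. localization
            condition = condition_model(frame, bbox)    # 4. condition analysis
            records.append(SignRecord(sign_class, position, condition))
    return records
```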
Abstract:
Object detection is a fundamental task of computer vision that serves as a core part of a number of industrial and scientific applications, for example in robotics, where objects need to be correctly detected and localized before they can be grasped and manipulated. Existing object detectors vary in (i) the amount of supervision they need for training, (ii) the type of learning method adopted (generative or discriminative), and (iii) the amount of spatial information used in the object model (model-free, using no spatial information, or model-based, with an explicit spatial model of the object). Although some existing methods report good performance in detecting certain objects, the results tend to be application specific, and no universal method has been found that clearly outperforms all others in all areas. This work proposes a novel generative part-based object detector. The generative learning procedure of the developed method allows learning from positive examples only. The detector is based on finding semantically meaningful parts of the object (i.e. a part detector), which can provide information beyond object location, for example pose. The object class model, i.e. the appearance of the object parts and their spatial variance (constellation), is explicitly modelled in a fully probabilistic manner. The appearance is based on bio-inspired complex-valued Gabor features that are transformed into part probabilities by an unsupervised Gaussian mixture model (GMM); the proposed novel randomized GMM enables learning from only a few training examples. The probabilistic spatial model of the part configurations is constructed with a mixture of 2D Gaussians. The appearance of the object parts is learned in an object canonical space that removes geometric variations from the part appearance model. Robustness to pose variations is achieved by object pose quantization, which is more efficient than the previously used scale and orientation shifts in the Gabor feature space. The resulting generative object detector is characterized by high recall with low precision, i.e. it produces a large number of false positive detections. A discriminative classifier is therefore used to prune the false positive candidate detections produced by the generative detector, improving its precision while keeping recall high. Using only a small number of positive examples, the developed object detector performs comparably to state-of-the-art discriminative methods.
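Scoring a candidate under such a constellation model combines per-part appearance likelihoods with a spatial likelihood over the part layout. A minimal sketch follows; it uses standard GMMs rather than the thesis's randomized GMM, and the interfaces are assumptions.

```python
# Sketch: probabilistic scoring in a part-based constellation model.
import numpy as np
from sklearn.mixture import GaussianMixture

class ConstellationModel:
    def __init__(self, appearance_gmms, spatial_gmm):
        self.appearance = appearance_gmms  # one GaussianMixture per part
        self.spatial = spatial_gmm         # GaussianMixture over the 2D layout

    def log_score(self, part_features, part_positions):
        """Sum of appearance and spatial log-likelihoods."""
        app = sum(
            gmm.score_samples(f[None, :])[0]
            for gmm, f in zip(self.appearance, part_features)
        )
        # Flattened (x, y) pairs in the object's canonical frame.
        layout = np.concatenate(part_positions)[None, :]
        return app + self.spatial.score_samples(layout)[0]

# A detection is kept when log_score exceeds a threshold; a discriminative
# classifier then prunes the surviving false positives, as described above.
```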
Abstract:
This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.
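The core poke-and-watch step, segmenting the object from the motion its contact induces, can be sketched with simple frame differencing. The thresholds are illustrative, and the thesis uses a richer motion and contact analysis than this.

```python
# Sketch of motion-based figure/ground segmentation after a poke.
import cv2
import numpy as np

def segment_by_motion(frame_before, frame_after, thresh=25):
    """Return a binary mask of the region that moved after the poke."""
    g0 = cv2.cvtColor(frame_before, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_after, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g0, g1)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # Clean up speckle, then keep the largest moving blob.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    blob = max(contours, key=cv2.contourArea)
    object_mask = np.zeros_like(mask)
    cv2.drawContours(object_mask, [blob], -1, 255, thickness=cv2.FILLED)
    return object_mask  # segmented view used to train the recognizer
```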
Abstract:
This paper describes a new approach to detecting and tracking maritime objects in real time. The approach particularly addresses the highly dynamic maritime environment, panning cameras, and target scale changes, and operates on both visible and thermal imagery. Object detection is based on agglomerative clustering of temporally stable features. Object extents are first determined based on the persistence of detected features and their relative separation and motion attributes. An explicit cluster merging and splitting process handles object creation and separation. Stable object clusters are tracked frame-to-frame. The effectiveness of the approach is demonstrated on four challenging real-world public datasets.
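The clustering step groups feature points that stay close in both position and motion. A minimal sketch with SciPy hierarchical clustering follows; the distance threshold, motion weighting, and feature layout are assumptions, not the paper's parameters.

```python
# Sketch: agglomerative clustering of temporally stable feature points.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_features(points, velocities, max_dist=40.0):
    """points: (N, 2) positions; velocities: (N, 2) frame-to-frame motion.
    Features close in both position and motion join the same cluster."""
    feats = np.hstack([points, 5.0 * velocities])  # weight motion similarity
    Z = linkage(feats, method="average")
    return fcluster(Z, t=max_dist, criterion="distance")  # cluster labels

# Per frame: track features, drop unstable ones, cluster the survivors,
# then merge/split clusters over time to maintain object identities.
```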
Abstract:
This paper presents the two datasets (ARENA and P5) and the challenge that form a part of the PETS 2015 workshop. The datasets consist of scenarios recorded using multiple visual and thermal sensors. The scenarios in the ARENA dataset involve different staged activities around a parked vehicle in a parking lot in the UK, and those in the P5 dataset involve different staged activities around the perimeter of a nuclear power plant in Sweden. The scenarios of each dataset are grouped into 'Normal', 'Warning' and 'Alarm' categories. The Challenge specifically includes tasks that account for different steps in a video understanding system: Low-Level Video Analysis (object detection and tracking), Mid-Level Video Analysis ('atomic' event detection) and High-Level Video Analysis ('complex' event detection). The evaluation methodology used for the Challenge includes well-established measures.