7 results for visual object detection at Duke University
Abstract:
Current state-of-the-art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR-specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.
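As a rough illustration of the last two ideas (learned convolutional features plus augmentation of 2-D GPR slices), the sketch below is a hypothetical, minimal example and not the dissertation's architecture or its physics-based augmentation: a small PyTorch CNN that scores one B-scan-style patch, with a purely geometric stand-in for augmentation (scan-axis mirroring and small depth shifts).

```python
# Hypothetical sketch, not the dissertation's model: a small CNN that scores a
# 2-D GPR patch (depth x scan position) for a hyperbola-like target response,
# plus a simple geometric stand-in for the physics-based augmentation.
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 8 * 8, 1)   # assumes 32x32 input patches

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))  # raw logit: higher = more target-like

def augment(patch: torch.Tensor) -> torch.Tensor:
    """Illustrative augmentation only: mirror the scan axis (hyperbolic responses
    are roughly left-right symmetric) and jitter the depth axis; a physics-based
    scheme would instead reshape the response for different burial depths and
    soil properties."""
    if torch.rand(1) < 0.5:
        patch = torch.flip(patch, dims=[-1])
    shift = int(torch.randint(-2, 3, (1,)))
    return torch.roll(patch, shifts=shift, dims=-2)

if __name__ == "__main__":
    model = PatchCNN()
    patch = torch.randn(1, 1, 32, 32)   # one synthetic patch: (batch, channel, depth, scan)
    print(model(augment(patch)).item())
```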
Abstract:
As we look around a scene, we perceive it as continuous and stable even though each saccadic eye movement changes the visual input to the retinas. How the brain achieves this perceptual stabilization is unknown, but a major hypothesis is that it relies on presaccadic remapping, a process in which neurons shift their visual sensitivity to a new location in the scene just before each saccade. This hypothesis is difficult to test in vivo because complete, selective inactivation of remapping is currently intractable. We tested it in silico with a hierarchical, sheet-based neural network model of the visual and oculomotor system. The model generated saccadic commands to move a video camera abruptly. Visual input from the camera and internal copies of the saccadic movement commands, or corollary discharge, converged at a map-level simulation of the frontal eye field (FEF), a primate brain area known to receive such inputs. FEF output was combined with eye position signals to yield a suitable coordinate frame for guiding arm movements of a robot. Our operational definition of perceptual stability was "useful stability," quantified as continuously accurate pointing to a visual object despite camera saccades. During training, the emergence of useful stability was correlated tightly with the emergence of presaccadic remapping in the FEF. Remapping depended on corollary discharge but its timing was synchronized to the updating of eye position. When coupled to predictive eye position signals, remapping served to stabilize the target representation for continuously accurate pointing. Graded inactivations of pathways in the model replicated, and helped to interpret, previous in vivo experiments. The results support the hypothesis that visual stability requires presaccadic remapping, provide explanations for the function and timing of remapping, and offer testable hypotheses for in vivo studies. We conclude that remapping allows for seamless coordinate frame transformations and quick actions despite visual afferent lags. With visual remapping in place for behavior, it may be exploited for perceptual continuity.
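A minimal sketch of the core remapping idea, assuming a simple map-level abstraction rather than the paper's sheet-based network: the corollary-discharge copy of the upcoming saccade is used to shift a retinotopic activity map so that the target's representation already sits at its post-saccadic location before the eyes land.

```python
# Assumed toy illustration (not the published model): presaccadic remapping as a
# shift of a retinotopic activity map by the corollary-discharge copy of the
# planned saccade vector.
import numpy as np

def remap(activity: np.ndarray, saccade: tuple) -> np.ndarray:
    """Shift activity opposite to the saccade: a target imaged at retinal
    location p before the eye movement will be imaged at p - saccade after it."""
    dy, dx = saccade
    return np.roll(activity, shift=(-dy, -dx), axis=(0, 1))

if __name__ == "__main__":
    fef_map = np.zeros((64, 64))
    fef_map[40, 30] = 1.0                      # target response before the saccade
    predicted = remap(fef_map, (10, -5))       # corollary discharge for a (+10, -5) saccade
    print(np.unravel_index(predicted.argmax(), predicted.shape))  # -> (30, 35)
```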
Abstract:
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has driven the need for algorithms that infer, understand, and utilize information about the position and orientation of sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
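For reference, these are the two special cases named above in their standard forms (the joint "mirrored" construction itself is defined in the thesis and is not reproduced here):

```latex
% Gaussian on R^n and Bingham on the unit hypersphere S^{d-1}; the mirrored
% normal-Bingham is described in the text as generalizing both.
\[
  \mathcal{N}(x;\,\mu,\Sigma) \;\propto\;
  \exp\!\Big(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\Big),
  \qquad x \in \mathbb{R}^{n},
\]
\[
  \mathrm{Bingham}(q;\,M,Z) \;=\; \frac{1}{F(Z)}\,
  \exp\!\big(q^{\top} M Z M^{\top} q\big),
  \qquad q \in S^{d-1},
\]
% where M is orthogonal, Z is a diagonal concentration matrix, and F(Z) is the
% normalizing constant. The Bingham density is antipodally symmetric,
% p(q) = p(-q), which is why it suits unit-quaternion orientations (q and -q
% encode the same rotation).
```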
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
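A toy sketch of this kind of formulation, under heavy simplifying assumptions of my own (a 2-D side-view pinhole camera with known focal length, flat ground, and a Gaussian prior on pedestrian height); the thesis's full bundle-adjustment formulation is not reproduced here:

```python
# Toy sketch, not the thesis's formulation: estimate camera height and tilt from
# pedestrian head/foot detections by solving a small bundle-adjustment-style
# least-squares problem with a prior on person height.
import numpy as np
from scipy.optimize import least_squares

F = 800.0                        # focal length in pixels (assumed known)
PRIOR_H, PRIOR_SD = 1.7, 0.1     # assumed prior on pedestrian height (metres)

def project(h, tilt, x, y):
    """Image row (pixels above the principal point) of ground-plane point at
    distance x and height y, seen by a camera at height h tilted down by `tilt`."""
    z_c = x * np.cos(tilt) - (y - h) * np.sin(tilt)   # depth along the optical axis
    y_c = x * np.sin(tilt) + (y - h) * np.cos(tilt)   # offset in the image plane
    return F * y_c / z_c

def residuals(params, v_foot, v_head):
    h, tilt = params[0], params[1]
    n = len(v_foot)
    d = params[2:2 + n]                   # per-pedestrian ground distance
    H = params[2 + n:]                    # per-pedestrian height
    r_foot = project(h, tilt, d, 0.0) - v_foot
    r_head = project(h, tilt, d, H) - v_head
    r_prior = (H - PRIOR_H) / PRIOR_SD    # prior term fixes the overall scale
    return np.concatenate([r_foot, r_head, r_prior])

if __name__ == "__main__":
    # Synthetic "detections" from a camera at h = 6 m, tilt = 0.35 rad.
    rng = np.random.default_rng(0)
    d_true = rng.uniform(8, 25, size=10)
    H_true = rng.normal(PRIOR_H, 0.07, size=10)
    v_foot = project(6.0, 0.35, d_true, 0.0) + rng.normal(0, 0.5, 10)
    v_head = project(6.0, 0.35, d_true, H_true) + rng.normal(0, 0.5, 10)

    x0 = np.concatenate([[4.0, 0.2], np.full(10, 15.0), np.full(10, PRIOR_H)])
    sol = least_squares(residuals, x0, args=(v_foot, v_head))
    print("estimated height, tilt:", sol.x[:2])   # ideally close to 6.0, 0.35
```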
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
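The sketch below is a hedged illustration of the general idea, not the thesis's algorithm: given a homography implied by the camera pose, warp the already-computed multi-channel feature map (e.g., 31-channel HOG-like features) instead of warping the full-resolution image and recomputing features. The cell size and homography here are made up for the example.

```python
# Assumed illustration, not the thesis's method: apply an inverse-perspective-style
# homography to a precomputed feature map rather than to the image itself.
import cv2
import numpy as np

def warp_feature_map(feat: np.ndarray, H_img: np.ndarray, cell: int = 8) -> np.ndarray:
    """feat: (rows, cols, channels) feature map with one cell per `cell` pixels.
    H_img: 3x3 homography expressed in image (pixel) coordinates."""
    # Conjugate the homography from pixel coordinates into feature-cell coordinates.
    S = np.diag([1.0 / cell, 1.0 / cell, 1.0])
    H_feat = S @ H_img @ np.linalg.inv(S)
    rows, cols, ch = feat.shape
    out = np.empty_like(feat)
    for c in range(ch):   # the same homography is applied to every feature channel
        channel = np.ascontiguousarray(feat[:, :, c])
        out[:, :, c] = cv2.warpPerspective(channel, H_feat, (cols, rows))
    return out

if __name__ == "__main__":
    feat = np.random.rand(60, 80, 31).astype(np.float32)   # e.g. 31 HOG-like channels
    H = np.array([[1.0, 0.2,  5.0],                        # made-up homography
                  [0.0, 1.1, -3.0],
                  [0.0, 0.0,  1.0]])
    print(warp_feature_map(feat, H).shape)   # (60, 80, 31)
```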
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Abstract:
Understanding animals' spatial perception is a critical step toward discerning their cognitive processes. The spatial sense is multimodal and based on both the external world and mental representations of that world. Navigation in each species depends upon its evolutionary history, physiology, and ecological niche. We carried out foraging experiments on wild vervet monkeys (Chlorocebus pygerythrus) at Lake Nabugabo, Uganda, to determine the types of cues used to detect food and whether associative cues could be used to find hidden food. Our first and second set of experiments differentiated between vervets' use of global spatial cues (including the arrangement of feeding platforms within the surrounding vegetation) and/or local layout cues (the position of platforms relative to one another), relative to the use of goal-object cues on each platform. Our third experiment provided an associative cue to the presence of food with global spatial, local layout, and goal-object cues disguised. Vervets located food above chance levels when goal-object cues and associative cues were present, and visual signals were the predominant goal-object cues that they attended to. With similar sample sizes and methods as previous studies on New World monkeys, vervets were not able to locate food using only global spatial cues and local layout cues, unlike all five species of platyrrhines thus far tested. Relative to these platyrrhines, the spatial location of food may need to stay the same for a longer time period before vervets encode this information, and goal-object cues may be more salient for them in small-scale space.
Abstract:
Our ability to track an object as the same persisting entity over time and motion may primarily rely on spatiotemporal representations which encode some, but not all, of an object's features. Previous researchers using the 'object reviewing' paradigm have demonstrated that such representations can store featural information of well-learned stimuli such as letters and words at a highly abstract level. However, it is unknown whether these representations can also store purely episodic information (i.e. information obtained from a single, novel encounter) that does not correspond to pre-existing type-representations in long-term memory. Here, in an object-reviewing experiment with novel face images as stimuli, observers still produced reliable object-specific preview benefits in dynamic displays: a preview of a novel face on a specific object speeded the recognition of that particular face at a later point when it appeared again on the same object compared to when it reappeared on a different object (beyond display-wide priming), even when all objects moved to new positions in the intervening delay. This case study demonstrates that the mid-level visual representations which keep track of persisting identity over time (e.g. 'object files', in one popular framework) can store not only abstract types from long-term memory, but also specific tokens from online visual experience.
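To make the measure concrete, here is the arithmetic of an object-specific preview benefit with made-up response times (these numbers are not from the study):

```python
# Illustrative numbers only, not the study's data: the object-specific preview
# benefit is the extra speed-up when a probe reappears on the SAME object that
# previewed it, relative to reappearing on a DIFFERENT object in the display
# (which controls for display-wide priming).
rt_same_object = 612.0        # mean RT (ms): probe matches the preview on the same object
rt_different_object = 655.0   # mean RT (ms): probe matches a preview shown on the other object
preview_benefit = rt_different_object - rt_same_object
print(f"object-specific preview benefit: {preview_benefit:.0f} ms")   # -> 43 ms
```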
Abstract:
Visual Inspection with Acetic Acid (VIA) and Visual Inspection with Lugol’s Iodine (VILI) are increasingly recommended in various cervical cancer screening protocols in low-resource settings. Although VIA is more widely used, VILI has been advocated as an easier and more specific screening test. VILI has not been well validated as a stand-alone screening test compared to VIA, nor has it been validated for use in HIV-infected women. We carried out a randomized clinical trial to compare the diagnostic accuracy of VIA and VILI among HIV-infected women. Women attending the Family AIDS Care and Education Services (FACES) clinic in western Kenya were enrolled and randomized to undergo either VIA or VILI with colposcopy. Lesions suspicious for cervical intraepithelial neoplasia 2 or greater (CIN2+) were biopsied. Between October 2011 and June 2012, 654 women were randomized to undergo VIA or VILI. The test positivity rates were 26.2% for VIA and 30.6% for VILI (p = 0.22). The rate of detection of CIN2+ was 7.7% in the VIA arm and 11.5% in the VILI arm (p = 0.10). There was no significant difference in the diagnostic performance of VIA and VILI for the detection of CIN2+. Sensitivity and specificity were 84.0% and 78.6%, respectively, for VIA and 84.2% and 76.4% for VILI. The positive and negative predictive values were 24.7% and 98.3% for VIA, and 31.7% and 97.4% for VILI. Among women with CD4+ count < 350, VILI had a significantly decreased specificity (66.2%) compared to VIA in the same group (83.9%, p = 0.02) and compared to VILI performed among women with CD4+ count ≥ 350 (79.7%, p = 0.02). VIA and VILI had similar diagnostic accuracy and rates of CIN2+ detection among HIV-infected women.
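For reference, the diagnostic-accuracy quantities quoted above follow from a 2x2 table of screening results against the colposcopy/biopsy reference standard; the sketch below uses made-up counts purely to show the arithmetic, not the trial's raw data.

```python
# Illustrative only: sensitivity, specificity, PPV and NPV from a 2x2 table.
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),   # P(test positive | CIN2+ present)
        "specificity": tn / (tn + fp),   # P(test negative | CIN2+ absent)
        "ppv": tp / (tp + fp),           # P(CIN2+ present | test positive)
        "npv": tn / (tn + fn),           # P(CIN2+ absent | test negative)
    }

if __name__ == "__main__":
    # Hypothetical counts for one screening arm, chosen only to show the calculation.
    print(diagnostic_accuracy(tp=21, fp=64, fn=4, tn=235))
```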