9 results for Robotic vision

in CaltechTHESIS


Relevance:

30.00%

Publisher:

Abstract:

This thesis presents a novel framework for state estimation in the context of robotic grasping and manipulation. The overall estimation approach is based on fusing various visual cues for manipulator tracking, namely appearance and feature-based, shape-based, and silhouette-based visual cues. Similarly, a framework is developed that fuses these visual cues with kinesthetic cues, such as force-torque and tactile measurements, for in-hand object pose estimation. The cues are extracted from multiple sensor modalities and are fused in a variety of Kalman filters.

A hybrid estimator is developed to estimate both a continuous state (robot and object states) and discrete states, called contact modes, which specify how each finger contacts a particular object surface. A static multiple model estimator is used to compute and maintain this mode probability. The thesis also develops an estimation framework for estimating model parameters associated with object grasping. Dual and joint state-parameter estimation is explored for parameter estimation of a grasped object's mass and center of mass. Experimental results demonstrate simultaneous object localization and center of mass estimation.
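As a rough illustration of how a static multiple-model estimator can maintain contact-mode probabilities, the sketch below applies a Bayes update with no mode switching: each posterior is proportional to the prior times the measurement likelihood under that mode. The mode names, predicted force readings, and noise variance are hypothetical and are not taken from the thesis.

```python
import math

def gaussian_likelihood(residual, variance):
    """Likelihood of a scalar measurement residual under a zero-mean Gaussian."""
    return math.exp(-0.5 * residual ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def update_mode_probabilities(priors, likelihoods):
    """Static multiple-model update: posterior_i is proportional to prior_i * likelihood_i."""
    weighted = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(weighted)
    if total == 0.0:
        return list(priors)  # measurement carries no information; keep the prior
    return [w / total for w in weighted]

# Hypothetical contact modes and the force reading each one predicts.
predicted_force = {"fingertip": 1.0, "side": 3.0}
measured_force = 1.2
noise_variance = 0.25

modes = list(predicted_force)
probs = [0.5, 0.5]  # uniform prior over the two modes
likelihoods = [
    gaussian_likelihood(measured_force - predicted_force[m], noise_variance)
    for m in modes
]
probs = update_mode_probabilities(probs, likelihoods)
```

In a full estimator, each likelihood would presumably come from the innovation of a mode-matched Kalman filter rather than from a fixed predicted value.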

Dual-arm estimation is developed for two-arm robotic manipulation tasks. Two types of filters are explored: the first is an augmented filter that contains both arms in the state vector, while the second runs two filters in parallel, one for each arm. These two frameworks and their performance are compared in a dual-arm task of removing a wheel from a hub.

This thesis also presents a new method for action selection involving touch. This next best touch method selects the available action for interacting with an object that will gain the most information. The algorithm employs information theory to compute an information gain metric that is based on a probabilistic belief suitable for the task. An estimation framework is used to maintain this belief over time. Kinesthetic measurements such as contact and tactile measurements are used to update the state belief after every interactive action. Simulation and experimental results are demonstrated using next best touch for object localization, specifically a door handle on a door. The next best touch theory is extended for model parameter determination. Since many objects within a particular object category share the same rough shape, principal component analysis may be used to parametrize the object mesh models. These parameters can be estimated using the action selection technique, which selects the touching action that best both localizes the object and estimates these parameters. Simulation results are then presented involving localizing and determining a parameter of a screwdriver.
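The idea of selecting the touch with the highest information gain can be sketched for a discrete belief as follows. This is a minimal illustration under assumed conditions, not the thesis's formulation: the belief is a list of probabilities over candidate object poses, each touch has a binary contact or no-contact outcome, and the action names and contact probabilities are hypothetical.

```python
import math

def entropy(belief):
    """Shannon entropy in bits of a discrete belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)

def expected_information_gain(belief, contact_prob):
    """Expected entropy reduction for one touch.

    contact_prob[i] is P(contact | hypothesis i) for this touch (assumed known).
    """
    expected_posterior_entropy = 0.0
    no_contact_prob = [1.0 - c for c in contact_prob]
    for likelihood in (contact_prob, no_contact_prob):
        weighted = [b * l for b, l in zip(belief, likelihood)]
        p_outcome = sum(weighted)
        if p_outcome > 0.0:
            posterior = [w / p_outcome for w in weighted]
            expected_posterior_entropy += p_outcome * entropy(posterior)
    return entropy(belief) - expected_posterior_entropy

def next_best_touch(belief, actions):
    """Return the name of the action with the largest expected information gain."""
    return max(actions, key=lambda name: expected_information_gain(belief, actions[name]))

# Four equally likely pose hypotheses and three hypothetical touches.
belief = [0.25, 0.25, 0.25, 0.25]
actions = {
    "touch_left": [1.0, 1.0, 0.0, 0.0],    # splits the hypotheses in half
    "touch_corner": [1.0, 0.0, 0.0, 0.0],  # confirms only one hypothesis
    "touch_far": [0.0, 0.0, 0.0, 0.0],     # never makes contact
}
best = next_best_touch(belief, actions)
```

Here the touch that splits the hypotheses evenly yields one full bit of expected gain, so it is preferred over the touch that can confirm only a single hypothesis.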

Lastly, the next best touch theory is further extended to model classes. Instead of estimating parameters, object class determination is incorporated into the information gain metric calculation. The best touching action is selected in order to best discern between the possible model classes. Simulation results are presented to validate the theory.

Relevance:

20.00%

Publisher:

Abstract:

This thesis is concerned with spatial filtering. What is its utility in tone reproduction? Does it exist in vision, and if so, what constraints does it impose on the nervous system?

Tone reproduction is just the art and science of taking a picture and then displaying it. The sensors available to capture an image have a greater dynamic range than the media that may be used to display it. Conventionally, spatial filtering is used to boost contrast; it ameliorates the loss of contrast that results when the sensor signal range is scaled down to fit the display range. In this thesis, a type of nonlinear spatial filtering is discussed that results in direct range reduction without range scaling. This filtering process is instantiated in a real-time image processor built using analog CMOS VLSI.

Spatial filtering must be applied with care in both artificial and natural vision systems. It is argued that the nervous system does not simply filter linearly across an image. Rather, the way that we see things implies that the nervous system filters nonlinearly. Further, many models for color vision include a high-pass filtering step in which the DC information is lost. A real-time study of filtering in color space leads to the conclusion that the nervous system is not that simple, and that it maintains DC information by referencing to white.

Relevance:

20.00%

Publisher:

Abstract:

Waking up from a dreamless sleep, I open my eyes, recognize my wife’s face and am filled with joy. In this thesis, I used functional Magnetic Resonance Imaging (fMRI) to gain insights into the mechanisms involved in this seemingly simple daily occurrence, which poses at least three great challenges to neuroscience: how does conscious experience arise from the activity of the brain? How does the brain process visual input to the point of recognizing individual faces? How does the brain store semantic knowledge about people that we know? To start tackling the first question, I studied the neural correlates of unconscious processing of invisible faces. I was unable to image significant activations related to the processing of completely invisible faces, despite existing reports in the literature. I thus moved on to the next question and studied how recognition of a familiar person was achieved in the brain; I focused on finding invariant representations of person identity – representations that would be activated any time we think of a familiar person, read their name, see their picture, hear them talk, etc. There again, I could not find significant evidence for such representations with fMRI, even in regions where they had previously been found with single unit recordings in human patients (the Jennifer Aniston neurons). Faced with these null outcomes, the scope of my investigations eventually turned back towards the technique that I had been using, fMRI, and the recently praised analytical tools that I had been trusting, Multivariate Pattern Analysis. After a mostly disappointing attempt at replicating a strong single unit finding of a categorical response to animals in the right human amygdala with fMRI, I put fMRI decoding to an ultimate test with a unique dataset acquired in the macaque monkey. 
There I showed a dissociation between the ability of fMRI to pick up face viewpoint information and its inability to pick up face identity information, which I mostly traced back to the poor clustering of identity selective units. Though fMRI decoding is a powerful new analytical tool, it does not rid fMRI of its inherent limitations as a hemodynamics-based measure.

Relevance:

20.00%

Publisher:

Abstract:

Flies are particularly adept at balancing the competing demands of delay tolerance, performance, and robustness during flight, which invites thoughtful examination of their multimodal feedback architecture. This dissertation examines stabilization requirements for inner-loop feedback strategies in the flapping flight of Drosophila, the fruit fly, against the backdrop of sensorimotor transformations present in the animal. Flies have evolved multiple specializations to reduce sensorimotor latency, but sensory delay during flight is still significant on the timescale of body dynamics. I explored the effect of sensor delay on flight stability and performance for yaw turns using a dynamically scaled robot equipped with a real-time feedback system that performed active turns in response to measured yaw torque. The results show a fundamental tradeoff between sensor delay and permissible feedback gain, and suggest that fast mechanosensory feedback provides a source of active damping that complements that contributed by passive effects. Presented in the context of these findings, a control architecture whereby a haltere-mediated inner-loop proportional controller provides damping for slower visually mediated feedback is consistent with tethered-flight measurements, free-flight observations, and engineering design principles. Additionally, I investigated how flies adjust stroke features to regulate and stabilize level forward flight. The results suggest that few changes to hovering kinematics are actually required to meet steady-state lift and thrust requirements at different flight speeds, and the primary driver of equilibrium velocity is the aerodynamic pitch moment. This finding is consistent with prior hypotheses and observations regarding the relationship between body pitch and flight speed in fruit flies. The results also show that the dynamics may be stabilized with additional pitch damping, but the magnitude of required damping increases with flight speed.
I posit that differences in stroke deviation between the upstroke and downstroke might play a critical role in this stabilization. Fast mechanosensory feedback of the pitch rate could enable active damping, which would inherently exhibit gain scheduling with flight speed if pitch torque is regulated by adjusting stroke deviation. Such a control scheme would provide an elegant solution for flight stabilization across a wide range of flight speeds.
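The tradeoff between sensor delay and permissible feedback gain mentioned in this abstract can be illustrated with a toy model. It is not the robot or the fly dynamics studied in the dissertation. The sketch assumes a scalar angular rate governed by d(rate)/dt = -b·rate(t) - K·rate(t - τ), integrated with a simple Euler scheme; all parameter values are hypothetical. For this scalar equation with no passive damping (b = 0), the classical stability bound is K·τ < π/2.

```python
def simulate_delayed_feedback(gain, delay, passive_damping=0.0, dt=0.001, duration=10.0):
    """Euler simulation of d(rate)/dt = -passive_damping*rate(t) - gain*rate(t - delay)."""
    delay_steps = int(round(delay / dt))
    n_steps = int(round(duration / dt))
    # History buffer: the rate is held at 1.0 before the feedback loop closes.
    rate = [1.0] * (delay_steps + 1)
    for _ in range(n_steps):
        current = rate[-1]
        delayed = rate[-1 - delay_steps]  # value measured `delay` seconds ago
        rate.append(current + dt * (-passive_damping * current - gain * delayed))
    return rate

def final_amplitude(rate, dt=0.001, window=1.0):
    """Largest absolute rate over the last `window` seconds of the run."""
    tail = rate[-int(round(window / dt)):]
    return max(abs(r) for r in tail)

# Same 100 ms delay, two gains: gain*delay = 0.5 (below pi/2) and 2.0 (above pi/2).
low_gain = final_amplitude(simulate_delayed_feedback(gain=5.0, delay=0.1))
high_gain = final_amplitude(simulate_delayed_feedback(gain=20.0, delay=0.1))
```

With the delay fixed, the lower gain damps the disturbance while the higher gain produces a growing oscillation, so a longer delay leaves less room for feedback gain.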

Relevance:

20.00%

Publisher:

Abstract:

For a hungry fruit fly, locating and landing on a fermenting fruit where it can feed, find mates, and lay eggs, is an essential and difficult task requiring the integration of both olfactory and visual cues. Understanding how flies accomplish this will help provide a comprehensive ethological context for the expanding knowledge of their neural circuits involved in processing olfaction and vision, as well as inspire novel engineering solutions for control and estimation in computationally limited robotic applications. In this thesis, I use novel high throughput methods to develop a detailed overview of how flies track odor plumes, land, and regulate flight speed. Finally, I provide an example of how these insights can be applied to robotic applications to simplify complicated estimation problems. To localize an odor source, flies exhibit three iterative, reflex-driven behaviors. Upon encountering an attractive plume, flies increase their flight speed and turn upwind using visual cues. After losing the plume, flies begin zigzagging crosswind, again using visual cues to control their heading. After sensing an attractive odor, flies become more attracted to small visual features, which increases their chances of finding the plume source. Their changes in heading are largely controlled by open-loop maneuvers called saccades, which they direct towards and away from visual features. If a fly decides to land on an object, it begins to decelerate so as to maintain a stereotypical ratio of expansion to retinal size. Once they reach a stereotypical distance from the target, flies extend their legs in preparation for touchdown. Although it is unclear what cues they use to trigger this behavior, previous studies have indicated that it is likely under visual control. 
In Chapter 3, I use a nonlinear control theoretic analysis and robotic testbed to propose a novel, putative mechanism for how a fly might visually estimate distance by actively decelerating according to a visual control law. Throughout these behaviors, a common theme is the visual control of flight speed. Using genetic tools, I show that the neuromodulator octopamine plays an important role in regulating flight speed, and propose a neural circuit for how this controller might be implemented in the fly's brain. Two general biological and engineering principles are evident across my experiments: (1) complex behaviors, such as foraging, can emerge from the interactions of simple independent sensory-motor modules; (2) flies control their behavior in a way that simplifies complex estimation problems.

Relevance:

20.00%

Publisher:

Abstract:

This thesis addresses a series of topics related to the question of how people find the foreground objects in complex scenes. Using both computer vision modeling and psychophysical analyses, we explore the computational principles for low- and mid-level vision.

We first explore the computational methods of generating saliency maps from images and image sequences. We propose an extremely fast algorithm called Image Signature that detects the locations in the image that attract human eye gazes. With a series of experimental validations based on human behavioral data collected from various psychophysical experiments, we conclude that the Image Signature and its spatial-temporal extension, the Phase Discrepancy, are among the most accurate algorithms for saliency detection under various conditions.

In the second part, we bridge the gap between fixation prediction and salient object segmentation with two efforts. First, we propose a new dataset that contains both fixation and object segmentation information. By simultaneously presenting the two types of human data in the same dataset, we are able to analyze their intrinsic connection and to understand the drawbacks of today’s “standard” but inappropriately labeled salient object segmentation dataset. Second, we propose an algorithm for salient object segmentation. Based on our novel discoveries about the connections between fixation data and salient object segmentation data, our model significantly outperforms all existing models on all 3 datasets by large margins.

In the third part of the thesis, we discuss topics around the human factors of boundary analysis. Closely related to salient object segmentation, boundary analysis focuses on delimiting the local contours of an object. We identify the potential pitfalls of algorithm evaluation for the problem of boundary detection. Our analysis indicates that today’s popular boundary detection datasets contain a significant level of noise, which may severely influence the benchmarking results. To give further insight into the labeling process, we propose a model that characterizes the human factors at work during labeling.

The analyses reported in this thesis offer new perspectives on a series of interrelated issues in low- and mid-level vision. They raise warnings about some of today’s “standard” procedures, while proposing new directions to encourage future research.

Relevance:

20.00%

Publisher:

Abstract:

The visual system is a remarkable platform that evolved to solve difficult computational problems such as detection, recognition, and classification of objects. Of great interest is the face-processing network, a sub-system buried deep in the temporal lobe, dedicated to analyzing a specific type of object (faces). In this thesis, I focus on the problem of face detection by the face-processing network. Insights obtained from years of developing computer-vision algorithms to solve this task have suggested that it may be efficiently and effectively solved by detection and integration of local contrast features. Does the brain use a similar strategy? To answer this question, I embark on a journey that takes me through the development and optimization of dedicated tools for targeting and perturbing deep brain structures. Data collected using MR-guided electrophysiology in early face-processing regions was found to have strong selectivity for contrast features, similar to ones used by artificial systems. While individual cells were tuned for only a small subset of features, the population as a whole encoded the full spectrum of features that are predictive of the presence of a face in an image. Together with additional evidence, my results suggest a possible computational mechanism for face detection in early face-processing regions. To move from correlation to causation, I focus on adopting an emergent technology for perturbing brain activity using light: optogenetics. While this technique has the potential to overcome problems associated with the de facto method of brain stimulation (electrical microstimulation), many open questions remain about its applicability and effectiveness for perturbing the non-human primate (NHP) brain.
In a set of experiments, I use viral vectors to deliver genetically encoded optogenetic constructs to the frontal eye field and face-selective regions in NHP and examine their effects side-by-side with electrical microstimulation to assess their effectiveness in perturbing neural activity as well as behavior. Results suggest that cells are robustly and strongly modulated upon light delivery and that such perturbation can modulate and even initiate motor behavior, thus paving the way for future explorations that may apply these tools to study connectivity and information flow in the face-processing network.

Relevance:

20.00%

Publisher:

Abstract:

Modern robots are increasingly expected to function in uncertain and dynamically challenging environments, often in proximity with humans. In addition, wide-scale adoption of robots requires on-the-fly adaptability of software for diverse applications. These requirements strongly suggest the need to adopt formal representations of high-level goals and safety specifications, especially as temporal logic formulas. This approach allows for the use of formal verification techniques for controller synthesis that can give guarantees for safety and performance. Robots operating in unstructured environments also face limited sensing capability. Correctly inferring a robot's progress toward a high-level goal can be challenging.

This thesis develops new algorithms for synthesizing discrete controllers in partially known environments under specifications represented as linear temporal logic (LTL) formulas. It is inspired by recent developments in finite abstraction techniques for hybrid systems and motion planning problems. The robot and its environment are assumed to have a finite abstraction as a Partially Observable Markov Decision Process (POMDP), which is a powerful model class capable of representing a wide variety of problems. However, synthesizing controllers that satisfy LTL goals over POMDPs is a challenging problem that has received only limited attention.

This thesis proposes tractable, approximate algorithms for the control synthesis problem using Finite State Controllers (FSCs). The use of FSCs to control finite POMDPs allows the closed-loop system to be analyzed as a finite global Markov chain. The thesis explicitly shows how the transient and steady-state behavior of the global Markov chain can be related to two different criteria with respect to satisfaction of LTL formulas. First, the maximization of the probability of LTL satisfaction is related to an optimization problem over a parametrization of the FSC. An analytic computation of the gradients is derived, which allows the use of first-order optimization techniques.
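To make the global Markov chain concrete, the sketch below builds the chain over pairs (POMDP state, FSC node) from a small POMDP and a stochastic FSC. The factorization used, P((s', g') | (s, g)) = Σ over a, o of P(a | g) · P(s' | s, a) · P(o | s') · P(g' | g, o), is one common convention and is assumed here; the thesis's exact parametrization may differ, and all numbers are hypothetical.

```python
from itertools import product

def global_markov_chain(T, O, action_prob, node_update):
    """Closed-loop chain over (POMDP state, FSC node) pairs.

    T[s][a][s2]           : P(s2 | s, a), POMDP transition model
    O[s2][o]              : P(o | s2), observation model
    action_prob[g][a]     : P(a | g), FSC action selection
    node_update[g][o][g2] : P(g2 | g, o), FSC node update
    Returns (pairs, P) with P[i][j] = P(pairs[j] | pairs[i]).
    """
    n_states, n_nodes = len(T), len(action_prob)
    n_actions, n_obs = len(T[0]), len(O[0])
    pairs = list(product(range(n_states), range(n_nodes)))
    P = [[0.0] * len(pairs) for _ in pairs]
    for i, (s, g) in enumerate(pairs):
        for j, (s2, g2) in enumerate(pairs):
            P[i][j] = sum(
                action_prob[g][a] * T[s][a][s2] * O[s2][o] * node_update[g][o][g2]
                for a in range(n_actions)
                for o in range(n_obs)
            )
    return pairs, P

# Hypothetical 2-state, 2-action, 2-observation POMDP and a 2-node FSC.
T = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
O = [[0.8, 0.2],
     [0.3, 0.7]]
action_prob = [[1.0, 0.0],
               [0.4, 0.6]]
node_update = [[[1.0, 0.0], [0.0, 1.0]],
               [[0.5, 0.5], [0.0, 1.0]]]

pairs, P = global_markov_chain(T, O, action_prob, node_update)
```

Each row of `P` sums to one, so standard Markov chain analysis (absorption probabilities, stationary distributions) can be applied to the closed-loop system.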

The second criterion encourages rapid and frequent visits to a restricted set of states over infinite executions. It is formulated as a constrained optimization problem with a discounted long-term reward objective through a novel use of a fundamental equation for Markov chains, the Poisson equation. A new constrained policy iteration technique is proposed to solve the resulting dynamic program, which also provides a way to escape local maxima.

The algorithms proposed in the thesis are applied to the task planning and execution challenges faced during the DARPA Autonomous Robotic Manipulation - Software challenge.

Relevance:

20.00%

Publisher:

Abstract:

Experiments are described using the random dot stereo patterns devised by Julesz, but substituting various colors and luminances for the usual black and white random squares. The ability to perceive the patterns in depth depends on a luminance difference between the colors used. If two colors have the same luminance, then depth is not perceived, although each of the individual squares that make up the patterns is easily seen due to the color difference. This is true for any combination of different colors. If different colors are used for corresponding random squares between the left and right eye patterns, stereopsis is possible for all combinations of binocular rivalry in color, provided the luminance difference is large enough. Rivalry in luminance always precludes stereopsis, regardless of the colors involved.