6 results for Vision-Based Forced Landing

in CaltechTHESIS

Relevance: 30.00%

Abstract:

This thesis presents a novel framework for state estimation in the context of robotic grasping and manipulation. The estimation approach is based on fusing multiple visual cues for manipulator tracking, namely appearance- and feature-based, shape-based, and silhouette-based cues. A companion framework fuses these visual cues with kinesthetic cues, such as force-torque and tactile measurements, for in-hand object pose estimation. The cues are extracted from multiple sensor modalities and fused in a variety of Kalman filters.
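
As a toy illustration of this kind of cue fusion (not the thesis's actual filter or cue models — the states, measurements, and noise levels below are invented), two cues observing the same state can be fused by sequential Kalman measurement updates:

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """One Kalman measurement update: fold a single cue's measurement
    z (with noise covariance R) into the state belief (x, P)."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # corrected state
    P = (np.eye(len(x)) - K @ H) @ P       # corrected covariance
    return x, P

# Prior belief over a 2D manipulator position (a toy stand-in for pose).
x = np.array([0.0, 0.0])
P = np.eye(2)
H = np.eye(2)

# Two cues observe the same state with different noise levels; fusing
# them sequentially is equivalent to a single joint update.
z_feature = np.array([0.9, 1.1])      # feature-based cue (low noise)
z_silhouette = np.array([1.4, 0.6])   # silhouette-based cue (noisier)
x, P = kalman_update(x, P, z_feature, H, 0.1 * np.eye(2))
x, P = kalman_update(x, P, z_silhouette, H, 0.5 * np.eye(2))
print(x)  # fused estimate, pulled mostly toward the low-noise cue
```

Because both updates are linear-Gaussian, the order in which the cues are folded in does not change the final estimate.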

A hybrid estimator is developed to estimate both a continuous state (the robot and object states) and discrete states, called contact modes, which specify how each finger contacts a particular object surface. A static multiple-model estimator is used to compute and maintain the mode probabilities. The thesis also develops a framework for estimating model parameters associated with object grasping: dual and joint state-parameter estimation are explored for estimating a grasped object's mass and center of mass. Experimental results demonstrate simultaneous object localization and center-of-mass estimation.
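
A minimal sketch of how a static multiple-model estimator maintains discrete mode probabilities; the contact modes and likelihood values here are hypothetical, not the thesis's:

```python
import numpy as np

def update_mode_probs(probs, likelihoods):
    """Static multiple-model update: Bayes' rule over a fixed set of
    discrete contact modes, given each mode's measurement likelihood."""
    posterior = probs * likelihoods
    return posterior / posterior.sum()

# Three hypothetical contact modes: no contact, fingertip contact,
# and full-surface contact, initially equally likely.
probs = np.array([1 / 3, 1 / 3, 1 / 3])

# A force-torque reading that is far more likely under fingertip contact.
likelihoods = np.array([0.05, 0.80, 0.15])
probs = update_mode_probs(probs, likelihoods)
print(probs)  # probability mass shifts to the fingertip-contact mode
```

In a full hybrid estimator, each mode would carry its own continuous filter; this sketch shows only the discrete-probability bookkeeping.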

Dual-arm estimation is developed for two-arm robotic manipulation tasks. Two types of filters are explored: the first is an augmented filter that contains both arms in the state vector, while the second runs two filters in parallel, one for each arm. The two frameworks and their performance are compared in a dual-arm task of removing a wheel from a hub.

This thesis also presents a new method for action selection involving touch. This next best touch method selects, from the available actions for interacting with an object, the one that will gain the most information. The algorithm employs information theory to compute an information gain metric based on a probabilistic belief suitable for the task, and an estimation framework maintains this belief over time. Kinesthetic measurements, such as contact and tactile measurements, update the state belief after every interactive action. Simulation and experimental results demonstrate next best touch for object localization, specifically of a door handle on a door. The next best touch theory is then extended to model parameter determination. Since many objects within a particular category share the same rough shape, principal component analysis can be used to parametrize the object mesh models. These parameters can be estimated with the same action selection technique, which chooses the touching action that best both localizes the object and estimates its parameters. Simulation results are then presented for localizing a screwdriver and determining one of its parameters.
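
The information-gain selection can be sketched as follows; the belief grid, touch actions, and sensor model are invented for illustration and are not the thesis's models:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def expected_info_gain(belief, p_contact):
    """Expected reduction in belief entropy from one touch action,
    where p_contact[i] = P(contact | object occupies cell i)."""
    h0, gain = entropy(belief), 0.0
    for obs_model in (p_contact, 1.0 - p_contact):  # contact / no contact
        p_obs = (belief * obs_model).sum()
        if p_obs > 0:
            posterior = belief * obs_model / p_obs
            gain += p_obs * (h0 - entropy(posterior))
    return gain

# Belief over 5 candidate handle positions along a door edge.
belief = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Each action touches one cell: contact is near-certain there and
# unlikely elsewhere (a crude sensor model for illustration).
actions = [np.where(np.arange(5) == a, 0.95, 0.05) for a in range(5)]
gains = [expected_info_gain(belief, m) for m in actions]
best = int(np.argmax(gains))
print(best)  # 2 -- touching the most probable cell is most informative
```

The same loop generalizes to parameter or class determination by letting the belief range over shape parameters or model classes instead of positions.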

Lastly, the next best touch theory is further extended to model classes. Instead of estimating parameters, object class determination is incorporated into the information gain metric calculation. The best touching action is selected in order to best discern between the possible model classes. Simulation results are presented to validate the theory.

Relevance: 30.00%

Abstract:

Waking up from a dreamless sleep, I open my eyes, recognize my wife’s face and am filled with joy. In this thesis, I used functional Magnetic Resonance Imaging (fMRI) to gain insights into the mechanisms involved in this seemingly simple daily occurrence, which poses at least three great challenges to neuroscience: how does conscious experience arise from the activity of the brain? How does the brain process visual input to the point of recognizing individual faces? How does the brain store semantic knowledge about people that we know? To start tackling the first question, I studied the neural correlates of unconscious processing of invisible faces. I was unable to image significant activations related to the processing of completely invisible faces, despite existing reports in the literature. I thus moved on to the next question and studied how recognition of a familiar person is achieved in the brain; I focused on finding invariant representations of person identity – representations that would be activated any time we think of a familiar person, read their name, see their picture, hear them talk, etc. There again, I could not find significant evidence for such representations with fMRI, even in regions where they had previously been found with single-unit recordings in human patients (the Jennifer Aniston neurons). Faced with these null outcomes, the scope of my investigations eventually turned back towards the technique that I had been using, fMRI, and the recently praised analytical tool that I had been trusting, Multivariate Pattern Analysis. After a mostly disappointing attempt at replicating, with fMRI, a strong single-unit finding of a categorical response to animals in the right human amygdala, I put fMRI decoding to an ultimate test with a unique dataset acquired in the macaque monkey.
There I showed a dissociation between the ability of fMRI to pick up face viewpoint information and its inability to pick up face identity information, which I mostly traced back to the poor clustering of identity selective units. Though fMRI decoding is a powerful new analytical tool, it does not rid fMRI of its inherent limitations as a hemodynamics-based measure.

Relevance: 30.00%

Abstract:

Degeneration of the outer retina usually causes blindness by destroying the photoreceptor cells. However, the ganglion cells in the middle and inner retinal layers, whose axons form the optic nerve, are often left intact. Retinal implants, which can partially restore vision through electrical stimulation, have therefore become a focus of research. Although many groups worldwide have put great effort into building retinal implant devices, current state-of-the-art technologies still lack a reliable packaging scheme for devices with the desired high-density multi-channel features. A wireless, flexible implant has always been the ultimate goal for retinal prosthesis. In this dissertation, a reliable packaging scheme for a wireless, flexible, parylene-based retinal implant is developed. It not only provides stable electrical and mechanical connections to high-density multi-channel (1000+ channels on a 5 mm × 5 mm chip area) IC chips, but also survives for more than 10 years in the corrosive fluid environment of the human body.

The device is based on a parylene-metal-parylene sandwich structure, in which the adhesion between the parylene layers and the embedded metals has been studied. Integration technology for high-density multi-channel IC chips has also been addressed and tested with dummy and real 268-channel and 1024-channel retinal IC chips. In addition, different protection schemes have been applied to IC chips and discrete components to maximize lifetime; their effectiveness has been confirmed by accelerated and active lifetime soaking tests in saline solution. Surgical mockups have also been designed and successfully implanted inside dog and pig eyes. Additionally, the electrodes used to stimulate the ganglion cells have been modified to lower the interface impedance and shaped to better fit the retina. Finally, all of the developed technologies have been applied to the final device, which has a dual-metal-layer structure.
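
Accelerated soak tests of this kind are often read through the common rule of thumb that chemical aging roughly doubles for every 10 °C above body temperature; the sketch below uses illustrative temperatures and durations, not values from the dissertation:

```python
# Rule-of-thumb interpretation of an accelerated saline soak test:
# aging rate roughly doubles per 10 degC above body temperature.
# All numbers here are illustrative only.

def acceleration_factor(t_test_c, t_body_c=37.0):
    """Approximate lifetime acceleration of a soak test at t_test_c."""
    return 2.0 ** ((t_test_c - t_body_c) / 10.0)

af = acceleration_factor(77.0)              # 2**4 = 16x faster aging
months_soaked = 8.0
years_equivalent = af * months_soaked / 12.0
print(af, years_equivalent)                 # 16x, ~10.7 implant-years
```

Under this reading, a device surviving an 8-month soak at 77 °C would support a 10+ year implant-lifetime claim.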

Relevance: 30.00%

Abstract:

This thesis addresses a series of topics related to the question of how people find foreground objects in complex scenes. Using both computer vision modeling and psychophysical analyses, we explore the computational principles of low- and mid-level vision.

We first explore computational methods of generating saliency maps from images and image sequences. We propose an extremely fast algorithm, called the Image Signature, that detects the locations in an image that attract human gaze. Through a series of validations against human behavioral data collected in various psychophysical experiments, we conclude that the Image Signature and its spatio-temporal extension, the Phase Discrepancy, are among the most accurate algorithms for saliency detection under various conditions.
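
The core of the Image Signature is compact enough to sketch: reconstruct the image from only the sign of its DCT, square, and blur. The toy image and blur parameters below are illustrative, and a plain matrix DCT stands in for an optimized transform:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as a matrix, so dct2(x) = C @ x @ C.T."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def image_signature_saliency(img, sigma=3.0):
    """Image Signature: keep only the sign of the image's DCT,
    reconstruct, square, and blur to obtain a saliency map."""
    C = dct_matrix(img.shape[0])               # assumes a square image
    recon = C.T @ np.sign(C @ img @ C.T) @ C   # sign-only reconstruction
    sal = recon ** 2
    # Separable Gaussian blur (pure-numpy stand-in for an image filter).
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    kern = np.exp(-t ** 2 / (2 * sigma ** 2))
    kern /= kern.sum()
    for axis in (0, 1):
        sal = np.apply_along_axis(
            lambda v: np.convolve(v, kern, mode='same'), axis, sal)
    return sal

# Toy scene: flat background with one small bright blob; the sparse
# foreground should dominate the saliency map.
img = np.zeros((64, 64))
img[30:34, 40:44] = 1.0
sal = image_signature_saliency(img)
print(sal[32, 42] > sal[10, 10])  # blob outscores the background
```

The sign-only reconstruction concentrates energy on a spatially sparse foreground, which is what makes the method both fast and effective.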

In the second part, we bridge the gap between fixation prediction and salient object segmentation with two efforts. First, we propose a new dataset that contains both fixation and object segmentation information. By presenting the two types of human data in the same dataset, we are able to analyze their intrinsic connection and to expose the drawbacks of today’s “standard” but inappropriately labeled salient object segmentation datasets. Second, we propose an algorithm for salient object segmentation. Based on our findings on the connection between fixation data and salient object segmentation data, our model significantly outperforms all existing models on all three datasets by large margins.

In the third part of the thesis, we discuss the human factors of boundary analysis. Closely related to salient object segmentation, boundary analysis focuses on delimiting the local contours of an object. We identify potential pitfalls in algorithm evaluation for boundary detection: our analysis indicates that today’s popular boundary detection datasets contain a significant level of noise, which may severely influence benchmarking results. To give further insight into the labeling process, we propose a model that characterizes the human factors at work during labeling.

The analyses reported in this thesis offer new perspectives on a series of interrelated issues in low- and mid-level vision. They raise warning signs about some of today’s “standard” procedures while proposing new directions to encourage future research.

Relevance: 30.00%

Abstract:

This thesis explores the dynamics of scale interactions in a turbulent boundary layer through a forcing-response type experimental study. An emphasis is placed on the analysis of triadic wavenumber interactions, since the governing Navier-Stokes equations directly couple triadically consistent scales. Two sets of experiments were performed in which deterministic disturbances were introduced into the flow using a spatially impulsive dynamic wall perturbation, and hot-wire anemometry was employed to measure the downstream turbulent velocity and study the flow response to the external forcing. In the first set of experiments, based on a recent investigation of dynamic forcing effects in a turbulent boundary layer, a 2D (spanwise-constant) spatio-temporal normal mode was excited in the flow; the streamwise length and time scales of the synthetic mode roughly correspond to the very-large-scale motions (VLSMs) found naturally in canonical flows. Correlation studies between the large- and small-scale velocity signals reveal an alteration of the natural phase relations between scales by the synthetic mode. In particular, a strong phase-locking, or organizing, effect is seen on directly coupled small scales through triadic interactions. Having characterized the bulk influence of a single energetic mode on the flow dynamics, a second set of experiments, aimed at isolating specific triadic interactions, was performed. Two distinct 2D large-scale normal modes were excited in the flow, and the response at the corresponding sum and difference wavenumbers was isolated from the turbulent signals. Results from this experiment serve as a unique demonstration of direct nonlinear interactions in a fully turbulent wall-bounded flow, and allow examination of phase relationships involving specific interacting scales. A direct connection is also made to the Navier-Stokes resolvent operator framework developed in recent literature.
Results and analysis from the present work offer insights into the dynamical structure of wall turbulence, and have interesting implications for the design of practical turbulence manipulation and control strategies.
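
The triadic sum-and-difference mechanism can be illustrated with a toy quadratic nonlinearity; the wavenumbers and threshold below are arbitrary choices, not the experiment's:

```python
import numpy as np

# Two forcing modes at wavenumbers k1 and k2 on a periodic domain.
n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
k1, k2 = 5, 8
u = np.cos(k1 * x) + np.cos(k2 * x)

# A quadratic (Navier-Stokes-like) nonlinearity u*u scatters energy into
# the triadic difference and sum wavenumbers |k1 - k2| and k1 + k2,
# alongside the self-interaction harmonics 2*k1 and 2*k2.
spectrum = np.abs(np.fft.rfft(u * u)) / n
peaks = {k for k in range(1, n // 2) if spectrum[k] > 0.1}
print(sorted(peaks))  # [3, 10, 13, 16]
```

Only the triadically consistent wavenumbers receive energy, which is exactly what the two-mode forcing experiment isolates in the turbulent signals.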

Relevance: 30.00%

Abstract:

Visual inputs to artificial and biological visual systems are often quantized: cameras accumulate photons from the visual world, and the brain receives action potentials from visual sensory neurons. Collecting more information quanta leads to longer acquisition times but better performance; in many visual tasks, however, a small number of quanta is sufficient to solve the task well. The ability to determine the right number of quanta is pivotal in situations where visual information is costly to obtain, such as photon-starved or time-critical environments. In these situations, conventional vision systems that always collect a fixed, large amount of information are infeasible. I develop a framework that judiciously determines the number of information quanta to observe, based on the cost of observation and the accuracy requirement. The framework implements the optimal speed-versus-accuracy tradeoff when two assumptions are met: the task is fully specified probabilistically, and it is constant over time. I also extend the framework to scenarios that violate these assumptions, and deploy it on three recognition tasks: visual search (where both assumptions are satisfied), scotopic visual recognition (where the model is not fully specified), and visual discrimination with unknown stimulus onset (where the task changes over time). Scotopic classification experiments suggest that the framework leads to dramatic improvements in photon efficiency compared to conventional computer vision algorithms. Human psychophysics experiments confirm that the framework provides a parsimonious and versatile explanation of human behavior under time pressure in both static and dynamic environments.
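
When the task is fully specified and static, this style of optimal stopping reduces to Wald's sequential probability ratio test; below is a minimal sketch with hypothetical Bernoulli "quanta" and invented rate parameters:

```python
import numpy as np

def sprt(samples, p1=0.6, p0=0.4, alpha=0.05):
    """Wald's sequential probability ratio test: accumulate the
    log-likelihood ratio one quantum at a time and stop as soon as
    it crosses a decision threshold set by the error rate alpha."""
    upper = np.log((1 - alpha) / alpha)   # decide H1 above this
    lower = -upper                        # decide H0 below this
    llr = 0.0
    for n, s in enumerate(samples, start=1):
        llr += np.log(p1 / p0) if s else np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return 'H1', n
        if llr <= lower:
            return 'H0', n
    return 'undecided', len(samples)

# A deterministic toy stream of "detected" quanta: at this accuracy
# setting, the test commits to H1 after only 8 observations.
decision, n_used = sprt([True] * 20)
print(decision, n_used)  # H1 8
```

Tightening alpha raises the thresholds and thus the number of quanta collected, which is the speed-versus-accuracy tradeoff in its simplest form.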