7 results for Learning-Content-System
in CaltechTHESIS
Abstract:
Using neuromorphic analog VLSI techniques for modeling large neural systems has several advantages over software techniques. Because they are built from massively parallel analog circuit arrays, an architecture that is ubiquitous in neural systems, analog VLSI models are extremely fast, particularly when local interactions are important in the computation. While analog VLSI circuits are not as flexible as software methods, the constraints posed by this approach are often very similar to the constraints faced by biological systems. As a result, these constraints can offer many insights into the solutions found by evolution. This dissertation describes a hardware modeling effort to mimic the primate oculomotor system, which requires both fast sensory processing and fast motor control. A one-dimensional hardware model of the primate eye has been built which simulates the physical dynamics of the biological system. It is driven by analog VLSI circuits mimicking brainstem and cortical circuits that control eye movements. In this framework, a visually-triggered saccadic system is demonstrated which generates averaging saccades. In addition, an auditory localization system, based on the neural circuits of the barn owl, is used to trigger saccades to acoustic targets in parallel with visual targets. Two different types of learning are also demonstrated on the saccadic system using floating-gate technology, which allows the non-volatile storage of analog parameters directly on the chip. Finally, a model of visual attention is used to select and track moving targets against textured backgrounds, driving both saccadic and smooth pursuit eye movements to maintain the image of the target in the center of the field of view. This system represents one of the few efforts in this field to integrate both neuromorphic sensory processing and motor control in a closed-loop fashion.
Abstract:
Humans are capable of distinguishing more than 5000 visual categories even in complex environments using a variety of different visual systems all working in tandem. We seem to be capable of distinguishing thousands of different odors as well. In the machine learning community, many commonly used multi-class classifiers do not scale well to such large numbers of categories. This thesis demonstrates a method of automatically creating application-specific taxonomies to aid in scaling classification algorithms to more than 100 categories using both visual and olfactory data. The visual data consists of images collected online and pollen slides scanned under a microscope. The olfactory data was acquired by constructing a small portable sniffing apparatus which draws air over 10 carbon black polymer composite sensors. We investigate performance when classifying 256 visual categories, 8 or more species of pollen and 130 olfactory categories sampled from common household items and a standardized scratch-and-sniff test. Taxonomies are employed in a divide-and-conquer classification framework which improves classification time while allowing the end user to trade performance for specificity as needed. Before classification can even take place, the pollen counter and electronic nose must filter out a high volume of background “clutter” to detect the categories of interest. In the case of pollen this is done with an efficient cascade of classifiers that rule out most non-pollen before invoking slower multi-class classifiers. In the case of the electronic nose, much of the extraneous noise encountered in outdoor environments can be filtered using a sniffing strategy which preferentially samples the sensor response at frequencies that are relatively immune to background contributions from ambient water vapor.
This combination of efficient background rejection with scalable classification algorithms is tested in detail for three separate projects: 1) the Caltech-256 Image Dataset, 2) the Caltech Automated Pollen Identification and Counting System (CAPICS) and 3) a portable electronic nose specially constructed for outdoor use.
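The cascade idea described above, cheap tests that reject most background before a slow classifier runs, can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: the candidate features `(area, roundness)`, the thresholds, the stage functions, and the species labels are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Candidate = Tuple[float, float]  # hypothetical features: (area, roundness)


def run_cascade(
    candidates: Iterable[Candidate],
    cheap_stages: Sequence[Callable[[Candidate], bool]],
    slow_classifier: Callable[[Candidate], str],
) -> List[Tuple[Candidate, str]]:
    """Label only the candidates that survive every cheap rejection stage."""
    labeled = []
    for c in candidates:
        # all() short-circuits, so later stages never run on rejected items.
        if all(stage(c) for stage in cheap_stages):
            labeled.append((c, slow_classifier(c)))
    return labeled


# Hypothetical rejection stages (thresholds chosen only for illustration).
def plausible_size(c: Candidate) -> bool:
    return 50.0 <= c[0] <= 500.0


def roughly_round(c: Candidate) -> bool:
    return c[1] >= 0.7


slow_calls: List[Candidate] = []


def slow_classifier(c: Candidate) -> str:
    """Stand-in for an expensive multi-class classifier; records each call."""
    slow_calls.append(c)
    return "species_A" if c[0] < 200.0 else "species_B"


candidates = [(10.0, 0.9), (120.0, 0.8), (300.0, 0.75), (250.0, 0.2)]
results = run_cascade(candidates, [plausible_size, roughly_round], slow_classifier)
```

Here only two of the four candidates reach the slow classifier, which is the source of the speedup: the cost of the expensive stage scales with the number of survivors, not with the number of raw detections.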
Abstract:
Therapy employing epidural electrostimulation holds great potential for improving therapy for patients with spinal cord injury (SCI) (Harkema et al., 2011). Further promising results from combined therapies using electrostimulation have also been recently obtained (e.g., van den Brand et al., 2012). The devices being developed to deliver the stimulation are highly flexible, capable of delivering any individual stimulus among a combinatorially large set of stimuli (Gad et al., 2013). While this extreme flexibility is very useful for ensuring that the device can deliver an appropriate stimulus, the challenge of choosing good stimuli is quite substantial, even for expert human experimenters. To develop a fully implantable, autonomous device which can provide useful therapy, it is necessary to design an algorithmic method for choosing the stimulus parameters. Such a method can be used in a clinical setting, by caregivers who are not experts in the neurostimulator's use, and to allow the system to adapt autonomously between visits to the clinic. To create such an algorithm, this dissertation pursues the general class of active learning algorithms that includes Gaussian Process Upper Confidence Bound (GP-UCB, Srinivas et al., 2010), developing the Gaussian Process Batch Upper Confidence Bound (GP-BUCB, Desautels et al., 2012) and Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB) algorithms. This dissertation develops new theoretical bounds for the performance of these and similar algorithms, empirically assesses these algorithms against a number of competitors in simulation, and applies a variant of the GP-BUCB algorithm in closed-loop to control SCI therapy via epidural electrostimulation in four live rats. The algorithm was tasked with maximizing the amplitude of evoked potentials in the rats' left tibialis anterior muscle. 
These experiments show that the algorithm is capable of directing these experiments sensibly, finding effective stimuli in all four animals. Further, in direct competition with an expert human experimenter, the algorithm produced superior performance in terms of average reward and comparable or superior performance in terms of maximum reward. These results indicate that variants of GP-BUCB may be suitable for autonomously directing SCI therapy.
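For readers unfamiliar with the upper-confidence-bound selection rule that these algorithms build on, the following is a minimal sketch of plain sequential GP-UCB on a one-dimensional grid. It is not the batch (GP-BUCB) or adaptive (GP-AUCB) variant developed in the dissertation. The squared-exponential kernel, its length scale, the noise level, the value of `beta`, and the synthetic `reward` function standing in for a measured muscle response are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid)
    solved = np.linalg.solve(K, K_s)          # K^{-1} K_s
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(K_s * solved, axis=0)  # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def gp_ucb(reward, x_grid, n_rounds=15, beta=4.0):
    """Each round, query the grid point maximizing mean + sqrt(beta) * std."""
    x_obs = np.array([x_grid[0]])
    y_obs = np.array([reward(x_grid[0])])
    for _ in range(n_rounds):
        mean, std = gp_posterior(x_obs, y_obs, x_grid)
        x_next = x_grid[np.argmax(mean + np.sqrt(beta) * std)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, reward(x_next))
    return x_obs, y_obs


# Hypothetical reward: a smooth bump peaking at stimulus parameter 0.7.
def reward(x):
    return float(np.exp(-((x - 0.7) ** 2) / 0.02))


x_grid = np.linspace(0.0, 1.0, 101)
x_obs, y_obs = gp_ucb(reward, x_grid)
best_stimulus = x_obs[np.argmax(y_obs)]
```

The `sqrt(beta) * std` term rewards uncertainty, so early rounds spread queries across the grid, while later rounds concentrate near the best observed response. A batch variant would additionally have to choose several queries before any of their outcomes are observed.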
Abstract:
In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that mismatched training and test distributions can in fact yield better out-of-sample performance. This optimal performance can be obtained by training with the dual distribution. This optimal training distribution depends on the test distribution set by the problem, but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, as well as how to approximate it in a practical scenario. Benefits of using this distribution are exemplified in both synthetic and real data sets.
In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the use of weights regarding its effect on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm that determines if, for a given set of weights, the out-of-sample performance will improve or not in a practical setting. This is necessary as the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
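The weighting step described in the paragraph above can be illustrated with standard importance weights on a small discrete input space. This sketch is not the Targeted Weighting algorithm: the two distributions, the per-input losses, and the ten-point sample are hypothetical. The effective sample size shown is Kish's formula, used here as one common way to see the cost of weighting.

```python
import numpy as np

# Hypothetical distributions over three discrete inputs.
p_train = np.array([0.5, 0.3, 0.2])   # distribution the fixed sample came from
q_target = np.array([0.2, 0.3, 0.5])  # distribution we want it to resemble

# Importance weight for each input value: w(x) = q(x) / p(x).
weights = q_target / p_train

# A fixed training sample (input indices) and a hypothetical per-input loss.
sample = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
loss = np.array([1.0, 2.0, 4.0])

w = weights[sample]
weighted_mean_loss = np.sum(w * loss[sample]) / np.sum(w)
target_mean_loss = np.sum(q_target * loss)

# Kish effective sample size: uneven weights shrink it below len(sample).
effective_n = np.sum(w) ** 2 / np.sum(w ** 2)
```

In this toy case the weighted average matches the expectation under the target distribution, but the effective sample size drops from 10 to roughly 6, which is one way the negative effect of weighting shows up: fewer points effectively contribute, so the estimate has higher variance.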
Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance in a large real dataset such as the Netflix dataset. Their computational complexity is the main reason for their advantage over previous algorithms proposed in the covariate shift literature.
In the second part of the thesis we apply Machine Learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system that allows analyzing behavior in videos of animals, with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), allows detecting movemes, actions, and stories from time series describing the position of animals in videos. The method summarizes the data and provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example, according to their genetic line.
Abstract:
Observational studies of our solar system's small-body populations (asteroids and comets) offer insight into the history of our planetary system, as these minor planets represent the left-over building blocks from its formation. The Palomar Transient Factory (PTF) survey began in 2009 as the latest wide-field sky-survey program to be conducted on the 1.2-meter Samuel Oschin telescope at Palomar Observatory. Though its main science program has been the discovery of high-energy extragalactic sources (such as supernovae), during its first five years PTF has collected nearly five million observations of over half a million unique solar system small bodies. This thesis begins to analyze this vast data set to address key population-level science topics, including: the detection rates of rare main-belt comets and small near-Earth asteroids, the spin and shape properties of asteroids as inferred from their lightcurves, the applicability of this visible light data to the interpretation of ultraviolet asteroid observations, and a comparison of the physical properties of main-belt and Jovian Trojan asteroids. Future sky-surveys would benefit from application of the analytical techniques presented herein, which include novel modeling methods and unique applications of machine-learning classification. The PTF asteroid small-body data produced in the course of this thesis work should remain a fertile source of solar system science and discovery for years to come.
Abstract:
Optical Coherence Tomography (OCT) is a popular, rapidly growing imaging technique with an increasing number of biomedical applications due to its noninvasive nature. However, there are three major challenges in understanding and improving an OCT system: (1) Obtaining an OCT image is not easy. It either takes a real medical experiment or requires days of computer simulation. Without much data, it is difficult to study the physical processes underlying OCT imaging of different objects simply because there aren't many imaged objects. (2) Interpretation of an OCT image is also hard. This challenge is more profound than it appears. For instance, it would require a trained expert to tell from an OCT image of human skin whether there is a lesion or not. This is expensive in its own right, but even the expert cannot be sure about the exact size of the lesion or the width of the various skin layers. The take-away message is that analyzing an OCT image even from a high level would usually require a trained expert, and pixel-level interpretation is simply unrealistic. The reason is simple: we have OCT images but not their underlying ground-truth structure, so there is nothing to learn from. (3) The imaging depth of OCT is very limited (millimeter or sub-millimeter on human tissues). While OCT utilizes infrared light for illumination to stay noninvasive, the downside of this is that photons at such long wavelengths can only penetrate a limited depth into the tissue before getting back-scattered. To image a particular region of a tissue, photons first need to reach that region. As a result, OCT signals from deeper regions of the tissue are both weak (since few photons reached there) and distorted (due to multiple scatterings of the contributing photons). This fact alone makes OCT images very hard to interpret.
This thesis addresses the above challenges by successfully developing an advanced Monte Carlo simulation platform which is 10,000 times faster than the state-of-the-art simulator in the literature, bringing down the simulation time from 360 hours to a single minute. This powerful simulation tool not only enables us to efficiently generate as many OCT images of objects with arbitrary structure and shape as we want on a common desktop computer, but it also provides us with the underlying ground truth of the simulated images at the same time, because we dictate it at the beginning of the simulation. This is one of the key contributions of this thesis. Building such a powerful simulation tool relies on a thorough understanding of the signal formation process, clever implementation of the importance sampling/photon splitting procedure, efficient use of a voxel-based mesh system in determining photon-mesh interception, and parallel computation of the different A-scans that constitute a full OCT image, among other programming and mathematical tricks, which will be explained in detail later in the thesis.
Next we aim at the inverse problem: given an OCT image, predict/reconstruct its ground-truth structure on a pixel level. By solving this problem we would be able to interpret an OCT image completely and precisely without help from a trained expert. It turns out that we can do much better. For simple structures we are able to reconstruct the ground truth of an OCT image more than 98% correctly, and for more complicated structures (e.g., a multi-layered brain structure) we reach 93%. We achieved this through extensive use of Machine Learning. The success of the Monte Carlo simulation already puts us in a great position by providing us with a great deal of data (effectively unlimited), in the form of (image, truth) pairs. Through a transformation of the high-dimensional response variable, we convert the learning task into a multi-output multi-class classification problem and a multi-output regression problem. We then build a hierarchical architecture of machine learning models (committee of experts) and train different parts of the architecture with specifically designed data sets. In prediction, an unseen OCT image first goes through a classification model to determine its structure (e.g., the number and the types of layers present in the image); then the image is handed to a regression model that is trained specifically for that particular structure to predict the length of the different layers and by doing so reconstruct the ground truth of the image. We also demonstrate that ideas from Deep Learning can be useful to further improve the performance.
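The classify-then-regress pipeline described above can be sketched in a few lines. This is only a toy illustration of the routing idea, not the thesis's models: it uses a nearest-centroid classifier and one least-squares regressor per structure class, a single scalar feature, and a single output, whereas the thesis describes multi-output models. The class name `TwoStageModel` and all data values are hypothetical.

```python
import numpy as np


class TwoStageModel:
    """Stage 1 picks a structure class; stage 2 applies that class's regressor."""

    def fit(self, X, structure, thickness):
        self.classes_ = np.unique(structure)
        # Stage 1: one centroid per structure class (nearest-centroid classifier).
        self.centroids_ = np.stack(
            [X[structure == c].mean(axis=0) for c in self.classes_]
        )
        # Stage 2: one linear least-squares regressor per structure class.
        self.coefs_ = {}
        for c in self.classes_:
            mask = structure == c
            design = np.column_stack([X[mask], np.ones(mask.sum())])
            self.coefs_[c] = np.linalg.lstsq(design, thickness[mask], rcond=None)[0]
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        labels = self.classes_[np.argmin(dists, axis=1)]
        # Route each sample to the regressor trained for its predicted class.
        values = np.array(
            [np.append(x, 1.0) @ self.coefs_[c] for x, c in zip(X, labels)]
        )
        return labels, values


# Hypothetical training data: two structure classes with different linear laws.
X_train = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
structure = np.array([0, 0, 0, 1, 1, 1])
thickness = np.array([1.0, 2.0, 3.0, 10.0, 9.5, 9.0])

model = TwoStageModel().fit(X_train, structure, thickness)
labels, thickness_pred = model.predict(np.array([[0.25], [10.75]]))
```

The design choice this illustrates is specialization: each regressor only has to model one structure, so it can fit a simple relationship that a single global regressor would blur across classes.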
It is worth pointing out that solving the inverse problem automatically improves the imaging depth, since the lower half of an OCT image (i.e., greater depth) could previously hardly be seen but now becomes fully resolved. Interestingly, although the OCT signals constituting the lower half of the image are weak, messy, and uninterpretable to human eyes, they still carry enough information for a well-trained machine learning model to recover precisely the true structure of the object being imaged. This is just another case where Artificial Intelligence (AI) outperforms humans. To the best knowledge of the author, this thesis is not only a success but also the first attempt to reconstruct an OCT image at a pixel level. Even attempting this kind of task requires fully annotated OCT images, and a lot of them (hundreds or even thousands). This is clearly impossible without a powerful simulation tool like the one developed in this thesis.
Abstract:
In the first section of this thesis, two-dimensional properties of the human eye movement control system were studied. The vertical-horizontal interaction was investigated by using a two-dimensional target motion consisting of a sinusoid in one direction (vertical or horizontal) and low-pass filtered Gaussian random motion of variable bandwidth (and hence information content) in the orthogonal direction. It was found that the random motion reduced the efficiency of the sinusoidal tracking. However, the sinusoidal tracking was only slightly dependent on the bandwidth of the random motion. Thus the system should be thought of as consisting of two independent channels with a small amount of mutual cross-talk.
These target motions were then rotated to discover whether or not the system is capable of recognizing the two-component nature of the target motion. That is, the sinusoid was presented along an oblique line (neither vertical nor horizontal) with the random motion orthogonal to it. The system did not simply track the vertical and horizontal components of motion, but rotated its frame of reference so that its two tracking channels coincided with the directions of the two target motion components. This recognition occurred even when the two orthogonal motions were both random, but with different bandwidths.
In the second section, time delays, prediction and power spectra were examined. Time delays were calculated in response to various periodic signals, various bandwidths of narrow-band Gaussian random motions and sinusoids. It was demonstrated that prediction occurred only when the target motion was periodic, and only if the harmonic content was such that the signal was sufficiently narrow-band. It appears as if general periodic motions are split into predictive and non-predictive components.
For unpredictable motions, the relationship between the time delay and the average speed of the retinal image was linear. Based on this I proposed a model explaining the time delays for both random and periodic motions. My experiments did not prove that the system is a sampled-data system, or that it is continuous. However, the model can be interpreted as representative of a sampled-data system whose sample interval is a function of the target motion.
It was shown that increasing the bandwidth of the low-pass filtered Gaussian random motion resulted in an increase of the eye movement bandwidth. Some properties of the eyeball-muscle dynamics and the extraocular muscle "active state tension" were derived.