9 results for human behavior recognition
in CaltechTHESIS
Abstract:
Visual inputs to artificial and biological visual systems are often quantized: cameras accumulate photons from the visual world, and the brain receives action potentials from visual sensory neurons. Collecting more information quanta leads to a longer acquisition time and better performance. In many visual tasks, collecting a small number of quanta is sufficient to solve the task well. The ability to determine the right number of quanta is pivotal in situations where visual information is costly to obtain, such as photon-starved or time-critical environments. In these situations, conventional vision systems that always collect a fixed and large amount of information are infeasible. I develop a framework that judiciously determines the number of information quanta to observe based on the cost of observation and the requirement for accuracy. The framework implements the optimal speed versus accuracy tradeoff when two assumptions are met, namely that the task is fully specified probabilistically and constant over time. I also extend the framework to address scenarios that violate the assumptions. I deploy the framework to three recognition tasks: visual search (where both assumptions are satisfied), scotopic visual recognition (where the model is not specified), and visual discrimination with unknown stimulus onset (where the model is dynamic over time). Scotopic classification experiments suggest that the framework leads to dramatic improvement in photon-efficiency compared to conventional computer vision algorithms. Human psychophysics experiments confirmed that the framework provides a parsimonious and versatile explanation for human behavior under time pressure in both static and dynamic environments.
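The idea of collecting only as many quanta as the task requires can be illustrated with a minimal sketch. The abstract does not name the decision procedure, so the example below assumes Wald's sequential probability ratio test (SPRT) applied to Poisson photon counts; the function name `sprt_poisson`, the two candidate rates, and the symmetric `error_rate` threshold are all hypothetical choices made for illustration, not the thesis's implementation.

```python
import math

def sprt_poisson(counts, rate_a, rate_b, error_rate=0.05):
    """Sequentially decide between two Poisson rates (hypothetical setup).

    counts     : iterable of photon counts, one per time bin
    rate_a/b   : expected counts per bin under hypotheses A and B
    error_rate : target error probability (sets the stopping thresholds)
    Returns (decision, number_of_bins_observed).
    """
    upper = math.log((1 - error_rate) / error_rate)  # accept A at or above this
    lower = -upper                                   # accept B at or below this
    llr = 0.0
    steps = 0
    for steps, n in enumerate(counts, start=1):
        # Log-likelihood ratio of Poisson(n; rate_a) against Poisson(n; rate_b)
        llr += n * math.log(rate_a / rate_b) - (rate_a - rate_b)
        if llr >= upper:
            return "A", steps
        if llr <= lower:
            return "B", steps
    # Data ran out before a threshold was crossed: fall back to the sign
    return ("A" if llr > 0 else "B"), steps

# Bright input stops after one bin; dim input needs two bins.
print(sprt_poisson([5] * 100, rate_a=5.0, rate_b=1.0))
print(sprt_poisson([1] * 100, rate_a=5.0, rate_b=1.0))
```

Lowering `error_rate` widens the thresholds, so the test observes more bins before committing. This is the speed versus accuracy tradeoff in its simplest form.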
Abstract:
In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that in fact mismatched training and test distribution can yield better out-of-sample performance. This optimal performance can be obtained by training with the dual distribution. This optimal training distribution depends on the test distribution set by the problem, but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, as well as how to approximate it in a practical scenario. Benefits of using this distribution are exemplified in both synthetic and real data sets.
In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the use of weights regarding its effect on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm that determines if, for a given set of weights, the out-of-sample performance will improve or not in a practical setting. This is necessary as the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
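The weighting step described above can be sketched as standard importance weighting, where each training point is weighted by the ratio of the desired density to the training density. This is a generic illustration, not the Targeted Weighting algorithm: the Gaussian densities, the function names, and the effective-sample-size diagnostic are assumptions introduced here. The diagnostic hints at the cost of weighting, since uneven weights reduce the number of points that effectively contribute.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian (assumed input distribution)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def importance_weights(xs, train_pdf, target_pdf):
    """Weights that make a sample from train_pdf mimic target_pdf.

    The weights are normalized to have mean 1.
    """
    raw = [target_pdf(x) / train_pdf(x) for x in xs]
    total = sum(raw)
    return [len(xs) * w / total for w in raw]

def effective_sample_size(weights):
    """Common diagnostic: equals len(weights) when all weights are equal."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Training inputs drawn around 0, desired distribution centered at 1.
xs = [-1.0, 0.0, 1.0]
weights = importance_weights(
    xs,
    train_pdf=lambda x: gaussian_pdf(x, 0.0, 1.0),
    target_pdf=lambda x: gaussian_pdf(x, 1.0, 1.0),
)
print(weights, effective_sample_size(weights))
```

Points closer to the target mean receive larger weights, and the effective sample size drops below the actual sample size of 3.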
Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance in a large real dataset such as the Netflix dataset. Their computational complexity is the main reason for their advantage over previous algorithms proposed in the covariate shift literature.
In the second part of the thesis we apply machine learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system that allows analyzing behavior in videos of animals with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), detects movemes, actions, and stories from time series describing the position of animals in videos. The method summarizes the data and provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example, according to their genetic line.
Abstract:
Acetyltransferases and deacetylases catalyze the addition and removal, respectively, of acetyl groups to the epsilon-amino group of protein lysine residues. This modification can affect the function of a protein through several means, including the recruitment of specific binding partners called acetyl-lysine readers. Acetyltransferases, deacetylases, and acetyl-lysine readers have emerged as crucial regulators of biological processes and prominent targets for the treatment of human disease. This work describes a combination of structural, biochemical, biophysical, cell-biological, and organismal studies undertaken on a set of proteins that cumulatively include all steps of the acetylation process: the acetyltransferase MEC-17, the deacetylase SIRT1, and the acetyl-lysine reader DPF2. Tubulin acetylation by MEC-17 is associated with stable, long-lived microtubule structures. We determined the crystal structure of the catalytic domain of human MEC-17 in complex with the cofactor acetyl-CoA. The structure in combination with an extensive enzymatic analysis of MEC-17 mutants identified residues for cofactor and substrate recognition and activity. A large, evolutionarily conserved hydrophobic surface patch distal to the active site was shown to be necessary for catalysis, suggesting that specificity is achieved by interactions with the alpha-tubulin substrate that extend outside of the modified surface loop. Experiments in C. elegans showed that while MEC-17 is required for touch sensitivity, MEC-17 enzymatic activity is dispensable for this behavior. SIRT1 deacetylates a wide range of substrates, including p53, NF-kappaB, FOXO transcription factors, and PGC-1-alpha, with roles in cellular processes ranging from energy metabolism to cell survival. SIRT1 activity is uniquely controlled by a C-terminal regulatory segment (CTR).
Here we present crystal structures of the catalytic domain of human SIRT1 in complex with the CTR in an apo form and in complex with a cofactor and a pseudo-substrate peptide. The catalytic domain adopts the canonical sirtuin fold. The CTR forms a beta-hairpin structure that complements the beta-sheet of the NAD^+-binding domain, covering an essentially invariant, hydrophobic surface. A comparison of the apo and cofactor bound structures revealed conformational changes throughout catalysis, including a rotation of a smaller subdomain with respect to the larger NAD^+-binding subdomain. A biochemical analysis identified key residues in the active site, an inhibitory role for the CTR, and distinct structural features of the CTR that mediate binding and inhibition of the SIRT1 catalytic domain. DPF2 represses myeloid differentiation in acute myelogenous leukemia. Finally, we solved the crystal structure of the tandem PHD domain of human DPF2. We showed that DPF2 preferentially binds H3 tail peptides acetylated at Lys14, and binds H4 tail peptides with no preference for acetylation state. Through a structural and mutational analysis we identify the molecular basis of histone recognition. We propose a model for the role of DPF2 in AML and identify the DPF2 tandem PHD finger domain as a promising novel target for anti-leukemia therapeutics.
Abstract:
Part I of the thesis describes the olfactory searching and scanning behaviors of rats in a wind tunnel, and a detailed movement analysis of terrestrial arthropod olfactory scanning behavior. Olfactory scanning behaviors in rats may be a behavioral correlate to hippocampal place cell activity.
Part II focuses on the organization of olfactory perception, what it suggests about a natural order for chemicals in the environment, and what this in turn suggests about the organization of the olfactory system. A model of odor quality space (analogous to the "color wheel") is presented. This model defines relationships between odor qualities perceived by human subjects based on a quantitative similarity measure. Compounds containing Carbon, Nitrogen, or Sulfur elicit odors that are contiguous in this odor representation, which thus allows one to predict the broad class of odor qualities a compound is likely to elicit. Based on these findings, a natural organization for olfactory stimuli is hypothesized: the order provided by the metabolic process. This hypothesis is tested by comparing compounds that are structurally similar, perceptually similar, and metabolically similar in a psychophysical cross-adaptation paradigm. Metabolically similar compounds consistently evoked shifts in odor quality and intensity under cross-adaptation, while compounds that were structurally similar or perceptually similar did not. This suggests that the olfactory system may process metabolically similar compounds using the same neural pathways, and that metabolic similarity may be the fundamental metric about which olfactory processing is organized. In other words, the olfactory system may be organized around a biological basis.
The idea of a biological basis for olfactory perception represents a shift in how olfaction is understood. The biological view has predictive power while the current chemical view does not, and the biological view provides explanations for some of the most basic questions in olfaction that are unanswered in the chemical view. Existing data do not disprove a biological view, and are consistent with basic hypotheses that arise from this viewpoint.
Abstract:
This study examines binding of α- and β-D-glucose in their equilibrium mixture to the glucose transporter (GLUT1) in human erythrocyte membrane preparations by an ^1H NMR method, the transferred NOE (TRNOE). This method is shown theoretically and experimentally to be a sensitive probe of weak ligand-macromolecule interactions. The TRNOEs observed are shown to arise solely from glucose binding to GLUT1. Sites at both membrane faces contribute to the TRNOEs. Binding curves obtained are consistent with a homogeneous class of sugar sites, with an apparent KD which varies (from ~30 mM to ~70 mM for both anomers) depending on the membrane preparation examined. Preparations with a higher proportion of the cytoplasmic membrane face exposed to bulk solution yield higher apparent KDs. The glucose transport inhibitor cytochalasin B essentially eliminates the TRNOE. Nonlinearity was found in the dependence on sugar concentration of the apparent inhibition constant for cytochalasin B reversal of the TRNOE observed in the α anomer (and probably the β anomer); such nonlinearity implies the existence of ternary complexes of sugar, inhibitor and transporter. The inhibition results furthermore imply the presence of a class of relatively high-affinity (KD < 2 mM) sugar sites specific for the α anomer which do not contribute to NMR-observable binding. The presence of two classes of sugar-sensitive cytochalasin B sites is also indicated. These results are compared with predictions of the alternating conformer model of glucose transport. Variation of apparent KD in the NMR-observable sites, the formation of ternary complexes and the presence of an anomer-specific site are shown to be inconsistent with this model. An alternate model is developed which reconciles these results with the known transport behavior of GLUT1.
In this model, the transporter possesses (at minimum) three classes of sugar sites: (i) transport sites, which are alternately exposed to the cytoplasmic or the extracellular compartment, but never to both simultaneously, (ii) a class of sites (probably relatively low-affinity) which are confined to one compartment, and (iii) the high-affinity α anomer-specific sites, which are confined to the cytoplasmic compartment.
Abstract:
This thesis explores the problem of mobile robot navigation in dense human crowds. We begin by considering a fundamental impediment to classical motion planning algorithms called the freezing robot problem: once the environment surpasses a certain level of complexity, the planner decides that all forward paths are unsafe, and the robot freezes in place (or performs unnecessary maneuvers) to avoid collisions. Since a feasible path typically exists, this behavior is suboptimal. Existing approaches have focused on reducing predictive uncertainty by employing higher fidelity individual dynamics models or heuristically limiting the individual predictive covariance to prevent overcautious navigation. We demonstrate that both the individual prediction and the individual predictive uncertainty have little to do with this undesirable navigation behavior. Additionally, we provide evidence that dynamic agents are able to navigate in dense crowds by engaging in joint collision avoidance, cooperatively making room to create feasible trajectories. We accordingly develop interacting Gaussian processes, a prediction density that captures cooperative collision avoidance, and a "multiple goal" extension that models the goal driven nature of human decision making. Navigation naturally emerges as a statistic of this distribution.
Most importantly, we empirically validate our models in the Chandler dining hall at Caltech during peak hours, and in the process, carry out the first extensive quantitative study of robot navigation in dense human crowds (collecting data on 488 runs). The multiple goal interacting Gaussian processes algorithm performs comparably with human teleoperators in crowd densities nearing 1 person/m^2, while a state-of-the-art noncooperative planner exhibits unsafe behavior more than 3 times as often as the multiple goal extension, and twice as often as the basic interacting Gaussian process approach. Furthermore, a reactive planner based on the widely used dynamic window approach proves insufficient for crowd densities above 0.55 people/m^2. For inclusive validation purposes, we show that either our non-interacting planner or our reactive planner captures the salient characteristics of nearly any existing dynamic navigation algorithm. Based on these experimental results and theoretical observations, we conclude that a cooperation model is critical for safe and efficient robot navigation in dense human crowds.
Finally, we produce a large database of ground truth pedestrian crowd data. We make this ground truth database publicly available for further scientific study of crowd prediction models, learning from demonstration algorithms, and human robot interaction models in general.
Abstract:
Humans are particularly adept at modifying their behavior in accordance with changing environmental demands. Through various mechanisms of cognitive control, individuals are able to tailor actions to fit complex short- and long-term goals. The research described in this thesis uses functional magnetic resonance imaging to characterize the neural correlates of cognitive control at two levels of complexity: response inhibition and self-control in intertemporal choice. First, we examined changes in neural response associated with increased experience and skill in response inhibition; successful response inhibition was associated with decreased neural response over time in the right ventrolateral prefrontal cortex, a region widely implicated in cognitive control, providing evidence for increased neural efficiency with learned automaticity. We also examined a more abstract form of cognitive control using intertemporal choice. In two experiments, we identified putative neural substrates for individual differences in temporal discounting, or the tendency to prefer immediate to delayed rewards. Using dynamic causal models, we characterized the neural circuit between ventromedial prefrontal cortex, an area involved in valuation, and dorsolateral prefrontal cortex, a region implicated in self-control in intertemporal and dietary choice, and found that connectivity from dorsolateral prefrontal cortex to ventromedial prefrontal cortex increases at the time of choice, particularly when delayed rewards are chosen. Moreover, estimates of the strength of connectivity predicted out-of-sample individual rates of temporal discounting, suggesting a neurocomputational mechanism for variation in the ability to delay gratification. Next, we interrogated the hypothesis that individual differences in temporal discounting are in part explained by the ability to imagine future reward outcomes. 
Using a novel paradigm, we imaged neural response during the imagining of primary rewards, and identified negative correlations between activity in regions associated with the processing of both real and imagined rewards (lateral orbitofrontal cortex and ventromedial prefrontal cortex, respectively) and the individual temporal discounting parameters estimated in the previous experiment. These data suggest that individuals who are better able to represent reward outcomes neurally are less susceptible to temporal discounting. Together, these findings provide further insight into the role of the prefrontal cortex in implementing cognitive control, and propose neurobiological substrates for individual variation.
Abstract:
The visual system is a remarkable platform that evolved to solve difficult computational problems such as detection, recognition, and classification of objects. Of great interest is the face-processing network, a sub-system buried deep in the temporal lobe, dedicated to analyzing a specific type of object (faces). In this thesis, I focus on the problem of face detection by the face-processing network. Insights obtained from years of developing computer-vision algorithms to solve this task have suggested that it may be efficiently and effectively solved by detection and integration of local contrast features. Does the brain use a similar strategy? To answer this question, I embark on a journey that takes me through the development and optimization of dedicated tools for targeting and perturbing deep brain structures. Data collected using MR-guided electrophysiology in early face-processing regions was found to have strong selectivity for contrast features, similar to ones used by artificial systems. While individual cells were tuned for only a small subset of features, the population as a whole encoded the full spectrum of features that are predictive of the presence of a face in an image. Together with additional evidence, my results suggest a possible computational mechanism for face detection in early face-processing regions. To move from correlation to causation, I focus on adopting an emergent technology for perturbing brain activity using light: optogenetics. While this technique has the potential to overcome problems associated with the de facto method of brain stimulation (electrical microstimulation), many open questions remain about its applicability and effectiveness for perturbing the non-human primate (NHP) brain.
In a set of experiments, I use viral vectors to deliver genetically encoded optogenetic constructs to the frontal eye field and face-selective regions in NHP and examine their effects side-by-side with electrical microstimulation to assess their effectiveness in perturbing neural activity as well as behavior. Results suggest that cells are robustly and strongly modulated upon light delivery and that such perturbation can modulate and even initiate motor behavior, thus paving the way for future explorations that may apply these tools to study connectivity and information flow in the face-processing network.
Abstract:
In the first section of this thesis, two-dimensional properties of the human eye movement control system were studied. The vertical-horizontal interaction was investigated by using a two-dimensional target motion consisting of a sinusoid in one of the two directions (vertical or horizontal) and low-pass filtered Gaussian random motion of variable bandwidth (and hence information content) in the orthogonal direction. It was found that the random motion reduced the efficiency of the sinusoidal tracking. However, the sinusoidal tracking was only slightly dependent on the bandwidth of the random motion. Thus the system should be thought of as consisting of two independent channels with a small amount of mutual cross-talk.
These target motions were then rotated to discover whether or not the system is capable of recognizing the two-component nature of the target motion. That is, the sinusoid was presented along an oblique line (neither vertical nor horizontal) with the random motion orthogonal to it. The system did not simply track the vertical and horizontal components of motion, but rotated its frame of reference so that its two tracking channels coincided with the directions of the two target motion components. This recognition occurred even when the two orthogonal motions were both random, but with different bandwidths.
In the second section, time delays, prediction and power spectra were examined. Time delays were calculated in response to various periodic signals, various bandwidths of narrow-band Gaussian random motions and sinusoids. It was demonstrated that prediction occurred only when the target motion was periodic, and only if the harmonic content was such that the signal was sufficiently narrow-band. It appears as if general periodic motions are split into predictive and non-predictive components.
For unpredictable motions, the relationship between the time delay and the average speed of the retinal image was linear. Based on this I proposed a model explaining the time delays for both random and periodic motions. My experiments did not prove that the system is sampled-data or that it is continuous. However, the model can be interpreted as representing a sampled-data system whose sample interval is a function of the target motion.
It was shown that increasing the bandwidth of the low-pass filtered Gaussian random motion resulted in an increase of the eye movement bandwidth. Some properties of the eyeball-muscle dynamics and the extraocular muscle "active state tension" were derived.