3 results for Statistical language models

in CaltechTHESIS


Relevance:

80.00%

Publisher:

Abstract:

The first chapter of this thesis deals with automating data gathering for single cell microfluidic tests. The programs developed saved significant amounts of time with no loss in accuracy. The technology from this chapter was applied to experiments in both Chapters 4 and 5.

The second chapter describes the use of statistical learning to predict whether an anti-angiogenic drug (Bevacizumab) would successfully treat a glioblastoma multiforme tumor. Protein levels were first measured in 92 blood samples using the DNA-encoded antibody library platform, which allowed 35 different proteins to be measured per sample with sensitivity comparable to ELISA. Two statistical learning models were developed to predict whether the treatment would succeed. The first, logistic regression, predicted with 85% accuracy and an AUC of 0.901 using a five-protein panel. These five proteins were statistically significant predictors and gave insight into the mechanism behind anti-angiogenic success or failure. The second model, an ensemble of logistic regression, kNN, and random forest, predicted with a slightly higher accuracy of 87%.
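The two figures quoted for the classifiers, accuracy and AUC, can be illustrated with a minimal sketch. The function names, the 0.5 threshold, and the toy labels and scores below are assumptions for illustration only; they are not taken from the thesis and say nothing about how its models were fit.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = treatment succeeded, 0 = treatment failed.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_accuracy = accuracy(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why an abstract may report both.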

The third chapter details the development of a photocleavable conjugate that multiplexed cell surface detection in microfluidic devices. The method successfully detected streptavidin on coated beads with 92% positive predictive rate. Furthermore, chambers with 0, 1, 2, and 3+ beads were statistically distinguishable. The method was then used to detect CD3 on Jurkat T cells, yielding a positive predictive rate of 49% and false positive rate of 0%.

The fourth chapter describes measuring T cell polyfunctionality to predict whether adoptive T cell transfer therapy will succeed for a patient. In 15 patients, we measured 10 proteins from individual T cells (~300 cells per patient). The polyfunctional strength index was calculated and then correlated with each patient's progression-free survival (PFS) time. Fifty-two other parameters measured in the single cell test were also correlated with PFS. No statistical correlate has been identified, however, and more data are necessary to reach a conclusion.

Finally, the fifth chapter examines interactions between T cells and how they affect protein secretion. It was observed that T cells in direct contact selectively enhance their protein secretion, in some cases by over 5-fold. This occurred for Granzyme B, Perforin, CCL4, TNFa, and IFNg. IL-10 was shown to decrease slightly upon contact. This phenomenon held true for T cells from all patients tested (n=8). Using single cell data, the theoretical protein secretion frequency was calculated for two cells and then compared with the observed secretion rates for two cells not in contact and for two cells in contact. In over 90% of cases, the theoretical protein secretion rate matched that of two cells not in contact.
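The abstract does not give the formula used for the theoretical two-cell frequency. One plausible reading, sketched below, assumes the two cells secrete independently, so that a two-cell chamber is positive when at least one cell secretes. The independence assumption, the function names, and the example numbers are all hypothetical.

```python
def expected_pair_frequency(p_single):
    """Probability that at least one of two independent cells secretes.

    p_single is the single-cell secretion frequency (between 0 and 1).
    Independence between the two cells is an assumption of this sketch.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    return 1.0 - (1.0 - p_single) ** 2


def fold_change(observed, expected):
    """Ratio of an observed two-cell frequency to the independent baseline."""
    if expected <= 0.0:
        raise ValueError("expected frequency must be positive")
    return observed / expected


# Hypothetical numbers: 20% of single cells secrete a given protein.
baseline = expected_pair_frequency(0.20)
# A hypothetical observed frequency for two cells in contact.
enhancement = fold_change(0.54, baseline)
```

Under this reading, an observed frequency close to the baseline is consistent with no interaction, and a ratio well above 1 indicates contact-dependent enhancement.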

Relevance:

40.00%

Publisher:

Abstract:

This thesis explores the problem of mobile robot navigation in dense human crowds. We begin by considering a fundamental impediment to classical motion planning algorithms called the freezing robot problem: once the environment surpasses a certain level of complexity, the planner decides that all forward paths are unsafe, and the robot freezes in place (or performs unnecessary maneuvers) to avoid collisions. Since a feasible path typically exists, this behavior is suboptimal. Existing approaches have focused on reducing predictive uncertainty by employing higher fidelity individual dynamics models or heuristically limiting the individual predictive covariance to prevent overcautious navigation. We demonstrate that both the individual prediction and the individual predictive uncertainty have little to do with this undesirable navigation behavior. Additionally, we provide evidence that dynamic agents are able to navigate in dense crowds by engaging in joint collision avoidance, cooperatively making room to create feasible trajectories. We accordingly develop interacting Gaussian processes, a prediction density that captures cooperative collision avoidance, and a "multiple goal" extension that models the goal driven nature of human decision making. Navigation naturally emerges as a statistic of this distribution.

Most importantly, we empirically validate our models in the Chandler dining hall at Caltech during peak hours, and in the process, carry out the first extensive quantitative study of robot navigation in dense human crowds (collecting data on 488 runs). The multiple goal interacting Gaussian processes algorithm performs comparably with human teleoperators in crowd densities nearing 1 person/m², while a state of the art noncooperative planner exhibits unsafe behavior more than 3 times as often as the multiple goal extension, and twice as often as the basic interacting Gaussian process approach. Furthermore, a reactive planner based on the widely used dynamic window approach proves insufficient for crowd densities above 0.55 people/m². For inclusive validation purposes, we show that either our non-interacting planner or our reactive planner captures the salient characteristics of nearly any existing dynamic navigation algorithm. Based on these experimental results and theoretical observations, we conclude that a cooperation model is critical for safe and efficient robot navigation in dense human crowds.

Finally, we produce a large database of ground truth pedestrian crowd data. We make this ground truth database publicly available for further scientific study of crowd prediction models, learning from demonstration algorithms, and human robot interaction models in general.

Relevance:

30.00%

Publisher:

Abstract:

The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.

It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general optimization algorithm, called Relaxation Expectation Maximization (REM), is proposed; it may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques, the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
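For orientation, the baseline whose local-maximum behavior is at issue is ordinary Expectation Maximization. The sketch below is plain EM for a two-component one-dimensional Gaussian mixture; it is not REM, whose details do not appear in this abstract. The function names, the fixed initial weights and variances, the variance floor, and the toy data are assumptions for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, initial_means, n_iter=50, var_floor=1e-6):
    """Plain EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances, log_likelihood_history).
    The outcome depends on initial_means, which is how EM can end up
    in different local maxima of the likelihood.
    """
    weights = [0.5, 0.5]
    means = list(initial_means)
    variances = [1.0, 1.0]
    history = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        responsibilities = []
        log_likelihood = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([j / total for j in joint])
        history.append(log_likelihood)
        # M-step: re-estimate parameters from responsibility-weighted data.
        for k in range(2):
            n_k = sum(r[k] for r in responsibilities)
            weights[k] = n_k / len(data)
            means[k] = sum(r[k] * x for r, x in zip(responsibilities, data)) / n_k
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(responsibilities, data)) / n_k
            variances[k] = max(spread, var_floor)  # floor guards against collapse
    return weights, means, variances, history


# Hypothetical toy data with two well-separated clusters.
toy_data = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
toy_weights, toy_means, toy_variances, toy_history = em_two_gaussians(toy_data, [-1.0, 1.0])
```

Each iteration cannot lower the log-likelihood (as long as the variance floor is not hit), but nothing guarantees that the fixed point reached is the global maximum.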

The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new principled algorithms for smoothing and clustering of spike data.