4 results for CATEGORICAL-DATA ANALYSIS
in CaltechTHESIS
Abstract:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general optimization algorithm, called Relaxation Expectation Maximization (REM), is proposed that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques, the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new, principled algorithms for smoothing and clustering of spike data.
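As background for the paragraph above, the following is a minimal sketch of standard Expectation Maximization on a two-component, one-dimensional Gaussian mixture. It is not the REM algorithm the abstract proposes, whose details are not given here; the function name `em_two_gaussians`, the initialization at the data extremes, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def em_two_gaussians(data, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    # Hypothetical initialization: place the two means at the data extremes.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [
                pi[k]
                * math.exp(-((x - mu[k]) ** 2) / (2.0 * var[k]))
                / math.sqrt(2.0 * math.pi * var[k])
                for k in range(2)
            ]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor keeps a component from collapsing onto one point
            )
            pi[k] = nk / len(data)
    return pi, mu, var


# Synthetic data from two well-separated Gaussians (assumed for the demo).
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)]
data += [rng.gauss(3.0, 1.0) for _ in range(300)]
pi, mu, var = em_two_gaussians(data)
```

With well-separated clusters like these, plain EM is not expected to get stuck; the local-maxima problem the abstract mentions arises with harder data or poor initializations, which this sketch does not attempt to reproduce.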
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.
Abstract:
In this work, we further extend the recently developed adaptive data analysis method, the Sparse Time-Frequency Representation (STFR) method. This method is based on the assumption that many physical signals inherently contain AM-FM representations. We propose a sparse optimization method to extract the AM-FM representations of such signals. We prove the convergence of the method for periodic signals under certain assumptions and provide practical algorithms specifically for the non-periodic STFR, which extends the method to tackle problems that former STFR methods could not handle, including stability to noise and non-periodic data analysis. This is a significant improvement since many adaptive and non-adaptive signal processing methods are not fully capable of handling non-periodic signals. Moreover, we propose a new STFR algorithm to study intrawave signals with strong frequency modulation and analyze the convergence of this new algorithm for periodic signals. Such signals have previously remained a bottleneck for all signal processing methods. Furthermore, we propose a modified version of STFR that facilitates the extraction of intrawaves that have overlapping frequency content. We show that the STFR methods can be applied to the realm of dynamical systems and cardiovascular signals. In particular, we present a simplified and modified version of the STFR algorithm that is potentially useful for the diagnosis of some cardiovascular diseases. We further explain some preliminary work on the nature of Intrinsic Mode Functions (IMFs) and how they can have different representations in different phase coordinates. This analysis shows that the uncertainty principle is fundamental to all oscillating signals.
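For orientation, an AM-FM representation of the kind the abstract refers to is usually written in the form below. The symbols $f$, $a_k$, $\theta_k$, and $M$ are notation assumed here for illustration; the abstract itself does not fix any notation.

```latex
% Assumed notation: f is the signal, a_k the amplitude envelopes,
% \theta_k the phase functions, M the number of components.
f(t) = \sum_{k=1}^{M} a_k(t)\,\cos\theta_k(t),
\qquad a_k(t) > 0,\quad \theta_k'(t) > 0 .
```

Under this reading, each envelope $a_k(t)$ carries the amplitude modulation and each derivative $\theta_k'(t)$ the instantaneous frequency, and a sparse decomposition would favor the smallest number of components $M$ that explains the signal.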
Abstract:
The Laser Interferometer Gravitational-Wave Observatory (LIGO) consists of two complex large-scale laser interferometers designed for direct detection of gravitational waves from distant astrophysical sources in the frequency range 10 Hz to 5 kHz. Direct detection of space-time ripples will support Einstein's general theory of relativity and provide invaluable information and new insight into the physics of the Universe.
The initial phase of LIGO started in 2002, and since then data have been collected during six science runs. Instrument sensitivity improved from run to run thanks to the efforts of the commissioning team. Initial LIGO reached its design sensitivity during the last science run, which ended in October 2010.
In parallel with commissioning and data analysis with the initial detector, the LIGO group worked on research and development of the next generation of detectors. The major instrument upgrade from initial to advanced LIGO started in 2010 and lasted until 2014.
This thesis describes the results of commissioning work done at the LIGO Livingston site from 2013 until 2015, in parallel with and after the installation of the instrument. It also discusses new techniques and tools developed at the 40m prototype, including adaptive filtering, estimation of quantization noise in digital filters, and the design of isolation kits for ground seismometers.
The first part of this thesis is devoted to the description of methods for bringing the interferometer to the linear regime, in which collection of data becomes possible. The states of the longitudinal and angular controls of the interferometer degrees of freedom during the lock acquisition process and in the low noise configuration are discussed in detail.
Once the interferometer is locked and transitioned to the low noise regime, the instrument produces astrophysics data that should be calibrated to units of meters or strain. The second part of this thesis describes the online calibration technique set up in both observatories to monitor the quality of the collected data in real time. A sensitivity analysis was done to understand and eliminate noise sources of the instrument.
Coupling of noise sources to the gravitational wave channel can be reduced if robust feedforward and optimal feedback control loops are implemented. The last part of this thesis describes static and adaptive feedforward noise cancellation techniques applied to the Advanced LIGO interferometers and tested at the 40m prototype. Applications of optimal time domain feedback control techniques and estimators to aLIGO control loops are also discussed.
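To illustrate the idea of adaptive feedforward noise cancellation mentioned above, here is a minimal sketch using a textbook least-mean-squares (LMS) filter. The abstract does not say which adaptive algorithm the thesis uses, so LMS is a stand-in; the function name `lms_cancel`, the tap count, the step size `mu`, and the synthetic witness, coupling, and signal are all hypothetical.

```python
import math
import random


def lms_cancel(witness, target, n_taps=8, mu=0.01):
    """Subtract from `target` the part linearly predictable from `witness`."""
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # most recent witness samples, newest first
    residual = []
    for x_n, d_n in zip(witness, target):
        history = [x_n] + history[:-1]
        # Feedforward estimate of the noise that coupled into the target.
        estimate = sum(w * x for w, x in zip(weights, history))
        error = d_n - estimate
        # LMS update: step along the instantaneous gradient of the squared error.
        weights = [w + mu * error * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual, weights


def mean_square(values):
    return sum(v * v for v in values) / len(values)


# Synthetic demo (all values assumed): a white-noise witness channel couples
# into the target through a short FIR path and masks a weak sinusoid.
rng = random.Random(0)
n = 5000
witness = [rng.gauss(0.0, 1.0) for _ in range(n)]
coupling = [0.5, -0.3, 0.2]
noise = [
    sum(c * witness[i - k] for k, c in enumerate(coupling) if i - k >= 0)
    for i in range(n)
]
signal = [0.1 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
target = [s + v for s, v in zip(signal, noise)]

residual, weights = lms_cancel(witness, target)

# Compare noise power before and after cancellation over the last 1000 samples.
before = mean_square(noise[-1000:])
after = mean_square([r - s for r, s in zip(residual[-1000:], signal[-1000:])])
```

A static feedforward filter would correspond to freezing `weights` after adaptation; the adaptive version keeps updating them so that it can follow a coupling path that drifts over time.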
Commissioning work is still ongoing at the sites. The first science run of advanced LIGO is planned for September 2015 and will last for 3-4 months. This run will be followed by a set of small instrument upgrades that will be installed on a time scale of a few months. The second science run will start in spring 2016 and last for about 6 months. Since the current sensitivity of advanced LIGO is already more than a factor of 3 higher than that of the initial detectors and keeps improving on a monthly basis, the upcoming science runs have a good chance of making the first direct detection of gravitational waves.