7 results for Curricular Support Data Analysis
in CaltechTHESIS
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.
Abstract:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new principled algorithms for smoothing and clustering of spike data.
Abstract:
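As background for the latent variable techniques this abstract refers to, the following is a minimal sketch of standard expectation maximization for a two-component, one-dimensional Gaussian mixture. It is not the REM algorithm or the mixture of sparse hidden Markov models proposed in the thesis; the function names, the initialisation, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Plain EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    # Crude initialisation from the data range (an arbitrary assumption).
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Synthetic data: two well-separated clusters (hypothetical values).
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(200)]
data += [random.gauss(3.0, 0.5) for _ in range(200)]
weights, means, variances = em_two_gaussians(data)
```

With clusters this well separated, plain EM recovers the two means easily; the local-maximum problem that the abstract says REM alleviates shows up when components overlap or the initialisation is poor.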
In this work, we further extend the recently developed adaptive data analysis method, the Sparse Time-Frequency Representation (STFR) method. This method is based on the assumption that many physical signals inherently contain AM-FM representations. We propose a sparse optimization method to extract the AM-FM representations of such signals. We prove the convergence of the method for periodic signals under certain assumptions and provide practical algorithms specifically for the non-periodic STFR, which extends the method to tackle problems that former STFR methods could not handle, including stability to noise and non-periodic data analysis. This is a significant improvement since many adaptive and non-adaptive signal processing methods are not fully capable of handling non-periodic signals. Moreover, we propose a new STFR algorithm to study intrawave signals with strong frequency modulation and analyze the convergence of this new algorithm for periodic signals. Such signals have previously remained a bottleneck for all signal processing methods. Furthermore, we propose a modified version of STFR that facilitates the extraction of intrawaves that have overlapping frequency content. We show that the STFR methods can be applied to the realm of dynamical systems and cardiovascular signals. In particular, we present a simplified and modified version of the STFR algorithm that is potentially useful for the diagnosis of some cardiovascular diseases. We further explain some preliminary work on the nature of Intrinsic Mode Functions (IMFs) and how they can have different representations in different phase coordinates. This analysis shows that the uncertainty principle is fundamental to all oscillating signals.
Abstract:
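To make the term "AM-FM representation" concrete, here is a small sketch that builds a signal of the form s(t) = a(t) cos(theta(t)) and reads off its instantaneous frequency theta'(t) / (2 pi). It only illustrates the signal model; the sparse optimization that STFR uses to extract a(t) and theta(t) from data is not shown, and the envelope, phase, and function names below are hypothetical choices.

```python
import math


def am_fm_signal(t):
    """Return (envelope, phase, value) of a synthetic AM-FM signal at time t."""
    # Slowly varying positive envelope a(t) (hypothetical choice).
    envelope = 1.0 + 0.5 * math.cos(2.0 * math.pi * 0.5 * t)
    # Phase theta(t): a 10 Hz carrier plus a 1 Hz frequency modulation.
    phase = 2.0 * math.pi * 10.0 * t + 2.0 * math.sin(2.0 * math.pi * 1.0 * t)
    return envelope, phase, envelope * math.cos(phase)


def instantaneous_frequency(t, dt=1e-6):
    """Instantaneous frequency in Hz, theta'(t) / (2 pi), by central difference."""
    _, phase_plus, _ = am_fm_signal(t + dt)
    _, phase_minus, _ = am_fm_signal(t - dt)
    return (phase_plus - phase_minus) / (2.0 * dt) / (2.0 * math.pi)


# For this phase, the exact value is 10 + 2 * cos(2 * pi * t) Hz.
f_start = instantaneous_frequency(0.0)
f_quarter = instantaneous_frequency(0.25)
```

Here the frequency swings between 8 and 12 Hz within each one-second modulation period, a simple stand-in for the frequency-modulated intrawave signals the abstract mentions.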
The Laser Interferometer Gravitational-Wave Observatory (LIGO) consists of two complex large-scale laser interferometers designed for direct detection of gravitational waves from distant astrophysical sources in the frequency range 10 Hz to 5 kHz. Direct detection of space-time ripples will support Einstein's general theory of relativity and provide invaluable information and new insight into the physics of the Universe.
The initial phase of LIGO started in 2002, and since then data were collected during six science runs. Instrument sensitivity improved from run to run thanks to the efforts of the commissioning team. Initial LIGO reached its design sensitivity during the last science run, which ended in October 2010.
In parallel with commissioning and data analysis with the initial detector, the LIGO group worked on research and development of the next generation of detectors. The major instrument upgrade from initial to advanced LIGO started in 2010 and lasted until 2014.
This thesis describes the results of commissioning work done at the LIGO Livingston site from 2013 until 2015, in parallel with and after the installation of the instrument. It also discusses new techniques and tools developed at the 40m prototype, including adaptive filtering, estimation of quantization noise in digital filters, and the design of isolation kits for ground seismometers.
The first part of this thesis is devoted to the description of methods for bringing the interferometer to the linear regime, in which collection of data becomes possible. The states of the longitudinal and angular controls of the interferometer degrees of freedom during the lock acquisition process and in the low-noise configuration are discussed in detail.
Once the interferometer is locked and transitioned to the low-noise regime, the instrument produces astrophysics data that should be calibrated to units of meters or strain. The second part of this thesis describes the online calibration technique set up at both observatories to monitor the quality of the collected data in real time. Sensitivity analysis was done to understand and eliminate noise sources of the instrument.
Coupling of noise sources to the gravitational wave channel can be reduced if robust feedforward and optimal feedback control loops are implemented. The last part of this thesis describes static and adaptive feedforward noise cancellation techniques applied to the Advanced LIGO interferometers and tested at the 40m prototype. Applications of optimal time-domain feedback control techniques and estimators to aLIGO control loops are also discussed.
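The idea of adaptive feedforward noise cancellation can be sketched with a textbook least-mean-squares (LMS) filter: a witness sensor measures the noise source, an adaptive FIR filter learns how that noise couples into the target channel, and the filtered witness is subtracted. This is a generic illustration, not the specific scheme implemented at the sites or the 40m prototype; the coupling coefficients, step size, and signal below are all hypothetical.

```python
import math
import random


def lms_cancel(target, witness, n_taps=4, mu=0.01):
    """Subtract the witness-correlated part of target using an LMS FIR filter."""
    coeffs = [0.0] * n_taps
    residual = []
    for n in range(len(target)):
        # Most recent n_taps witness samples (zeros before the start of data).
        taps = [witness[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(c * x for c, x in zip(coeffs, taps))
        error = target[n] - estimate  # what remains after cancellation
        # LMS update: step each coefficient along error * input.
        coeffs = [c + mu * error * x for c, x in zip(coeffs, taps)]
        residual.append(error)
    return residual, coeffs


random.seed(1)
n_samples = 5000
witness = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
# Hypothetical coupling of the noise source into the target channel.
coupled = [0.8 * witness[n] - 0.3 * (witness[n - 1] if n > 0 else 0.0)
           for n in range(n_samples)]
# Small sinusoid standing in for the signal of interest.
signal = [0.1 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(n_samples)]
target = [s + c for s, c in zip(signal, coupled)]

residual, coeffs = lms_cancel(target, witness)
tail_power = sum(e * e for e in residual[-1000:]) / 1000.0
```

After convergence the first two coefficients approach the assumed coupling (0.8 and -0.3), and the residual is dominated by the small sinusoid rather than the witness noise. A static feedforward filter would fix the coefficients once instead of updating them on every sample.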
Commissioning work is still ongoing at the sites. The first science run of advanced LIGO is planned for September 2015 and will last for 3-4 months. This run will be followed by a set of small instrument upgrades that will be installed on a time scale of a few months. The second science run will start in spring 2016 and last for about 6 months. Since the current sensitivity of advanced LIGO is already more than a factor of 3 higher than that of the initial detectors and keeps improving on a monthly basis, the upcoming science runs have a good chance of making the first direct detection of gravitational waves.
Abstract:
Chlorine oxide species have received considerable attention in recent years due to their central role in the balance of stratospheric ozone. Many questions pertaining to the behavior of such species still remain unanswered and plague the ability of researchers to develop accurate chemical models of the stratosphere. Presented in this thesis are three experiments that study various properties of some specific chlorine oxide species.
In the first chapter, the reaction between ClONO_2 and protonated water clusters is investigated to elucidate a possible reaction mechanism for the heterogeneous reaction of chlorine nitrate on ice. The ionic products were various forms of protonated nitric acid, NO_2^+(H_2O)_m, m = 0, 1, 2. These products are analogous to products previously reported in the literature for the neutral reaction occurring on ice surfaces. Our results support the hypothesis that the heterogeneous reaction is acid-catalyzed.
In the second chapter, the photochemistry of ClONO_2 was investigated at two wavelengths, 193 and 248 nm, using the technique of photofragmentation translational spectroscopy. At both wavelengths, the predominant dissociation pathways were Cl + NO_3 and ClO + NO_2. Channel assignments were confirmed by momentum matching the counterfragments from each channel. A one-dimensional stratospheric model using the new 248 nm branching ratio determined how our results would affect the predicted Cl_x and NO_x partitioning in the stratosphere.
Chapter three explores the photodissociation dynamics of Cl_2O at 193, 248 and 308 nm. At 193 nm, we found evidence for the concerted reaction channel, Cl_2 + O. The ClO + Cl channel was also accessed; however, the majority of the ClO fragments were formed with sufficient internal energies for spontaneous secondary dissociation to occur. At 248 and 308 nm, we observed only the ClO + Cl channel. Some of the ClO formed at 248 nm was internally hot and spontaneously dissociated. Bimodal translational energy distributions of the ClO and Cl products indicate that two pathways leading to the same product exist.
Appendices A, B, and C discuss the details of data analysis techniques used in Chapters 1 and 2. The development of a molecular beam source of ClO dimer is presented in Appendix D.
Abstract:
In the past, many different methodologies have been devised to support software development, and different sets of methodologies have been developed to support the analysis of software artefacts. We have identified this mismatch as one of the causes of the poor reliability of embedded systems software. The issue with software development styles is that they are "analysis-agnostic." They do not try to structure the code in a way that lends itself to analysis. The analysis is usually applied post-mortem after the software was developed and it requires a large amount of effort. The issue with software analysis methodologies is that they do not exploit available information about the system being analyzed.
In this thesis we address the above issues by developing a new methodology, called "analysis-aware" design, that links software development styles with the capabilities of analysis tools. This methodology forms the basis of a framework for interactive software development. The framework consists of an executable specification language and a set of analysis tools based on static analysis, testing, and model checking. The language enforces an analysis-friendly code structure and offers primitives that allow users to implement their own testers and model checkers directly in the language. We introduce a new approach to static analysis that takes advantage of the capabilities of a rule-based engine. We have applied the analysis-aware methodology to the development of a smart home application.
Abstract:
This work deals with two related areas: processing of visual information in the central nervous system, and the application of computer systems to research in neurophysiology.
Certain classes of interneurons in the brain and optic lobes of the blowfly Calliphora phaenicia were previously shown to be sensitive to the direction of motion of visual stimuli. These units were identified by visual field, preferred direction of motion, and anatomical location from which recorded. The present work is addressed to the questions: (1) is there interaction between pairs of these units, and (2) if such relationships can be found, what is their nature. To answer these questions, it is essential to record from two or more units simultaneously, and to use more than a single recording electrode if recording points are to be chosen independently. Accordingly, such techniques were developed and are described.
One must also have practical, convenient means for analyzing the large volumes of data so obtained. It is shown that use of an appropriately designed computer system is a profitable approach to this problem. Both hardware and software requirements for a suitable system are discussed and an approach to computer-aided data analysis developed. A description is given of members of a collection of application programs developed for analysis of neuro-physiological data and operated in the environment of and with support from an appropriate computer system. In particular, techniques developed for classification of multiple units recorded on the same electrode are illustrated as are methods for convenient graphical manipulation of data via a computer-driven display.
By means of multiple electrode techniques and the computer-aided data acquisition and analysis system, the path followed by one of the motion detection units was traced from one optic lobe through the brain and into the opposite lobe. It is further shown that this unit and its mirror image in the opposite lobe have a mutually inhibitory relationship. This relationship is investigated. The existence of interaction between other pairs of units is also shown. For pairs of units responding to motion in the same direction, the relationship is of an excitatory nature; for those responding to motion in opposed directions, it is inhibitory.
Experience gained from use of the computer system is discussed and a critical review of the current system is given. The most useful features of the system were found to be the fast response, the ability to go from one analysis technique to another rapidly and conveniently, and the interactive nature of the display system. The shortcomings of the system were problems in real-time use and the programming barrier, that is, the fact that building new analysis techniques requires a high degree of programming knowledge and skill. It is concluded that computer systems of the kind discussed will play an increasingly important role in studies of the central nervous system.