10 results for Data analysis system
in CaltechTHESIS
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.
Abstract:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new principled algorithms for smoothing and clustering of spike data.
Abstract:
In this work, we further extend the recently developed adaptive data analysis method, the Sparse Time-Frequency Representation (STFR) method. This method is based on the assumption that many physical signals inherently contain AM-FM representations. We propose a sparse optimization method to extract the AM-FM representations of such signals. We prove the convergence of the method for periodic signals under certain assumptions and provide practical algorithms specifically for the non-periodic STFR, which extends the method to tackle problems that former STFR methods could not handle, including stability to noise and non-periodic data analysis. This is a significant improvement since many adaptive and non-adaptive signal processing methods are not fully capable of handling non-periodic signals. Moreover, we propose a new STFR algorithm to study intrawave signals with strong frequency modulation and analyze the convergence of this new algorithm for periodic signals. Such signals have previously remained a bottleneck for all signal processing methods. Furthermore, we propose a modified version of STFR that facilitates the extraction of intrawaves that have overlapping frequency content. We show that the STFR methods can be applied to the realm of dynamical systems and cardiovascular signals. In particular, we present a simplified and modified version of the STFR algorithm that is potentially useful for the diagnosis of some cardiovascular diseases. We further explain some preliminary work on the nature of Intrinsic Mode Functions (IMFs) and how they can have different representations in different phase coordinates. This analysis shows that the uncertainty principle is fundamental to all oscillating signals.
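To make the term "AM-FM representation" in the abstract above concrete, the following minimal sketch builds a signal of the form s(t) = a(t) cos(theta(t)) and estimates its instantaneous frequency. It does not implement the STFR method itself; the envelope, the phase function, and all function names are hypothetical choices made only for illustration.

```python
import math

def am_fm_signal(envelope, phase, times):
    """Sample s(t) = a(t) * cos(theta(t)) at each time in `times`."""
    return [envelope(t) * math.cos(phase(t)) for t in times]

def instantaneous_frequency(phase, t, dt=1e-6):
    """Approximate theta'(t) / (2*pi), in Hz, with a central difference."""
    return (phase(t + dt) - phase(t - dt)) / (2.0 * dt) / (2.0 * math.pi)

# Hypothetical test signal (an assumption, not taken from the thesis).
def envelope(t):
    # Slowly varying amplitude a(t) between 0.7 and 1.3.
    return 1.0 + 0.3 * math.cos(2.0 * math.pi * 0.5 * t)

def phase(t):
    # 5 Hz carrier whose frequency also varies within each carrier period.
    return 2.0 * math.pi * 5.0 * t + 0.5 * math.sin(2.0 * math.pi * 5.0 * t)

times = [i / 1000.0 for i in range(1000)]  # one second sampled at 1 kHz
signal = am_fm_signal(envelope, phase, times)
```

For this particular phase function the instantaneous frequency swings between 2.5 Hz and 7.5 Hz within each carrier period, which is one simple way to picture an intrawave signal with strong frequency modulation.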
Abstract:
Forced vibration field tests and finite element studies have been conducted on Morrow Point (arch) Dam in order to investigate dynamic dam-water interaction and water compressibility. Design of the data acquisition system incorporates several special features to retrieve both amplitude and phase of the response in a low signal to noise environment. These features contributed to the success of the experimental program which, for the first time, produced field evidence of water compressibility; this effect seems to play a significant role only in the symmetric response of Morrow Point Dam in the frequency range examined. In the accompanying analysis, frequency response curves for measured accelerations and water pressures as well as their resonating shapes are compared to predictions from the current state-of-the-art finite element model for which water compressibility is both included and neglected. Calibration of the numerical model employs the antisymmetric response data since they are only slightly affected by water compressibility, and, after calibration, good agreement to the data is obtained whether or not water compressibility is included. In the effort to reproduce the symmetric response data, on which water compressibility has a significant influence, the calibrated model shows better correlation when water compressibility is included, but the agreement is still inadequate. Similar results occur using data obtained previously by others at a low water level. A successful isolation of the fundamental water resonance from the experimental data shows significantly different features from those of the numerical water model, indicating possible inaccuracy in the assumed geometry and/or boundary conditions for the reservoir. However, the investigation does suggest possible directions in which the numerical model can be improved.
Abstract:
The cytochromes P450 (P450s) are a remarkable class of heme enzymes that catalyze the metabolism of xenobiotics and the biosynthesis of signaling molecules. Controlled electron flow into the thiolate-ligated heme active site allows P450s to activate molecular oxygen and hydroxylate aliphatic C–H bonds via the formation of high-valent metal-oxo intermediates (compounds I and II). Due to the reactive nature and short lifetimes of these intermediates, many of the fundamental steps in catalysis have not been observed directly. The Gray group and others have developed photochemical methods, known as “flash-quench,” for triggering electron transfer (ET) and generating redox intermediates in proteins in the absence of native ET partners. Photo-triggering affords a high degree of temporal precision for the gating of an ET event; the initial ET and subsequent reactions can be monitored on the nanosecond-to-second timescale using transient absorption (TA) spectroscopies. Chapter 1 catalogues critical aspects of P450 structure and mechanism, including the native pathway for formation of compound I, and outlines the development of photochemical processes that can be used to artificially trigger ET in proteins. Chapters 2 and 3 describe the development of these photochemical methods to establish electronic communication between a photosensitizer and the buried P450 heme. Chapter 2 describes the design and characterization of a Ru-P450-BM3 conjugate containing a ruthenium photosensitizer covalently tethered to the P450 surface, and nanosecond-to-second kinetics of the photo-triggered ET event are presented. By analyzing data at multiple wavelengths, we have identified the formation of multiple ET intermediates, including the catalytically relevant compound II; this intermediate is generated by oxidation of a bound water molecule in the ferric resting state enzyme. 
The work in Chapter 3 probes the role of a tryptophan residue situated between the photosensitizer and heme in the aforementioned Ru-P450 BM3 conjugate. Replacement of this tryptophan with histidine does not perturb the P450 structure, yet it completely eliminates the ET reactivity described in Chapter 2. The presence of an analogous tryptophan in Ru-P450 CYP119 conjugates also is necessary for observing oxidative ET, but the yield of heme oxidation is lower. Chapter 4 offers a basic description of the theoretical underpinnings required to analyze ET. Single-step ET theory is first presented, followed by extensions to multistep ET: electron “hopping.” The generation of “hopping maps” and use of a hopping map program to analyze the rate advantage of hopping over single-step ET is described, beginning with an established rhenium-tryptophan-azurin hopping system. This ET analysis is then applied to the Ru-tryptophan-P450 systems described in Chapter 2; this strongly supports the presence of hopping in Ru-P450 conjugates. Chapter 5 explores the implementation of flash-quench and other phototriggered methods to examine the native reductive ET and gas binding events that activate molecular oxygen. In particular, TA kinetics that demonstrate heme reduction on the microsecond timescale for four Ru-P450 conjugates are presented. In addition, we implement laser flash-photolysis of P450 ferrous–CO to study the rates of CO rebinding in the thermophilic P450 CYP119 at variable temperature. Chapter 6 describes the development and implementation of air-sensitive potentiometric redox titrations to determine the solution reduction potentials of a series of P450 BM3 mutants, which were designed for non-native cyclopropanation of styrene in vivo. 
An important conclusion from this work is that substitution of the axial cysteine for serine shifts the wild type reduction potential positive by 130 mV, facilitating reduction by biological redox cofactors in the presence of poorly-bound substrates. While this mutation abolishes oxygenation activity, these mutants are capable of catalyzing the cyclopropanation of styrene, even within the confines of an E. coli cell. Four appendices are also provided, including photochemical heme oxidation in ruthenium-modified nitric oxide synthase (Appendix A), general protocols (Appendix B), Chapter-specific notes (Appendix C) and Matlab scripts used for data analysis (Appendix D).
Abstract:
From studies of protoplanetary disks to extrasolar planets and planetary debris, we aim to understand the full evolution of a planetary system. Observational constraints from ground- and space-based instrumentation allow us to measure the properties of objects near and far and are central to developing this understanding. We present here three observational campaigns that, when combined with theoretical models, reveal characteristics of different stages and remnants of planet formation. The Kuiper Belt provides evidence of chemical and dynamical activity that reveals clues to its primordial environment and subsequent evolution. Large samples of this population can only be assembled at optical wavelengths, with thermal measurements at infrared and sub-mm wavelengths currently available for only the largest and closest bodies. We measure the size and shape of one particular object precisely here, in hopes of better understanding its unique dynamical history and layered composition.
Molecular organic chemistry is one of the most fundamental and widespread facets of the universe, and plays a key role in planet formation. A host of carbon-containing molecules vibrationally emit in the near-infrared when excited by warm gas, T~1000 K. The NIRSPEC instrument at the W.M. Keck Observatory is uniquely configured to study large ranges of this wavelength region at high spectral resolution. Using this facility we present studies of warm CO gas in protoplanetary disks, with a new code for precise excitation modeling. A parameterized suite of models demonstrates the abilities of the code and matches observational constraints such as line strength and shape. We use the models to probe various disk parameters as well, which are easily extensible to others with known disk emission spectra such as water, carbon dioxide, acetylene, and hydrogen cyanide.
Lastly, the existence of molecules in extrasolar planets can also be studied with NIRSPEC and reveals a great deal about the evolution of the protoplanetary gas. The species we observe in protoplanetary disks are also often present in exoplanet atmospheres, and are abundant in Earth's atmosphere as well. Thus, a sophisticated telluric removal code is necessary to analyze these high dynamic range, high-resolution spectra. We present observations of a hot Jupiter, revealing water in its atmosphere and demonstrating a new technique for exoplanet mass determination and atmospheric characterization. We will also be applying this atmospheric removal code to the aforementioned disk observations, to improve our data analysis and probe less abundant species. Guiding models using observations is the only way to develop an accurate understanding of the timescales and processes involved. The futures of the modeling and of the observations are bright, and the end goal of realizing a unified model of planet formation will require both theory and data, from a diverse collection of sources.
Abstract:
Planetary atmospheres exist in a seemingly endless variety of physical and chemical environments. There are an equally diverse number of methods by which we can study and characterize atmospheric composition. In order to better understand the fundamental chemistry and physical processes underlying all planetary atmospheres, my research of the past four years has focused on two distinct topics. First, I focused on the data analysis and spectral retrieval of observations obtained by the Ultraviolet Imaging Spectrograph (UVIS) instrument onboard the Cassini spacecraft while in orbit around Saturn. These observations consisted of stellar occultation measurements of Titan's upper atmosphere, probing the chemical composition in the region 300 to 1500 km above Titan's surface. I examined the relative abundances of Titan's two most prevalent chemical species, nitrogen and methane. I also focused on the aerosols that are formed through chemistry involving these two major species, and determined the vertical profiles of aerosol particles as a function of time and latitude. Moving beyond our own solar system, my second topic of investigation involved analysis of infra-red light curves from the Spitzer space telescope, obtained as it measured the light from stars hosting planets of their own. I focused on both transit and eclipse modeling during Spitzer data reduction and analysis. In my initial work, I utilized the data to search for transits of planets a few Earth masses in size. In more recent research, I analyzed secondary eclipses of three exoplanets and constrained the range of possible temperatures and compositions of their atmospheres.
Abstract:
The Laser Interferometer Gravitational-Wave Observatory (LIGO) consists of two complex large-scale laser interferometers designed for direct detection of gravitational waves from distant astrophysical sources in the frequency range 10 Hz to 5 kHz. Direct detection of space-time ripples will support Einstein's general theory of relativity and provide invaluable information and new insight into the physics of the Universe.
The initial phase of LIGO started in 2002, and since then data have been collected during six science runs. Instrument sensitivity improved from run to run thanks to the efforts of the commissioning team. Initial LIGO reached its design sensitivity during the last science run, which ended in October 2010.
In parallel with commissioning and data analysis with the initial detector, the LIGO group worked on research and development of the next generation of detectors. The major instrument upgrade from Initial to Advanced LIGO started in 2010 and lasted until 2014.
This thesis describes the results of commissioning work done at the LIGO Livingston site from 2013 until 2015, in parallel with and after the installation of the instrument. It also discusses new techniques and tools developed at the 40m prototype, including adaptive filtering, estimation of quantization noise in digital filters, and the design of isolation kits for ground seismometers.
The first part of this thesis is devoted to the description of methods for bringing the interferometer to the linear regime, in which the collection of data becomes possible. The states of the longitudinal and angular controls of the interferometer degrees of freedom during the lock acquisition process and in the low-noise configuration are discussed in detail.
Once the interferometer is locked and transitioned to the low-noise regime, the instrument produces astrophysics data that should be calibrated to units of meters or strain. The second part of this thesis describes the online calibration technique set up at both observatories to monitor the quality of the collected data in real time. Sensitivity analysis was done to understand and eliminate noise sources of the instrument.
The coupling of noise sources to the gravitational wave channel can be reduced if robust feedforward and optimal feedback control loops are implemented. The last part of this thesis describes static and adaptive feedforward noise cancellation techniques applied to the Advanced LIGO interferometers and tested at the 40m prototype. Applications of optimal time-domain feedback control techniques and estimators to aLIGO control loops are also discussed.
Commissioning work is still ongoing at the sites. The first science run of Advanced LIGO is planned for September 2015 and will last for 3-4 months. This run will be followed by a set of small instrument upgrades that will be installed on a time scale of a few months. The second science run will start in spring 2016 and last for about 6 months. Since the current sensitivity of Advanced LIGO is already more than a factor of 3 higher than that of the initial detectors and keeps improving on a monthly basis, the upcoming science runs have a good chance of making the first direct detection of gravitational waves.
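The adaptive feedforward noise cancellation mentioned in the abstract above can be pictured with a textbook least-mean-squares (LMS) filter. The sketch below is only an illustration under stated assumptions: the abstract does not say which adaptive algorithm the thesis uses, and the function name `lms_cancel`, the synthetic witness channel, and the coupling coefficients are all hypothetical.

```python
import math
import random

def lms_cancel(target, witness, n_taps=4, mu=0.01):
    """Subtract the part of `target` that an FIR filter can predict from `witness`.

    Returns the residual (cleaned) samples and the final filter weights.
    """
    weights = [0.0] * n_taps
    residual = []
    for n in range(len(target)):
        # Most recent n_taps witness samples, newest first, zero-padded at the start.
        taps = [witness[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * x for w, x in zip(weights, taps))
        error = target[n] - estimate
        # Standard LMS update: step each weight along error * input.
        weights = [w + 2.0 * mu * error * x for w, x in zip(weights, taps)]
        residual.append(error)
    return residual, weights

# Synthetic data (assumed for illustration): a weak sinusoid buried in noise
# that couples from a witness sensor through a two-tap FIR path.
random.seed(0)
n_samples = 5000
witness = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
wanted = [0.1 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(n_samples)]
target = [
    wanted[n] + 0.5 * witness[n] - 0.3 * (witness[n - 1] if n > 0 else 0.0)
    for n in range(n_samples)
]

residual, weights = lms_cancel(target, witness)
```

After the filter has adapted, the first two weights approach the assumed coupling coefficients (0.5 and -0.3), and the residual is dominated by the weak sinusoid rather than by the coupled witness noise.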
Abstract:
This work deals with two related areas: processing of visual information in the central nervous system, and the application of computer systems to research in neurophysiology.
Certain classes of interneurons in the brain and optic lobes of the blowfly Calliphora phaenicia were previously shown to be sensitive to the direction of motion of visual stimuli. These units were identified by visual field, preferred direction of motion, and anatomical location from which recorded. The present work is addressed to the questions: (1) is there interaction between pairs of these units, and (2) if such relationships can be found, what is their nature. To answer these questions, it is essential to record from two or more units simultaneously, and to use more than a single recording electrode if recording points are to be chosen independently. Accordingly, such techniques were developed and are described.
One must also have practical, convenient means for analyzing the large volumes of data so obtained. It is shown that use of an appropriately designed computer system is a profitable approach to this problem. Both hardware and software requirements for a suitable system are discussed and an approach to computer-aided data analysis developed. A description is given of members of a collection of application programs developed for analysis of neuro-physiological data and operated in the environment of and with support from an appropriate computer system. In particular, techniques developed for classification of multiple units recorded on the same electrode are illustrated as are methods for convenient graphical manipulation of data via a computer-driven display.
By means of multiple electrode techniques and the computer-aided data acquisition and analysis system, the path followed by one of the motion detection units was traced from one optic lobe through the brain and into the opposite lobe. It is further shown that this unit and its mirror image in the opposite lobe have a mutually inhibitory relationship. This relationship is investigated. The existence of interaction between other pairs of units is also shown. For pairs of units responding to motion in the same direction, the relationship is of an excitatory nature; for those responding to motion in opposed directions, it is inhibitory.
Experience gained from use of the computer system is discussed and a critical review of the current system is given. The most useful features of the system were found to be the fast response, the ability to go from one analysis technique to another rapidly and conveniently, and the interactive nature of the display system. The shortcomings of the system were problems in real-time use and the programming barrier: the fact that building new analysis techniques requires a high degree of programming knowledge and skill. It is concluded that computer systems of the kind discussed will play an increasingly important role in studies of the central nervous system.
Abstract:
STEEL, the Caltech-created nonlinear large displacement analysis software, is currently used by a large number of researchers at Caltech. However, due to its complexity and lack of visualization tools (such as pre- and post-processing capabilities), rapid creation and analysis of models using this software was difficult. SteelConverter was created as a means to facilitate model creation through the use of the industry standard finite element solver ETABS. This software allows users to create models in ETABS and intelligently convert model information such as geometry, loading, releases, fixity, etc., into a format that STEEL understands. Models that would take several days to create and verify now take several hours or less. The productivity of the researcher as well as the level of confidence in the model being analyzed is greatly increased.
It has always been a major goal of Caltech to spread the knowledge created here to other universities. However, due to the complexity of STEEL it was difficult for researchers or engineers from other universities to conduct analyses. While SteelConverter did help researchers at Caltech improve their research, sending SteelConverter and its documentation to other universities was less than ideal. Issues of version control, individual computer requirements, and the difficulty of releasing updates made a more centralized solution preferred. This is where the idea for Caltech VirtualShaker was born. Through the creation of a centralized website where users could log in, submit, analyze, and process models in the cloud, all of the major concerns associated with the utilization of SteelConverter were eliminated. Caltech VirtualShaker allows users to create profiles where defaults associated with their most commonly run models are saved, and allows them to submit multiple jobs to an online virtual server to be analyzed and post-processed. The creation of this website not only allowed for more rapid distribution of this tool, but also created a means for engineers and researchers with no access to powerful computer clusters to run computationally intensive analyses without the excessive cost of building and maintaining a computer cluster.
In order to increase confidence in the use of STEEL as an analysis system, as well as verify the conversion tools, a series of comparisons was made between STEEL and ETABS. Six models of increasing complexity, ranging from a cantilever column to a twenty-story moment frame, were analyzed to determine the ability of STEEL to accurately calculate basic model properties such as elastic stiffness and damping through a free vibration analysis as well as more complex structural properties such as overall structural capacity through a pushover analysis. These analyses showed a very strong agreement between the two programs on every aspect of each analysis. However, these analyses also showed the ability of the STEEL analysis algorithm to converge at significantly larger drifts than ETABS when using the more computationally expensive and structurally realistic fiber hinges. Following the ETABS analysis, it was decided to repeat the comparisons in a program more capable of conducting highly nonlinear analysis, called Perform. These analyses again showed a very strong agreement between the two programs in every aspect of each analysis through instability. However, due to some limitations in Perform, free vibration analyses for the three story one bay chevron brace frame, two bay chevron brace frame, and twenty story moment frame could not be conducted. With the current trend towards ultimate capacity analysis, the ability to use fiber based models allows engineers to gain a better understanding of a building’s behavior under these extreme load scenarios.
Following this, a final study was done on Hall’s U20 structure [1] where the structure was analyzed in all three programs and their results compared. The pushover curves from each program were compared and the differences caused by variations in software implementation explained. From this, conclusions can be drawn on the effectiveness of each analysis tool when attempting to analyze structures through the point of geometric instability. The analyses show that while ETABS was capable of accurately determining the elastic stiffness of the model, following the onset of inelastic behavior the analysis tool failed to converge. However, for the small number of time steps the ETABS analysis was converging, its results exactly matched those of STEEL, leading to the conclusion that ETABS is not an appropriate analysis package for analyzing a structure through the point of collapse when using fiber elements throughout the model. The analyses also showed that while Perform was capable of calculating the response of the structure accurately, restrictions in the material model resulted in a pushover curve that did not match that of STEEL exactly, particularly post collapse. However, such problems could be alleviated by choosing a more simplistic material model.