7 results for Common data environment
in CaltechTHESIS
Abstract:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new principled algorithms for smoothing and clustering of spike data.
Abstract:
With data centers being the supporting infrastructure for a wide range of IT services, their efficiency has become a big concern to operators, as well as to society, for both economic and environmental reasons. The goal of this thesis is to design energy-efficient algorithms that reduce energy cost while minimizing compromise to service. We focus on the algorithmic challenges at different levels of energy optimization across the data center stack. The algorithmic challenge at the device level is to improve the energy efficiency of a single computational device via techniques such as job scheduling and speed scaling. We analyze the common speed scaling algorithms in both the worst-case model and stochastic model to answer some fundamental issues in the design of speed scaling algorithms. The algorithmic challenge at the local data center level is to dynamically allocate resources (e.g., servers) and to dispatch the workload in a data center. We develop an online algorithm to make a data center more power-proportional by dynamically adapting the number of active servers. The algorithmic challenge at the global data center level is to dispatch the workload across multiple data centers, considering the geographical diversity of electricity price, availability of renewable energy, and network propagation delay. We propose algorithms to jointly optimize routing and provisioning in an online manner. Motivated by the above online decision problems, we move on to study a general class of online problems named "smoothed online convex optimization", which seeks to minimize the sum of a sequence of convex functions when "smooth" solutions are preferred. This model allows us to bridge different research communities and helps us get a more fundamental understanding of general online decision problems.
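As a rough illustration of the smoothed online convex optimization objective described above, the sketch below scores a sequence of decisions by its per-step convex cost plus a penalty on movement between consecutive decisions. The absolute-value movement penalty, the weight `beta`, the quadratic costs, and the name `soco_cost` are illustrative assumptions, not the thesis's formulation.

```python
def soco_cost(costs, xs, beta, x0=0.0):
    """Sum of per-step costs f_t(x_t) plus beta * |x_t - x_{t-1}| (assumed penalty)."""
    total = 0.0
    prev = x0
    for f, x in zip(costs, xs):
        total += f(x) + beta * abs(x - prev)
        prev = x
    return total


# Hypothetical workload: the per-step minimizer alternates between 0 and 1.
targets = [0.0, 1.0, 0.0, 1.0]
costs = [lambda x, t=t: (x - t) ** 2 for t in targets]

greedy = targets                # jump to each step's minimizer
steady = [0.5] * len(targets)   # move once, then stay put

greedy_cost = soco_cost(costs, greedy, beta=1.0)
steady_cost = soco_cost(costs, steady, beta=1.0)
print(greedy_cost, steady_cost)
```

With these made-up costs, tracking each step's minimizer pays for every move, while the steadier trajectory accepts a small per-step cost and ends up cheaper overall.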
Abstract:
High-resolution orbital and in situ observations acquired of the Martian surface during the past two decades provide the opportunity to study the rock record of Mars at an unprecedented level of detail. This dissertation consists of four studies whose common goal is to establish new standards for the quantitative analysis of visible and near-infrared data from the surface of Mars. Through the compilation of global image inventories, application of stratigraphic and sedimentologic statistical methods, and use of laboratory analogs, this dissertation provides insight into the history of past depositional and diagenetic processes on Mars. The first study presents a global inventory of stratified deposits observed in images from the High Resolution Imaging Science Experiment (HiRISE) camera onboard the Mars Reconnaissance Orbiter. This work uses the widespread coverage of high-resolution orbital images to make global-scale observations about the processes controlling sediment transport and deposition on Mars. The next chapter presents a study of bed thickness distributions in Martian sedimentary deposits, showing how statistical methods can be used to establish quantitative criteria for evaluating the depositional history of stratified deposits observed in orbital images. The third study tests the ability of spectral mixing models to obtain quantitative mineral abundances from near-infrared reflectance spectra of clay and sulfate mixtures in the laboratory for application to the analysis of orbital spectra of sedimentary deposits on Mars. The final study employs a statistical analysis of the size, shape, and distribution of nodules observed by the Mars Science Laboratory Curiosity rover team in the Sheepbed mudstone at Yellowknife Bay in Gale crater. This analysis is used to evaluate hypotheses for nodule formation and to gain insight into the diagenetic history of an ancient habitable environment on Mars.
Abstract:
Complexity in the earthquake rupture process can result from many factors. This study investigates the origin of such complexity by examining several recent, large earthquakes in detail. In each case the local tectonic environment plays an important role in understanding the source of the complexity.
Several large shallow earthquakes (Ms > 7.0) along the Middle American Trench have similarities and differences between them that may lead to a better understanding of fracture and subduction processes. They are predominantly thrust events consistent with the known subduction of the Cocos plate beneath N. America. Two events occurring along this subduction zone close to triple junctions show considerable complexity. This may be attributable to a more heterogeneous stress environment in these regions and as such has implications for other subduction zone boundaries.
An event which looks complex but is actually rather simple is the 1978 Bermuda earthquake (Ms ~ 6). It is located predominantly in the mantle. Its mechanism is one of pure thrust faulting with a strike N 20°W and dip 42°NE. Its apparent complexity is caused by local crustal structure. This is an important event in terms of understanding and estimating seismic hazard on the eastern seaboard of N. America.
A study of several large strike-slip continental earthquakes identifies characteristics which are common to them and may be useful in determining what to expect from the next great earthquake on the San Andreas fault. The events are the 1976 Guatemala earthquake on the Motagua fault and two events on the Anatolian fault in Turkey (the 1967 Mudurnu Valley and 1976 E. Turkey events). An attempt to model the complex P-waveforms of these events results in good synthetic fits for the Guatemala and Mudurnu Valley events. However, the E. Turkey event proves to be too complex as it may have associated thrust or normal faulting. Several individual sources occurring at intervals of between 5 and 20 seconds characterize the Guatemala and Mudurnu Valley events. The maximum size of an individual source appears to be bounded at about 5 x 10^26 dyne-cm. A detailed source study including directivity is performed on the Guatemala event. The source time history of the Mudurnu Valley event illustrates its significance in modeling strong ground motion in the near field. The complex source time series of the 1967 event produces amplitudes greater by a factor of 2.5 than a uniform model scaled to the same size for a station 20 km from the fault.
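For scale, the moment bound quoted above can be converted to a moment magnitude with the Hanks and Kanamori (1979) relation, Mw = (2/3) log10(M0) - 10.7, with M0 in dyne-cm. This conversion is added here for illustration only; the abstract itself reports surface-wave magnitudes (Ms), and the function name is hypothetical.

```python
import math


def moment_magnitude(m0_dyne_cm):
    """Moment magnitude from seismic moment in dyne-cm (Hanks & Kanamori relation)."""
    return (2.0 / 3.0) * math.log10(m0_dyne_cm) - 10.7


# Upper bound on a single source, as stated in the abstract.
mw = moment_magnitude(5e26)
print(round(mw, 1))  # about 7.1
```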
Three large and important earthquakes demonstrate an important type of complexity: multiple-fault complexity. The first, the 1976 Philippine earthquake, an oblique thrust event, represents the first seismological evidence for a northeast dipping subduction zone beneath the island of Mindanao. A large event, following the mainshock by 12 hours, occurred outside the aftershock area and apparently resulted from motion on a subsidiary fault since the event had a strike-slip mechanism.
An aftershock of the great 1960 Chilean earthquake on June 6, 1960, proved to be an interesting discovery. It appears to be a large strike-slip event at the main rupture's southern boundary. It most likely occurred on the landward extension of the Chile Rise transform fault, in the subducting plate. The results for this event suggest that a small event triggered a series of slow events; the duration of the whole sequence being longer than 1 hour. This is indeed a "slow earthquake".
Perhaps one of the most complex of events is the recent Tangshan, China event. It began as a large strike-slip event. Within several seconds of the mainshock it may have triggered thrust faulting to the south of the epicenter. There is no doubt, however, that it triggered a large oblique normal event to the northeast, 15 hours after the mainshock. This event certainly contributed to the great loss of life sustained as a result of the Tangshan earthquake sequence.
What has been learned from these studies has been applied to predict what one might expect from the next great earthquake on the San Andreas. The expectation from this study is that such an event would be a large complex event, not unlike, but perhaps larger than, the Guatemala or Mudurnu Valley events. That is to say, it will most likely consist of a series of individual events in sequence. It is also quite possible that the event could trigger associated faulting on neighboring fault systems such as those occurring in the Transverse Ranges. This has important bearing on the earthquake hazard estimation for the region.
Abstract:
The following work explores the processes individuals utilize when making multi-attribute choices. With the exception of extremely simple or familiar choices, most decisions we face can be classified as multi-attribute choices. In order to evaluate and make choices in such an environment, we must be able to estimate and weight the particular attributes of an option. Hence, better understanding the mechanisms involved in this process is an important step for economists and psychologists. For example, when choosing between two meals that differ in taste and nutrition, what are the mechanisms that allow us to estimate and then weight attributes when constructing value? Furthermore, how can these mechanisms be influenced by variables such as attention or common physiological states, like hunger?
In order to investigate these and similar questions, we use a combination of choice and attentional data, where the attentional data was collected by recording eye movements as individuals made decisions. Chapter 1 designs and tests a neuroeconomic model of multi-attribute choice that makes predictions about choices, response time, and how these variables are correlated with attention. Chapter 2 applies the ideas in this model to intertemporal decision-making, and finds that attention causally affects discount rates. Chapter 3 explores how hunger, a common physiological state, alters the mechanisms we utilize as we make simple decisions about foods.
Abstract:
Fast radio bursts (FRBs) are a novel type of radio pulse whose physics is not yet understood at all. Only a handful of FRBs had been detected when we started this project. Taking account of the scant observations, we put physical constraints on FRBs. We excluded proposals of a galactic origin for their extraordinarily high dispersion measures (DM), in particular stellar coronas and HII regions. Therefore our work supports an extragalactic origin for FRBs. We show that the resolved scattering tail of FRB 110220 is unlikely to be due to propagation through the intergalactic plasma. Instead the scattering is probably caused by the interstellar medium in the FRB's host galaxy, and indicates that this burst sits in the central region of that galaxy. Pulse durations of order a millisecond constrain source sizes of FRBs, implying enormous brightness temperatures and thus coherent emission. Electric fields near FRBs at cosmological distances would be so strong that they could accelerate free electrons from rest to relativistic energies in a single wave period. When we worked on FRBs, it was unclear whether they were genuine astronomical signals as distinct from 'perytons', clearly terrestrial radio bursts sharing some common properties with FRBs. Recently, in April 2015, astronomers discovered that perytons were emitted by microwave ovens. Radio chirps similar to FRBs were emitted when their doors opened while they were still heating. Evidence for the astronomical nature of FRBs has strengthened since our paper was published. Some bursts have been found to show linear and circular polarizations, and Faraday rotation of the linear polarization has also been detected. I hope to resume working on FRBs in the near future. But after we completed our FRB paper, I decided to pause this project because of the lack of observational constraints.
The pulsar triple system, J0733+1715, has its orbital parameters fitted to high accuracy owing to the precise timing of the central millisecond pulsar. The two orbits are highly hierarchical, namely $P_{\mathrm{orb,1}}\ll P_{\mathrm{orb,2}}$, where 1 and 2 label the inner and outer white dwarf (WD) companions respectively. Moreover, their orbital planes almost coincide, providing a unique opportunity to study secular interaction associated purely with eccentricity beyond the solar system. Secular interaction only involves effects averaged over many orbits. Thus each companion can be represented by an elliptical wire with its mass distributed inversely proportional to its local orbital speed. Generally there exists a mutual torque, which vanishes only when their apsidal lines are parallel or anti-parallel. To maintain either mode, the eccentricity ratio, $e_1/e_2$, must be of the proper value, so that both apsidal lines precess together. For J0733+1715, $e_1\ll e_2$ for the parallel mode, while $e_1\gg e_2$ for the anti-parallel one. We show that the former precesses $\sim 10$ times slower than the latter. Currently the system is dominated by the parallel mode. Although only a little anti-parallel mode survives, both eccentricities, especially $e_1$, oscillate on a $\sim 10^3\,\mathrm{yr}$ timescale. Detectable changes would occur within $\sim 1\,\mathrm{yr}$. We demonstrate that the anti-parallel mode gets damped $\sim 10^4$ times faster than its parallel brother by any dissipative process diminishing $e_1$. If it is the tidal damping in the inner WD, we proceed to estimate its tidal quality parameter ($Q$) to be $\sim 10^6$, which was poorly constrained by observations. However, tidal damping may also happen during the preceding low-mass X-ray binary (LMXB) phase or hydrogen thermonuclear flashes. But, in both cases, the inner companion fills its Roche lobe and probably suffers mass/angular momentum loss, which might cause $e_1$ to grow rather than decay.
Several pairs of solar system satellites occupy mean motion resonances (MMRs). We divide these into two groups according to their proximity to exact resonance. Proximity is measured by the existence of a separatrix in phase space. MMRs between Io-Europa, Europa-Ganymede and Enceladus-Dione are too distant from exact resonance for a separatrix to appear. A separatrix is present only in the phase spaces of the Mimas-Tethys and Titan-Hyperion MMRs and their resonant arguments are the only ones to exhibit substantial librations. When a separatrix is present, tidal damping of eccentricity or inclination excites overstable librations that can lead to passage through resonance on the damping timescale. However, after investigation, we conclude that the librations in the Mimas-Tethys and Titan-Hyperion MMRs are fossils and do not result from overstability.
Rubble piles are common in the solar system. Monolithic elements touch their neighbors in small localized areas. Voids occupy a significant fraction of the volume. In a fluid-free environment, heat cannot conduct through voids; only radiation can transfer energy across them. We model the effective thermal conductivity of a rubble pile and show that it is proportional to the square root of the pressure, $P$, for $P\leq \epsilon_Y^3\mu$ where $\epsilon_Y$ is the material's yield strain and $\mu$ its shear modulus. Our model provides an excellent fit to the depth dependence of the thermal conductivity in the top $140\,\mathrm{cm}$ of the lunar regolith. It also offers an explanation for the low thermal inertias of rocky asteroids and icy satellites. Lastly, we discuss how rubble piles slow down the cooling of small bodies such as asteroids.
Electromagnetic (EM) follow-up observations of gravitational wave (GW) events will help shed light on the nature of the sources, and more can be learned if the EM follow-ups can start as soon as the GW event becomes observable. In this paper, we propose a computationally efficient time-domain algorithm capable of detecting gravitational waves (GWs) from coalescing binaries of compact objects with nearly zero time delay. When the signal is strong enough, our algorithm also has the flexibility to trigger EM observation before the merger. The key to the efficiency of our algorithm arises from the use of chains of so-called Infinite Impulse Response (IIR) filters, which filter time-series data recursively. Computational cost is further reduced by a template interpolation technique that requires filtering to be done only for a much coarser template bank than otherwise required to sufficiently recover optimal signal-to-noise ratio. Towards future detectors with sensitivity extending to lower frequencies, our algorithm's computational cost is shown to increase rather insignificantly compared to the conventional time-domain correlation method. Moreover, at latencies of less than hundreds to thousands of seconds, this method is expected to be computationally more efficient than the straightforward frequency-domain method.
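To make the idea of recursive filtering concrete, the sketch below applies a single first-order IIR filter with a complex coefficient and checks it against the equivalent explicit sum over past samples. It is a minimal illustration of why recursion is cheap (constant work per new sample), not the filter design used in the thesis; the coefficients `a` and `b`, the input samples, and the function names are assumptions.

```python
import cmath


def iir_first_order(x, a, b):
    """Recursive filter y[k] = a * y[k-1] + b * x[k], with y[-1] = 0."""
    y = []
    prev = 0j
    for sample in x:
        prev = a * prev + b * sample  # constant work per new sample
        y.append(prev)
    return y


def direct_sum(x, a, b):
    """Same output via y[k] = sum_j b * a**j * x[k-j]; work grows with k."""
    return [
        sum(b * a**j * x[k - j] for j in range(k + 1))
        for k in range(len(x))
    ]


# Hypothetical decaying, oscillating impulse response and toy input data.
a = 0.9 * cmath.exp(0.3j)
b = 1.0 + 0j
x = [1.0, 0.5, -0.25, 2.0]

recursive = iir_first_order(x, a, b)
explicit = direct_sum(x, a, b)
```

Both functions produce the same output, but the recursive form only needs the previous output and the newest sample, which is what allows filtering with almost no added latency.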
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.