4 resultados para big data storage

em CaltechTHESIS


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Storage systems are widely used and have played a crucial rule in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, such system suffers from issues of cost, restricted-lifetime, and reliability with the emergence of new systems and devices, such as distributed storage and flash memory, respectively. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O and network bandwidth. This thesis bridges these two topics, and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.

We consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: What is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we will show that the lower bound of 1/2 is achievable and that the result can be generalized to codes with arbitrary number of parities and optimal rebuilding.

We consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. In order to decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation where only small sliding windows of cells are sorted to generated permutations, and partial rank modulation where only part of the n cells are used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows will increase capacity. We present Gray codes spanning all possible partial-rank states and using only ``push-to-the-top'' operations. These Gray codes turn out to solve an open combinatorial problem called universal cycle, which is a sequence of integers generating all possible partial permutations.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The work presented in this thesis revolves around erasure correction coding, as applied to distributed data storage and real-time streaming communications.

First, we examine the problem of allocating a given storage budget over a set of nodes for maximum reliability. The objective is to find an allocation of the budget that maximizes the probability of successful recovery by a data collector accessing a random subset of the nodes. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models, and determine the optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) for a variety of cases. Although the optimal allocation can have nonintuitive structure and can be difficult to find in general, our results suggest that, as a simple heuristic, reliable storage can be achieved by spreading the budget maximally over all nodes when the budget is large, and spreading it minimally over a few nodes when it is small. Coding would therefore be beneficial in the former case, while uncoded replication would suffice in the latter case.

Second, we study how distributed storage allocations affect the recovery delay in a mobile setting. Specifically, two recovery delay optimization problems are considered for a network of mobile storage nodes: the maximization of the probability of successful recovery by a given deadline, and the minimization of the expected recovery delay. We show that the first problem is closely related to the earlier allocation problem, and solve the second problem completely for the case of symmetric allocations. It turns out that the optimal allocations for the two problems can be quite different. In a simulation study, we evaluated the performance of a simple data dissemination and storage protocol for mobile delay-tolerant networks, and observed that the choice of allocation can have a significant impact on the recovery delay under a variety of scenarios.

Third, we consider a real-time streaming system where messages created at regular time intervals at a source are encoded for transmission to a receiver over a packet erasure link; the receiver must subsequently decode each message within a given delay from its creation time. For erasure models containing a limited number of erasures per coding window, per sliding window, and containing erasure bursts whose maximum length is sufficiently short or long, we show that a time-invariant intrasession code asymptotically achieves the maximum message size among all codes that allow decoding under all admissible erasure patterns. For the bursty erasure model, we also show that diagonally interleaved codes derived from specific systematic block codes are asymptotically optimal over all codes in certain cases. We also study an i.i.d. erasure model in which each transmitted packet is erased independently with the same probability; the objective is to maximize the decoding probability for a given message size. We derive an upper bound on the decoding probability for any time-invariant code, and show that the gap between this bound and the performance of a family of time-invariant intrasession codes is small when the message size and packet erasure probability are small. In a simulation study, these codes performed well against a family of random time-invariant convolutional codes under a number of scenarios.

Finally, we consider the joint problems of routing and caching for named data networking. We propose a backpressure-based policy that employs virtual interest packets to make routing and caching decisions. In a packet-level simulation, the proposed policy outperformed a basic protocol that combines shortest-path routing with least-recently-used (LRU) cache replacement.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Today our understanding of the vibrational thermodynamics of materials at low temperatures is emerging nicely, based on the harmonic model in which phonons are independent. At high temperatures, however, this understanding must accommodate how phonons interact with other phonons or with other excitations. We shall see that the phonon-phonon interactions give rise to interesting coupling problems, and essentially modify the equilibrium and non-equilibrium properties of materials, e.g., thermodynamic stability, heat capacity, optical properties and thermal transport of materials. Despite its great importance, to date the anharmonic lattice dynamics is poorly understood and most studies on lattice dynamics still rely on the harmonic or quasiharmonic models. There have been very few studies on the pure phonon anharmonicity and phonon-phonon interactions. The work presented in this thesis is devoted to the development of experimental and computational methods on this subject.

Modern inelastic scattering techniques with neutrons or photons are ideal for sorting out the anharmonic contribution. Analysis of the experimental data can generate vibrational spectra of the materials, i.e., their phonon densities of states or phonon dispersion relations. We obtained high quality data from laser Raman spectrometer, Fourier transform infrared spectrometer and inelastic neutron spectrometer. With accurate phonon spectra data, we obtained the energy shifts and lifetime broadenings of the interacting phonons, and the vibrational entropies of different materials. The understanding of them then relies on the development of the fundamental theories and the computational methods.

We developed an efficient post-processor for analyzing the anharmonic vibrations from the molecular dynamics (MD) calculations. Currently, most first principles methods are not capable of dealing with strong anharmonicity, because the interactions of phonons are ignored at finite temperatures. Our method adopts the Fourier transformed velocity autocorrelation method to handle the big data of time-dependent atomic velocities from MD calculations, and efficiently reconstructs the phonon DOS and phonon dispersion relations. Our calculations can reproduce the phonon frequency shifts and lifetime broadenings very well at various temperatures.

To understand non-harmonic interactions in a microscopic way, we have developed a numerical fitting method to analyze the decay channels of phonon-phonon interactions. Based on the quantum perturbation theory of many-body interactions, this method is used to calculate the three-phonon and four-phonon kinematics subject to the conservation of energy and momentum, taking into account the weight of phonon couplings. We can assess the strengths of phonon-phonon interactions of different channels and anharmonic orders with the calculated two-phonon DOS. This method, with high computational efficiency, is a promising direction to advance our understandings of non-harmonic lattice dynamics and thermal transport properties.

These experimental techniques and theoretical methods have been successfully performed in the study of anharmonic behaviors of metal oxides, including rutile and cuprite stuctures, and will be discussed in detail in Chapters 4 to 6. For example, for rutile titanium dioxide (TiO2), we found that the anomalous anharmonic behavior of the B1g mode can be explained by the volume effects on quasiharmonic force constants, and by the explicit cubic and quartic anharmonicity. For rutile tin dioxide (SnO2), the broadening of the B2g mode with temperature showed an unusual concave downwards curvature. This curvature was caused by a change with temperature in the number of down-conversion decay channels, originating with the wide band gap in the phonon dispersions. For silver oxide (Ag2O), strong anharmonic effects were found for both phonons and for the negative thermal expansion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With data centers being the supporting infrastructure for a wide range of IT services, their efficiency has become a big concern to operators, as well as to society, for both economic and environmental reasons. The goal of this thesis is to design energy-efficient algorithms that reduce energy cost while minimizing compromise to service. We focus on the algorithmic challenges at different levels of energy optimization across the data center stack. The algorithmic challenge at the device level is to improve the energy efficiency of a single computational device via techniques such as job scheduling and speed scaling. We analyze the common speed scaling algorithms in both the worst-case model and stochastic model to answer some fundamental issues in the design of speed scaling algorithms. The algorithmic challenge at the local data center level is to dynamically allocate resources (e.g., servers) and to dispatch the workload in a data center. We develop an online algorithm to make a data center more power-proportional by dynamically adapting the number of active servers. The algorithmic challenge at the global data center level is to dispatch the workload across multiple data centers, considering the geographical diversity of electricity price, availability of renewable energy, and network propagation delay. We propose algorithms to jointly optimize routing and provisioning in an online manner. Motivated by the above online decision problems, we move on to study a general class of online problem named "smoothed online convex optimization", which seeks to minimize the sum of a sequence of convex functions when "smooth" solutions are preferred. This model allows us to bridge different research communities and help us get a more fundamental understanding of general online decision problems.