10 results for Data Storage Solutions
in CaltechTHESIS
Abstract:
Storage systems are widely used and have played a crucial role in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, with the emergence of new systems and devices such as distributed storage and flash memory, such systems suffer from issues of cost, restricted lifetime, and reliability. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O, and network bandwidth. This thesis bridges these two topics and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.
We consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols, an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: What is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we will show that the lower bound of 1/2 is achievable and that the result can be generalized to codes with an arbitrary number of parities and optimal rebuilding.
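The rebuilding question above can be contrasted with the naive baseline: in a single-parity (RAID-4-style) array, repairing one lost disk requires reading every surviving disk, i.e. the entire remaining information, which is exactly the overhead the optimal-rebuilding codes of Part I reduce toward 1/2. A minimal sketch of that baseline (XOR parity only; not the thesis's optimal-rebuilding construction, and all names are illustrative):

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

data_disks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data_disks)  # single parity disk, as in RAID-4

# Disk 1 fails: the rebuild must read ALL surviving symbols.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_disks[1]
```

With two or more parities arranged as in the array codes of Part I, a single-erasure rebuild can instead touch only a fraction of each surviving disk.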
We consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. In order to decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation, where only small sliding windows of cells are sorted to generate permutations, and partial rank modulation, where only a subset of the n cells is used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows will increase capacity. We present Gray codes spanning all possible partial-rank states and using only "push-to-the-top" operations. These Gray codes turn out to solve an open combinatorial problem, the universal cycle problem: finding a sequence of integers that generates all possible partial permutations.
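The core representation is easy to state in code: the stored symbol is the permutation obtained by ranking the cells' charge levels, and the basic programming primitive is a "push-to-the-top" that raises one cell strictly above all others. A minimal sketch (the charge values are made up for illustration):

```python
def induced_permutation(charges):
    """Cell indices ranked from highest to lowest charge level."""
    return sorted(range(len(charges)), key=lambda i: -charges[i])

def push_to_top(charges, i):
    """Program cell i by raising it strictly above the current maximum."""
    charges = list(charges)
    charges[i] = max(charges) + 1.0
    return charges

levels = [0.7, 2.1, 1.3]            # hypothetical charge levels of n = 3 cells
print(induced_permutation(levels))  # [1, 2, 0]
levels = push_to_top(levels, 0)
print(induced_permutation(levels))  # [0, 1, 2]
```

Because only the relative order matters, no discrete target levels are needed and an overshoot while programming cannot corrupt the symbol, as the abstract notes.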
Abstract:
The work presented in this thesis revolves around erasure correction coding, as applied to distributed data storage and real-time streaming communications.
First, we examine the problem of allocating a given storage budget over a set of nodes for maximum reliability. The objective is to find an allocation of the budget that maximizes the probability of successful recovery by a data collector accessing a random subset of the nodes. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models, and determine the optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) for a variety of cases. Although the optimal allocation can have nonintuitive structure and can be difficult to find in general, our results suggest that, as a simple heuristic, reliable storage can be achieved by spreading the budget maximally over all nodes when the budget is large, and spreading it minimally over a few nodes when it is small. Coding would therefore be beneficial in the former case, while uncoded replication would suffice in the latter case.
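For a symmetric allocation the success probability has a simple closed form: if a budget T (normalized so the file has size 1) is spread evenly over m of n nodes, each nonempty node holds T/m, and a collector reading a uniformly random r-subset succeeds when it hits at least ceil(m/T) nonempty nodes. A hedged sketch of that hypergeometric computation (parameter names are ours, not the thesis's notation):

```python
from math import ceil, comb

def symmetric_recovery_prob(n, r, m, T):
    """P(successful recovery) when budget T is split evenly over m of n
    nodes and a collector accesses a uniformly random r-subset."""
    need = ceil(m / T)  # nonempty nodes required to accumulate 1 unit
    hits = sum(comb(m, k) * comb(n - m, r - k)
               for k in range(need, min(m, r) + 1))
    return hits / comb(n, r)

# Budget T = 2 over n = 6 nodes, collector reads r = 3 nodes:
print(symmetric_recovery_prob(6, 3, 4, 2))  # spread over 4 nodes: 0.8
print(symmetric_recovery_prob(6, 3, 6, 2))  # spread over all 6: 1.0
```

The toy numbers already show the heuristic from the abstract: with a large enough budget, spreading maximally over all nodes is the reliable choice.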
Second, we study how distributed storage allocations affect the recovery delay in a mobile setting. Specifically, two recovery delay optimization problems are considered for a network of mobile storage nodes: the maximization of the probability of successful recovery by a given deadline, and the minimization of the expected recovery delay. We show that the first problem is closely related to the earlier allocation problem, and solve the second problem completely for the case of symmetric allocations. It turns out that the optimal allocations for the two problems can be quite different. In a simulation study, we evaluated the performance of a simple data dissemination and storage protocol for mobile delay-tolerant networks, and observed that the choice of allocation can have a significant impact on the recovery delay under a variety of scenarios.
Third, we consider a real-time streaming system where messages created at regular time intervals at a source are encoded for transmission to a receiver over a packet erasure link; the receiver must subsequently decode each message within a given delay from its creation time. For erasure models containing a limited number of erasures per coding window, per sliding window, and containing erasure bursts whose maximum length is sufficiently short or long, we show that a time-invariant intrasession code asymptotically achieves the maximum message size among all codes that allow decoding under all admissible erasure patterns. For the bursty erasure model, we also show that diagonally interleaved codes derived from specific systematic block codes are asymptotically optimal over all codes in certain cases. We also study an i.i.d. erasure model in which each transmitted packet is erased independently with the same probability; the objective is to maximize the decoding probability for a given message size. We derive an upper bound on the decoding probability for any time-invariant code, and show that the gap between this bound and the performance of a family of time-invariant intrasession codes is small when the message size and packet erasure probability are small. In a simulation study, these codes performed well against a family of random time-invariant convolutional codes under a number of scenarios.
Finally, we consider the joint problems of routing and caching for named data networking. We propose a backpressure-based policy that employs virtual interest packets to make routing and caching decisions. In a packet-level simulation, the proposed policy outperformed a basic protocol that combines shortest-path routing with least-recently-used (LRU) cache replacement.
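The LRU baseline in that comparison can be stated in a few lines; a generic sketch of least-recently-used cache replacement (capacity and content names are illustrative, and this is the baseline policy, not the proposed backpressure scheme):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least-recently-used item when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("/video/a", b"...")
cache.put("/video/b", b"...")
cache.get("/video/a")          # touching a makes b least recently used
cache.put("/video/c", b"...")  # evicts /video/b
print(cache.get("/video/b"))   # None
```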
Abstract:
Technology scaling has enabled drastic growth in the computational and storage capacity of integrated circuits (ICs). This constant growth drives an increasing demand for high-bandwidth communication between and within ICs. In this dissertation we focus on low-power solutions that address this demand. We divide communication links into three subcategories depending on the communication distance. Each category has a different set of challenges and requirements and is affected by CMOS technology scaling in a different manner. We start with short-range chip-to-chip links for board-level communication. Next, we discuss board-to-board links, which demand a longer communication range. Finally, we discuss on-chip links with communication ranges of a few millimeters.
Electrical signaling is a natural choice for chip-to-chip communication due to efficient integration and low cost. IO data rates have increased to the point where electrical signaling is now limited by the channel bandwidth. In order to achieve multi-Gb/s data rates, complex designs that equalize the channel are necessary. In addition, a high level of parallelism is central to sustaining bandwidth growth. Decision feedback equalization (DFE) is one of the most commonly employed techniques to overcome the limited bandwidth problem of the electrical channels. A linear and low-power summer is the central block of a DFE. Conventional approaches employ current-mode techniques to implement the summer, which require high power consumption. In order to achieve low-power operation we propose performing the summation in the charge domain. This approach enables a low-power and compact realization of the DFE as well as crosstalk cancellation. A prototype receiver was fabricated in 45nm SOI CMOS to validate the functionality of the proposed technique and was tested over channels with different levels of loss and coupling. Measurement results show that the receiver can equalize channels with maximum 21dB loss while consuming about 7.5mW from a 1.2V supply. We also introduce a compact, low-power transmitter employing passive equalization. The efficacy of the proposed technique is demonstrated through implementation of a prototype in 65nm CMOS. The design achieves up to 20Gb/s data rate while consuming less than 10mW.
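The feedback principle behind a DFE can be sketched numerically: each past symbol decision, scaled by the known post-cursor channel tap, is subtracted from the incoming sample before slicing. A toy 1-tap model (the tap value and bit pattern are made up; the thesis's contribution is performing this summation in the charge domain in silicon, which this sketch does not capture):

```python
# Toy 1-tap decision feedback equalizer for a channel h = [1.0, 0.6].
tap = 0.6
bits = [1, -1, -1, 1, 1, -1]  # transmitted symbols (+/-1)

# The channel adds post-cursor ISI: y[n] = x[n] + tap * x[n-1].
rx = [bits[n] + (tap * bits[n - 1] if n else 0.0) for n in range(len(bits))]

decisions = []
prev = 0.0
for y in rx:
    z = y - tap * prev        # summer: subtract feedback of last decision
    d = 1 if z >= 0 else -1   # slicer
    decisions.append(d)
    prev = d

print(decisions == bits)  # True: the ISI is fully cancelled
```

Note that without the feedback term, the sample at n = 1 (rx = -0.4 + 0.6 = ... close to the decision threshold) is the kind of eye closure that makes unequalized detection unreliable.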
An alternative to electrical signaling is to employ optical signaling for chip-to-chip interconnections, which offers low channel loss and cross-talk while providing high communication bandwidth. In this work we demonstrate the possibility of building compact and low-power optical receivers. A novel RC front-end is proposed that combines dynamic offset modulation and double-sampling techniques to eliminate the need for a short time constant at the input of the receiver. Unlike conventional designs, this receiver does not require a high-gain stage that runs at the data rate, making it suitable for low-power implementations. In addition, it allows time-division multiplexing to support very high data rates. A prototype was implemented in 65nm CMOS and achieved up to 24Gb/s with less than 0.4pJ/b power efficiency per channel. As the proposed design mainly employs digital blocks, it benefits greatly from technology scaling in terms of power and area saving.
As the technology scales, the number of transistors on the chip grows. This necessitates a corresponding increase in the bandwidth of the on-chip wires. In this dissertation, we take a close look at wire scaling and investigate its effect on wire performance metrics. We explore a novel on-chip communication link based on a double-sampling architecture and dynamic offset modulation technique that enables low power consumption and high data rates while achieving high bandwidth density in 28nm CMOS technology. The functionality of the link is demonstrated using different length minimum-pitch on-chip wires. Measurement results show that the link achieves up to 20Gb/s of data rate (12.5Gb/s/$\mu$m) with better than 136fJ/b of power efficiency.
Abstract:
With data centers being the supporting infrastructure for a wide range of IT services, their efficiency has become a big concern to operators, as well as to society, for both economic and environmental reasons. The goal of this thesis is to design energy-efficient algorithms that reduce energy cost while minimizing the compromise to service. We focus on the algorithmic challenges at different levels of energy optimization across the data center stack. The algorithmic challenge at the device level is to improve the energy efficiency of a single computational device via techniques such as job scheduling and speed scaling. We analyze the common speed scaling algorithms in both the worst-case model and the stochastic model to answer some fundamental issues in the design of speed scaling algorithms. The algorithmic challenge at the local data center level is to dynamically allocate resources (e.g., servers) and to dispatch the workload in a data center. We develop an online algorithm to make a data center more power-proportional by dynamically adapting the number of active servers. The algorithmic challenge at the global data center level is to dispatch the workload across multiple data centers, considering the geographical diversity of electricity prices, the availability of renewable energy, and network propagation delay. We propose algorithms to jointly optimize routing and provisioning in an online manner. Motivated by the above online decision problems, we move on to study a general class of online problems named "smoothed online convex optimization", which seeks to minimize the sum of a sequence of convex functions when "smooth" solutions are preferred. This model allows us to bridge different research communities and helps us gain a more fundamental understanding of general online decision problems.
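The server-provisioning step can be phrased as a smoothed online problem: in each time slot, pick the number of active servers to trade an operating cost against a switching penalty beta*|x_t - x_{t-1}| for changing the previous decision. A greedy one-step sketch (the cost model, parameters, and workload trace are hypothetical; this is not the thesis's online algorithm, which must also account for future switching costs):

```python
def step(prev, load, beta, x_max):
    """Greedily pick x minimizing operating + switching cost for one slot."""
    def cost(x):
        # Hypothetical operating cost: energy of x servers + queueing delay.
        energy = 1.0 * x
        delay = float("inf") if load >= x else load / (x - load)
        return energy + delay + beta * abs(x - prev)
    return min(range(1, x_max + 1), key=cost)

x, trace = 1, []
for load in [0.5, 3.0, 3.2, 0.4]:  # offered workload per time slot
    x = step(x, load, beta=0.5, x_max=10)
    trace.append(x)
print(trace)  # the switching penalty damps reactions to load swings
```

Even this myopic version shows the tension the thesis studies: a larger beta makes the schedule smoother but slower to track the workload.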
Abstract:
One of the critical problems currently faced by the agriculture industry in developing nations is the alarming rate of groundwater depletion. Irrigation accounts for over 70% of the total groundwater withdrawn every day. Compounding this issue is the use of polluting diesel generators to pump groundwater for irrigation. This has made irrigation not only the biggest consumer of groundwater but also one of the major contributors to greenhouse gases. The aim of this thesis is to present a solution to the energy-water nexus. To make agriculture less dependent on fossil fuels, the use of a solar-powered Stirling engine as the power generator for on-farm energy needs is discussed. The Stirling cycle is revisited, and the practical and ideal Stirling cycles are compared. Based on agricultural needs and the financial constraints faced by farmers in developing countries, the use of a Fresnel lens as a solar concentrator together with a Beta-type Stirling engine unit is suggested for sustainable power generation on farms. To reduce groundwater consumption and make irrigation more sustainable, the conceptual idea of using a Stirling engine in drip irrigation is presented. To tackle the shortage of over 37 million tonnes of cold-storage capacity in India, the idea of a cost-effective, solar-powered, on-farm cold-storage unit is discussed.
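The ideal-cycle comparison rests on two textbook relations: the net work of an ideal Stirling cycle is W = n*R*(T_hot - T_cold)*ln(V_max/V_min), and with perfect regeneration its efficiency equals the Carnot limit 1 - T_cold/T_hot. A numerical sketch (the temperatures and volume ratio are illustrative, not the thesis's design values):

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)

def stirling_work(n_mol, t_hot, t_cold, v_ratio):
    """Net work of an ideal Stirling cycle: isothermal expansion at t_hot
    minus isothermal compression at t_cold over the same volume ratio."""
    return n_mol * R * (t_hot - t_cold) * math.log(v_ratio)

def ideal_efficiency(t_hot, t_cold):
    """With perfect regeneration the ideal cycle reaches Carnot efficiency."""
    return 1.0 - t_cold / t_hot

# Hypothetical numbers: Fresnel-concentrated hot end at 900 K, ambient 300 K.
print(stirling_work(1.0, 900.0, 300.0, 3.0))  # ~5480 J per cycle per mole
print(ideal_efficiency(900.0, 300.0))         # ~0.667
```

Practical engines fall well short of this limit because of imperfect regeneration, dead volume, and non-isothermal heat exchange, which is the gap the thesis's ideal-versus-practical comparison addresses.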
Abstract:
Adsorption of aqueous Pb(II) and Cu(II) on α-quartz was studied as a function of time, system surface area, and chemical speciation. Experimental systems contained sodium as a major cation, hydroxide, carbonate, and chloride as major anions, and covered the pH range 4 to 8. In some cases citrate and EDTA were added as representative organic complexing agents. The adsorption equilibria were reached quickly, regardless of the system surface area. The positions of the adsorption equilibria were found to be strongly dependent on pH, ionic strength and concentration of citrate and EDTA. The addition of these non-adsorbing ligands resulted in a competition between chelation and adsorption. The experimental work also included the examination of the adsorption behavior of the doubly charged major cations Ca(II) and Mg(II) as a function of pH.
The theoretical description of the experimental systems was obtained by means of chemical equilibrium-plus-adsorption computations using two adsorption models: one mainly electrostatic (the James-Healy Model), and the other mainly chemical (the Ion Exchange-Surface Complex Formation Model). Comparisons were made between these two models.
The main difficulty in the theoretical prediction of the adsorption behavior of Cu(II) was the lack of reliable data for the second hydrolysis constant (*β_2). The choice of the constant was made on the basis of potentiometric titrations of Cu^(2+).
The experimental data obtained and the resulting theoretical observations were applied in models of the chemical behavior of trace metals in fresh oxic waters, with emphasis on Pb(II) and Cu(II).
Abstract:
Picric acid possesses the property, rare among strong electrolytes, of having a convenient distribution ratio between water and certain organic solvents such as benzene, chloroform, etc. Because of this property, picric acid offers peculiar advantages for studying the well-known deviations of strong electrolytes from the law of mass action, for, by means of distribution experiments, the activities of picric acid in various aqueous solutions may be compared.
In order to interpret the results of such distribution experiments, it is necessary to know the degree of ionization of picric acid in aqueous solutions.
At least three series of determinations of the equivalent conductance of picric acid have been published, but the results are not concordant; and therefore, the degree of ionization cannot be calculated with any degree of certainty.
The object of the present investigation was to redetermine the conductance of picric acid solutions in order to obtain satisfactory data from which the degrees of ionization of its solutions might be calculated.
Abstract:
Part I
Potassium bis-(tricyanovinyl)amine, K+N[C(CN)=C(CN)2]2-, crystallizes in the monoclinic system with space group Cc and lattice constants a = 13.346 ± 0.003 Å, c = 8.992 ± 0.003 Å, β = 114.42 ± 0.02°, and Z = 4. Three-dimensional intensity data were collected in layers perpendicular to the b* and c* axes. The crystal structure was refined by the least-squares method with anisotropic temperature factors to an R value of 0.064.
The average carbon-carbon and carbon-nitrogen bond distances in -C-C≡N are 1.441 ± 0.016 Å and 1.146 ± 0.014 Å, respectively. The bis-(tricyanovinyl)amine anion is approximately planar. The coordination number of the potassium ion is eight, with bond distances from 2.890 Å to 3.408 Å. The bond angle C-N-C of the amine nitrogen is 132.4 ± 1.9°. Among the six cyano groups in the molecule, two are bent by what appear to be significant amounts (5.0° and 7.2°); the remaining four are linear within experimental error. The bending can probably be explained by molecular packing forces in the crystals.
Part II
The nuclear magnetic resonance of 81Br and 127I in aqueous solutions was studied. The cation-halide ion interactions were probed through the effect of Li+, Na+, K+, Mg++, and Cs+ upon the line width of the halide ions. The solvent-halide ion interactions were probed through the effects of methanol, acetonitrile, and acetone upon the line widths of 81Br and 127I in aqueous solutions. It was found that viscosity plays a very important role in the halide-ion line width. There is no specific cation-halide ion interaction for ions such as Mg++, Li+, Na+, and K+, whereas the Cs+-halide ion interaction is strong. The effect of organic solvents upon the halide-ion line width in aqueous solutions is in the order acetone > acetonitrile > methanol. It is suggested that halide ions form stable complexes with the solvent molecules and that Cs+ can replace one of the ligands in the solvent-halide ion complex.
Part III
An unusually large isotope effect on the bridge-hydrogen chemical shift of the enol form of 2,4-pentanedione (acetylacetone) and 3-methyl-2,4-pentanedione has been observed, and an attempt has been made to interpret this effect. The deuterium isotope effect studies, the temperature dependence of the bridge-hydrogen chemical shift, IR studies in the OH, OD, and C=O stretch regions, and HMO calculations together suggest that there are probably two structures for the enol form of acetylacetone. The difference between these two structures arises mainly from the electronic structure of the π-system. The relative populations of the two structures were calculated at various temperatures for normal acetylacetone and at room temperature for the deuterated acetylacetone.
Abstract:
Jet noise reduction is an important goal within both commercial and military aviation. Although large-scale numerical simulations are now able to simultaneously compute turbulent jets and their radiated sound, low-cost, physically-motivated models are needed to guide noise-reduction efforts. A particularly promising modeling approach centers around certain large-scale coherent structures, called wavepackets, that are observed in jets and their radiated sound. The typical approach to modeling wavepackets is to approximate them as linear modal solutions of the Euler or Navier-Stokes equations linearized about the long-time mean of the turbulent flow field. The near-field wavepackets obtained from these models show compelling agreement with those educed from experimental and simulation data for both subsonic and supersonic jets, but the acoustic radiation is severely under-predicted in the subsonic case. This thesis contributes to two aspects of these models. First, two new solution methods are developed that can be used to efficiently compute wavepackets and their acoustic radiation, reducing the computational cost of the model by more than an order of magnitude. The new techniques are spatial integration methods and constitute a well-posed, convergent alternative to the frequently used parabolized stability equations. Using concepts related to well-posed boundary conditions, the methods are formulated for general hyperbolic equations and thus have potential applications in many fields of physics and engineering. Second, the nonlinear and stochastic forcing of wavepackets is investigated with the goal of identifying and characterizing the missing dynamics responsible for the under-prediction of acoustic radiation by linear wavepacket models for subsonic jets.
Specifically, we use ensembles of large-eddy-simulation flow and force data along with two data decomposition techniques to educe the actual nonlinear forcing experienced by wavepackets in a Mach 0.9 turbulent jet. Modes with high energy are extracted using proper orthogonal decomposition, while high gain modes are identified using a novel technique called empirical resolvent-mode decomposition. In contrast to the flow and acoustic fields, the forcing field is characterized by a lack of energetic coherent structures. Furthermore, the structures that do exist are largely uncorrelated with the acoustic field. Instead, the forces that most efficiently excite an acoustic response appear to take the form of random turbulent fluctuations, implying that direct feedback from nonlinear interactions amongst wavepackets is not an essential noise source mechanism. This suggests that the essential ingredients of sound generation in high Reynolds number jets are contained within the linearized Navier-Stokes operator rather than in the nonlinear forcing terms, a conclusion that has important implications for jet noise modeling.
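Proper orthogonal decomposition, used above to extract the most energetic structures, reduces to a singular value decomposition of a matrix of snapshots: the left singular vectors are the POD modes and the squared singular values rank their energy. A minimal sketch on synthetic data (not the thesis's LES data; the snapshot model is made up to show one dominant coherent mode):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)

# Synthetic snapshots: one energetic coherent mode plus weak noise.
mode = np.sin(2 * np.pi * x)
snapshots = np.column_stack([
    np.cos(0.3 * t) * mode + 0.01 * rng.standard_normal(64)
    for t in range(200)
])

# POD via SVD: left singular vectors = modes, s**2 = modal energies.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = s**2 / np.sum(s**2)
print(f"leading mode captures {energy[0]:.1%} of the energy")
```

A forcing field like the one described above, dominated by uncorrelated turbulent fluctuations, would instead show a flat, slowly decaying energy spectrum with no single dominant mode.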
Abstract:
Measurements of friction and heat transfer coefficients were obtained with dilute polymer solutions flowing through electrically heated smooth and rough tubes. The polymer used was "Polyox WSR-301", and tests were performed at concentrations of 10 and 50 parts per million. The rough tubes contained a close-packed, granular type of surface with roughness-height-to-diameter ratios of 0.0138 and 0.0488, respectively. A Prandtl number range of 4.38 to 10.3, obtained by adjusting the bulk temperature of the solution, was investigated. The Reynolds numbers in the experiments were varied from approximately 10,000 (Pr = 10.3) to 250,000 (Pr = 4.38).
Friction reductions as high as 73% in smooth tubes and 83% in rough tubes were observed, accompanied by an even more drastic heat transfer reduction (as high as 84% in smooth tubes and 93% in rough tubes). The heat transfer coefficients with Polyox can be lower for a rough tube than for a smooth one.
The similarity rules previously developed for heat transfer with a Newtonian fluid were extended to dilute polymer solution pipe flows. A velocity profile similar to the one proposed by Deissler was taken as a model to interpret the friction and heat transfer data in smooth tubes. It was found that the observed results could be explained by assuming that the turbulent diffusivities are reduced in smooth tubes in the vicinity of the wall, which brings about a thickening of the viscous layer. A possible mechanism describing the effect of the polymer additive on rough pipe flow is also discussed.