6 results for "coding complexity" in CaltechTHESIS


Relevance: 30.00%

Abstract:

Storage systems are widely used and have played a crucial role in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, with the emergence of new systems and devices such as distributed storage and flash memory, such systems suffer from issues of cost, restricted lifetime, and reliability. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O, and network bandwidth. This thesis bridges these two topics and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.

We consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols, an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: What is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we will show that the lower bound of 1/2 on this fraction is achievable and that the result can be generalized to codes with an arbitrary number of parities and optimal rebuilding.
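To make the erasure-correction claim concrete, here is a minimal sketch of an MDS code with r parity symbols. It is not the array code studied in the thesis: it assumes a systematic Reed-Solomon code over the prime field GF(257), where the k data symbols are the values of a polynomial of degree below k at points 0..k-1 and the parities are its values at points k..k+r-1. Because any k evaluations determine the polynomial, any r erasures can be repaired, but only by reading k surviving symbols, which is the full-access repair that the rebuilding question tries to improve on. The names `interpolate_at`, `encode`, and `recover` are illustrative.

```python
# Sketch: systematic Reed-Solomon code over GF(257) (an illustrative stand-in,
# not the array codes studied in the thesis).
P = 257  # prime field size; symbols are integers in [0, P)


def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division in GF(P) via Fermat's little theorem: den**(P-2) is den's inverse.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, r):
    """Append r parity symbols: evaluations at points k..k+r-1."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [interpolate_at(points, x) for x in range(k, k + r)]


def recover(codeword, k):
    """Rebuild erased positions (marked None) from any k surviving symbols."""
    survivors = [(x, y) for x, y in enumerate(codeword) if y is not None]
    if len(survivors) < k:
        raise ValueError("more erasures than parity symbols")
    basis = survivors[:k]  # any k survivors suffice (MDS property)
    return [interpolate_at(basis, x) for x in range(len(codeword))]


data = [10, 20, 30, 40]
codeword = encode(data, r=2)
damaged = list(codeword)
damaged[1] = damaged[4] = None  # two erasures: one data symbol, one parity
assert recover(damaged, k=4) == codeword
```

Note that repairing even a single erasure this way reads k full symbols; the thesis's question is how much less than that a well-designed array code can get away with.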

We consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. To decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation, where only small sliding windows of cells are sorted to generate permutations, and partial rank modulation, where only part of the n cells are used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows increase capacity. We present Gray codes spanning all possible partial-rank states and using only "push-to-the-top" operations. These Gray codes turn out to solve an open combinatorial problem concerning universal cycles, where a universal cycle is a sequence of integers generating all possible partial permutations.
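The rank modulation idea can be sketched in a few lines: the stored value is the order of the cells by charge, and the only write operation needed is to raise one cell above all others. This is a minimal sketch assuming charge levels are plain floats; `margin` and the window/overlap layout in `bounded_windows` are hypothetical choices for illustration, not the thesis's exact construction.

```python
# Sketch of rank modulation reading and the "push-to-the-top" write operation.
# Charge levels are plain floats; `margin` is a hypothetical programming step.


def induced_permutation(levels):
    """Cell indices ordered from highest to lowest charge: the stored permutation."""
    return sorted(range(len(levels)), key=lambda i: levels[i], reverse=True)


def push_to_top(levels, cell, margin=1.0):
    """Program `cell` above the current maximum; charge only ever increases."""
    new_levels = list(levels)
    new_levels[cell] = max(levels) + margin
    return new_levels


def bounded_windows(levels, window, overlap):
    """Hypothetical bounded-rank-modulation reading: rank cells only inside
    windows of `window` cells, each sharing `overlap` cells with the next."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    starts = range(0, len(levels) - window + 1, step)
    return [induced_permutation(levels[s:s + window]) for s in starts]


levels = [0.3, 1.2, 0.7]
print(induced_permutation(levels))                  # [1, 2, 0]
print(induced_permutation(push_to_top(levels, 0)))  # [0, 1, 2]
print(bounded_windows([5, 1, 4, 2, 3], window=3, overlap=1))  # [[0, 2, 1], [0, 2, 1]]
```

Reading needs only comparisons between cells, never absolute thresholds, which is why discrete levels and overshoot errors disappear; bounding the window size limits how many cells must be compared at once.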

Relevance: 20.00%

Abstract:

Life is the result of the execution of molecular programs: just as an embryo is fated to become a human or a whale, or a person’s appearance is inherited from their parents, many biological phenomena are governed by genetic programs written in DNA molecules. At the core of such programs is the highly reliable base pairing interaction between nucleic acids. DNA nanotechnology exploits the programming power of DNA to build artificial nanostructures, molecular computers, and nanomachines. In particular, DNA origami, a simple yet versatile technique that allows one to create various nanoscale shapes and patterns, is at the heart of the technology. In this thesis, I describe the development of programmable self-assembly and reconfiguration of DNA origami nanostructures based on a unique strategy: rather than relying on Watson-Crick base pairing, we developed programmable bonds via the geometric arrangement of stacking interactions, which we termed stacking bonds. We further demonstrated that such bonds can be dynamically reconfigured.

The first part of this thesis describes the design and implementation of stacking bonds. Our work addresses the fundamental question of whether one can create diverse bond types out of a single kind of attractive interaction, a question first posed implicitly by Francis Crick while seeking a deeper understanding of the origin of life and the primitive genetic code. To create multiple specific bonds, we used two different approaches: binary coding and shape coding of the geometric arrangement of stacking interaction units, which are called blunt ends. To construct a bond space for each approach, we performed a systematic search using a computer algorithm. We used orthogonal bonds to experimentally implement the connection of five distinct DNA origami nanostructures. We also programmed the bonds to control cis/trans configuration between asymmetric nanostructures.
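The idea of searching a bond space under binary coding can be illustrated with a toy model. This is a loose sketch under a hypothetical scoring rule, not the thesis's algorithm or energy model: each bond is a binary pattern over blunt-end positions (1 = blunt end present), a bond with its intended partner makes one stacking contact per blunt end, and two different bonds make one spurious contact per position where both have a blunt end. The function names and thresholds are invented for illustration, and the greedy scan does not guarantee a maximum set.

```python
from itertools import combinations, product

# Hypothetical model (not the thesis's criterion): a bond is a binary pattern over
# `length` blunt-end positions; its intended partner gives sum(pattern) contacts,
# and two different bonds share one contact per position where both are 1.


def contacts(a, b):
    """Number of positions where both patterns present a blunt end."""
    return sum(x & y for x, y in zip(a, b))


def search_orthogonal_bonds(length, min_contacts, max_cross):
    """Greedy scan over all 2**length patterns, keeping mutually orthogonal ones."""
    chosen = []
    for pattern in product((0, 1), repeat=length):
        if sum(pattern) < min_contacts:
            continue  # too few contacts to form a stable specific bond
        if all(contacts(pattern, c) <= max_cross for c in chosen):
            chosen.append(pattern)
    return chosen


bonds = search_orthogonal_bonds(length=4, min_contacts=2, max_cross=1)
print(len(bonds))  # 6
assert all(contacts(a, b) <= 1 for a, b in combinations(bonds, 2))
```

The gap between `min_contacts` (intended bond strength) and `max_cross` (worst spurious overlap) plays the role of specificity in this toy model.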

The second part of this thesis describes the large-scale self-assembly of DNA origami into two-dimensional checkerboard-pattern crystals via surface diffusion. We developed a protocol where the diffusion of DNA origami occurs on a substrate and is dynamically controlled by changing the cationic condition of the system. We used stacking interactions to mediate connections between the origami, because of their potential for reconfiguring during the assembly process. Assembling DNA nanostructures directly on substrate surfaces can benefit nano/microfabrication processes by eliminating a pattern transfer step. At the same time, the use of DNA origami allows high complexity and unique addressability with six-nanometer resolution within each structural unit.

The third part of this thesis describes the use of stacking bonds as dynamically breakable bonds. To break the bonds, we used biological machinery called the ParMRC system, extracted from bacteria. The system ensures that, when a cell divides, each daughter cell gets one copy of the cell’s DNA by actively pushing each copy to the opposite poles of the cell. We demonstrate dynamically expandable nanostructures, a result that makes stacking bonds a promising candidate for reconfigurable connectors for nanoscale machine parts.

Relevance: 20.00%

Abstract:

The work presented in this thesis revolves around erasure correction coding, as applied to distributed data storage and real-time streaming communications.

First, we examine the problem of allocating a given storage budget over a set of nodes for maximum reliability. The objective is to find an allocation of the budget that maximizes the probability of successful recovery by a data collector accessing a random subset of the nodes. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models, and determine the optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) for a variety of cases. Although the optimal allocation can have nonintuitive structure and can be difficult to find in general, our results suggest that, as a simple heuristic, reliable storage can be achieved by spreading the budget maximally over all nodes when the budget is large, and spreading it minimally over a few nodes when it is small. Coding would therefore be beneficial in the former case, while uncoded replication would suffice in the latter case.
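For symmetric allocations the recovery probability has a closed form, which makes the "spread when large, concentrate when small" heuristic easy to check on a toy case. This sketch assumes a simplified model: the data object has unit size, the budget T is split evenly over m of the n nodes, a collector reads a uniformly random subset of r nodes, and recovery succeeds whenever the total amount read is at least 1 (as with an idealized MDS code). The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_recovery_probability(n, r, budget, m):
    """P(recovery) when `budget` is split evenly over m of n nodes and a collector
    reads a uniformly random r-subset of nodes; data size is normalized to 1."""
    per_node = Fraction(budget) / m
    needed = ceil(1 / per_node)  # nonempty nodes the collector must hit
    hits = sum(comb(m, j) * comb(n - m, r - j)  # hypergeometric counts
               for j in range(needed, min(m, r) + 1))
    return Fraction(hits, comb(n, r))


for budget in (1, 2):
    print(budget, [str(symmetric_recovery_probability(4, 2, budget, m)) for m in (1, 2, 4)])
# 1 ['1/2', '1/6', '0']
# 2 ['1/2', '5/6', '1']
```

With n = 4 and r = 2, the small budget T = 1 is best stored on a single node (plain replication suffices), while the larger budget T = 2 is best spread over all four nodes, which requires coding so that any two half-size pieces recover the object.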

Second, we study how distributed storage allocations affect the recovery delay in a mobile setting. Specifically, two recovery delay optimization problems are considered for a network of mobile storage nodes: the maximization of the probability of successful recovery by a given deadline, and the minimization of the expected recovery delay. We show that the first problem is closely related to the earlier allocation problem, and solve the second problem completely for the case of symmetric allocations. It turns out that the optimal allocations for the two problems can be quite different. In a simulation study, we evaluated the performance of a simple data dissemination and storage protocol for mobile delay-tolerant networks, and observed that the choice of allocation can have a significant impact on the recovery delay under a variety of scenarios.

Third, we consider a real-time streaming system where messages created at regular time intervals at a source are encoded for transmission to a receiver over a packet erasure link; the receiver must subsequently decode each message within a given delay from its creation time. For erasure models with a limited number of erasures per coding window or per sliding window, and for models with erasure bursts whose maximum length is sufficiently short or long, we show that a time-invariant intrasession code asymptotically achieves the maximum message size among all codes that allow decoding under all admissible erasure patterns. For the bursty erasure model, we also show that diagonally interleaved codes derived from specific systematic block codes are asymptotically optimal over all codes in certain cases. We also study an i.i.d. erasure model in which each transmitted packet is erased independently with the same probability; the objective is to maximize the decoding probability for a given message size. We derive an upper bound on the decoding probability for any time-invariant code, and show that the gap between this bound and the performance of a family of time-invariant intrasession codes is small when the message size and packet erasure probability are small. In a simulation study, these codes performed well against a family of random time-invariant convolutional codes under a number of scenarios.
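The sliding-window erasure model can be stated as a simple predicate on erasure patterns, and under i.i.d. erasures one can compute exactly how likely a pattern is to be admissible. This is an illustrative sketch connecting the two models, not the thesis's decoding bound; the function names are hypothetical and the enumeration is exponential in `length`, so it is only meant for short patterns.

```python
from itertools import product


def sliding_window_admissible(pattern, window, max_erasures):
    """True if every `window` consecutive packets contain at most `max_erasures`
    erasures (1 = erased, 0 = received)."""
    last_start = max(0, len(pattern) - window)
    return all(sum(pattern[i:i + window]) <= max_erasures
               for i in range(last_start + 1))


def iid_admissible_probability(length, p, window, max_erasures):
    """Exact probability, by enumeration, that i.i.d. erasures with probability p
    produce an admissible pattern of `length` packets."""
    total = 0.0
    for pattern in product((0, 1), repeat=length):
        if sliding_window_admissible(pattern, window, max_erasures):
            erased = sum(pattern)
            total += p ** erased * (1 - p) ** (length - erased)
    return total


print(sliding_window_admissible((1, 0, 0, 1), window=3, max_erasures=1))  # True
print(sliding_window_admissible((1, 1, 0, 0), window=3, max_erasures=1))  # False
print(iid_admissible_probability(2, 0.5, window=2, max_erasures=0))       # 0.25
```

A code designed to decode under every admissible pattern therefore succeeds under i.i.d. erasures with at least the probability computed here, which is one way the deterministic and i.i.d. models relate.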

Finally, we consider the joint problems of routing and caching for named data networking. We propose a backpressure-based policy that employs virtual interest packets to make routing and caching decisions. In a packet-level simulation, the proposed policy outperformed a basic protocol that combines shortest-path routing with least-recently-used (LRU) cache replacement.
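The baseline's cache replacement rule is standard and easy to state in code. The sketch below shows only least-recently-used (LRU) replacement keyed by content name; it does not implement the proposed backpressure policy or shortest-path routing, and the content names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache: on overflow, evict the entry unused for longest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # least recently used entry first

    def get(self, name):
        if name not in self.store:
            return None  # cache miss: the request would be forwarded upstream
        self.store.move_to_end(name)  # mark as most recently used
        return self.store[name]

    def put(self, name, content):
        self.store[name] = content
        self.store.move_to_end(name)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used


cache = LRUCache(capacity=2)
cache.put("/videos/a", "A")
cache.put("/videos/b", "B")
cache.get("/videos/a")       # touch a, so b becomes least recently used
cache.put("/videos/c", "C")  # evicts b
print(list(cache.store))     # ['/videos/a', '/videos/c']
```

LRU reacts only to local request recency, whereas a backpressure-style policy can base caching decisions on demand signals propagated through the network.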

Relevance: 20.00%

Abstract:

Complexity in the earthquake rupture process can result from many factors. This study investigates the origin of such complexity by examining several recent, large earthquakes in detail. In each case the local tectonic environment plays an important role in understanding the source of the complexity.

Several large shallow earthquakes (Ms > 7.0) along the Middle American Trench have similarities and differences between them that may lead to a better understanding of fracture and subduction processes. They are predominantly thrust events consistent with the known subduction of the Cocos plate beneath N. America. Two events occurring along this subduction zone close to triple junctions show considerable complexity. This may be attributable to a more heterogeneous stress environment in these regions and as such has implications for other subduction zone boundaries.

An event which looks complex but is actually rather simple is the 1978 Bermuda earthquake (Ms ~ 6). It is located predominantly in the mantle. Its mechanism is one of pure thrust faulting with a strike N 20°W and dip 42°NE. Its apparent complexity is caused by local crustal structure. This is an important event in terms of understanding and estimating seismic hazard on the eastern seaboard of N. America.

A study of several large strike-slip continental earthquakes identifies characteristics which are common to them and may be useful in determining what to expect from the next great earthquake on the San Andreas fault. The events are the 1976 Guatemala earthquake on the Motagua fault and two events on the Anatolian fault in Turkey (the 1967 Mudurnu Valley and 1976 E. Turkey events). An attempt to model the complex P-waveforms of these events results in good synthetic fits for the Guatemala and Mudurnu Valley events. However, the E. Turkey event proves too complex to model, as it may have associated thrust or normal faulting. Several individual sources occurring at intervals of between 5 and 20 seconds characterize the Guatemala and Mudurnu Valley events. The maximum size of an individual source appears to be bounded at about 5 × 10²⁶ dyne-cm. A detailed source study including directivity is performed on the Guatemala event. The source time history of the Mudurnu Valley event illustrates its significance in modeling strong ground motion in the near field. The complex source time series of the 1967 event produces amplitudes greater by a factor of 2.5 than a uniform model scaled to the same size for a station 20 km from the fault.
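To put the source-size bound on a familiar scale, the seismic moment can be converted to moment magnitude with the standard Hanks-Kanamori relation, Mw = (2/3) log10(M0) - 10.7 with M0 in dyne-cm. This conversion is an aid for the reader, not a computation reported in the thesis; the function name is illustrative.

```python
from math import log10


def moment_magnitude(m0_dyne_cm):
    """Hanks-Kanamori moment magnitude, with seismic moment M0 in dyne-cm."""
    return (2.0 / 3.0) * log10(m0_dyne_cm) - 10.7


print(round(moment_magnitude(5e26), 1))  # 7.1
```

So each individual subevent is capped at roughly magnitude 7.1 (equivalently 5 × 10¹⁹ N·m, since 1 N·m = 10⁷ dyne-cm), and larger total moments arise from several such sources in sequence.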

Three large and important earthquakes demonstrate an important type of complexity: multiple-fault complexity. The first, the 1976 Philippine earthquake, an oblique thrust event, represents the first seismological evidence for a northeast dipping subduction zone beneath the island of Mindanao. A large event, following the mainshock by 12 hours, occurred outside the aftershock area and apparently resulted from motion on a subsidiary fault, since the event had a strike-slip mechanism.

An aftershock of the great 1960 Chilean earthquake, on June 6, 1960, proved to be an interesting discovery. It appears to be a large strike-slip event at the main rupture's southern boundary. It most likely occurred on the landward extension of the Chile Rise transform fault, in the subducting plate. The results for this event suggest that a small event triggered a series of slow events, with the whole sequence lasting longer than 1 hour. This is indeed a "slow earthquake".

Perhaps one of the most complex of events is the recent Tangshan, China, event. It began as a large strike-slip event. Within several seconds of the mainshock it may have triggered thrust faulting to the south of the epicenter. There is no doubt, however, that it triggered a large oblique normal event to the northeast, 15 hours after the mainshock. This event certainly contributed to the great loss of life sustained as a result of the Tangshan earthquake sequence.

What has been learned from these studies has been applied to predict what one might expect from the next great earthquake on the San Andreas. The expectation from this study is that such an event would be a large complex event, not unlike, but perhaps larger than, the Guatemala or Mudurnu Valley events. That is to say, it will most likely consist of a series of individual events in sequence. It is also quite possible that the event could trigger associated faulting on neighboring fault systems such as those occurring in the Transverse Ranges. This has important bearing on the earthquake hazard estimation for the region.

Relevance: 20.00%

Abstract:

The study of codes, classically motivated by the need to communicate information reliably in the presence of error, has found new life in fields as diverse as network communication and distributed storage of data, and even has connections to the design of linear measurements used in compressive sensing. But in all contexts, a code typically involves exploiting the algebraic or geometric structure underlying an application. In this thesis, we examine several problems in coding theory, and try to gain some insight into the algebraic structure behind them.

The first is the study of the entropy region: the space of all possible vectors of joint entropies which can arise from a set of discrete random variables. Understanding this region is essentially the key to optimizing network codes for a given network. To this end, we employ a group-theoretic method of constructing random variables producing so-called "group-characterizable" entropy vectors, which are capable of approximating any point in the entropy region. We show how small groups can be used to produce entropy vectors which violate the Ingleton inequality, a fundamental bound on entropy vectors arising from the random variables involved in linear network codes. We discuss the suitability of these groups to design codes for networks which could potentially outperform linear coding.
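A group G with subgroups G_1, ..., G_4 defines a group-characterizable entropy vector through h_A = log(|G| / |G_A|), where G_A is the intersection of the subgroups G_i for i in A. The sketch below computes these entropies (base 2) and the Ingleton expression h12 + h13 + h14 + h23 + h24 - h1 - h2 - h34 - h123 - h124, which is nonnegative whenever the inequality holds. The toy group here satisfies Ingleton; it does not reproduce the violating groups from the thesis, and the function names are illustrative.

```python
from math import log2


def group_entropy(group, subgroups, indices):
    """h_A = log2(|G| / |intersection of subgroups[i] for i in A|)."""
    common = set(group)
    for i in indices:
        common &= subgroups[i]
    return log2(len(group) / len(common))


def ingleton_gap(group, subgroups):
    """Ingleton expression for four subgroups (indices 0..3); negative = violation."""
    def h(*idx):
        return group_entropy(group, subgroups, idx)
    lhs = h(0, 1) + h(0, 2) + h(0, 3) + h(1, 2) + h(1, 3)
    rhs = h(0) + h(1) + h(2, 3) + h(0, 1, 2) + h(0, 1, 3)
    return lhs - rhs


# Klein four-group Z2 x Z2 with some of its subgroups (a toy example).
G = [(0, 0), (0, 1), (1, 0), (1, 1)]
e = (0, 0)
subgroups = [{e, (1, 0)}, {e, (0, 1)}, {e, (1, 1)}, {e, (1, 0)}]
print(ingleton_gap(G, subgroups))  # 1.0
```

Searching for a violation amounts to finding a group and four subgroups for which `ingleton_gap` is negative.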

The second topic we discuss is the design of frames with low coherence, closely related to finding spherical codes in which the codewords are unit vectors spaced out around the unit sphere so as to minimize the magnitudes of their mutual inner products. We show how to build frames by choosing a suitable set of representations of a finite group to produce a "group code" as described by Slepian decades ago. We go on to reinterpret our method as selecting a subset of rows of a group Fourier matrix, allowing us to study and bound our frames' coherences using character theory. We discuss the usefulness of our frames in sparse signal recovery using linear measurements.
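The simplest instance of "selecting rows of a group Fourier matrix" uses the cyclic group Z_n, whose Fourier matrix is the ordinary DFT matrix; the columns of the selected rows, normalized, form a harmonic frame. This sketch covers only that cyclic case, not the general group representations used in the thesis, and the function names are illustrative. With n = 7 and rows {1, 2, 4} (a difference set mod 7), the coherence equals sqrt(2)/3, which matches the Welch bound for 7 unit vectors in C^3.

```python
import cmath
from itertools import combinations


def harmonic_frame(n, rows):
    """Frame vectors = columns of the n-point DFT matrix restricted to `rows`,
    scaled to unit norm."""
    scale = 1 / len(rows) ** 0.5
    return [[scale * cmath.exp(2j * cmath.pi * r * c / n) for r in rows]
            for c in range(n)]


def coherence(frame):
    """Largest magnitude of an inner product between two distinct frame vectors."""
    return max(abs(sum(a * b.conjugate() for a, b in zip(u, v)))
               for u, v in combinations(frame, 2))


mu = coherence(harmonic_frame(7, [1, 2, 4]))
print(round(mu, 4))  # 0.4714
```

Here each inner product is a character sum over the selected rows, which is why character theory is the natural tool for bounding the coherence.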

The final problem we investigate is that of coding with constraints, most recently motivated by the demand for ways to encode large amounts of data using error-correcting codes so that any small loss can be recovered from a small set of surviving data. Most often, this involves using a systematic linear error-correcting code in which each parity symbol is constrained to be a function of some subset of the message symbols. We derive bounds on the minimum distance of such a code based on its constraints, and characterize when these bounds can be achieved using subcodes of Reed-Solomon codes.
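The kind of code in question can be illustrated with a brute-force computation. This sketch assumes a binary systematic code (the thesis works with subcodes of Reed-Solomon codes over larger fields) in which parity j is the XOR of the message bits listed in a hypothetical `parity_supports[j]`; for a linear code the minimum distance equals the smallest weight of a nonzero codeword. The enumeration is exponential in k and is meant only for tiny examples.

```python
from itertools import product


def min_distance(k, parity_supports):
    """Minimum Hamming weight of a nonzero codeword of a binary systematic code
    whose j-th parity is the XOR of the message bits in parity_supports[j]."""
    best = None
    for msg in product((0, 1), repeat=k):
        if not any(msg):
            continue  # skip the all-zero codeword
        parities = [sum(msg[i] for i in support) % 2 for support in parity_supports]
        weight = sum(msg) + sum(parities)
        best = weight if best is None else min(best, weight)
    return best


# Each parity may depend only on its allowed subset of message symbols.
print(min_distance(4, [{0, 1, 3}, {0, 2, 3}, {1, 2, 3}]))  # 3 (the [7,4] Hamming code)
print(min_distance(4, [{0, 1}, {2, 3}]))                   # 2 (local parities only)
```

Restricting each parity to a small support makes repair of a lost symbol cheap but, as the second example shows, can lower the achievable minimum distance, which is the trade-off the derived bounds quantify.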

Relevance: 20.00%

Abstract:

The feedback coding problem for Gaussian systems in which the noise is neither white nor statistically independent between channels is formulated in terms of arbitrary linear codes at the transmitter and at the receiver. This new formulation is used to determine a number of feedback communication systems. In particular, the optimum linear code that satisfies an average power constraint on the transmitted signals is derived for a system with noiseless feedback and forward noise of arbitrary covariance. The noisy feedback problem is also considered, and signal sets for the forward and feedback channels are obtained with an average power constraint on each. The general formulation and results remain valid for non-Gaussian systems whose second-order statistics are known, and in that case the results can be used to determine error bounds via the Chebyshev inequality.
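The Chebyshev step needs only second-order statistics: if the receiver's estimation error e has zero mean and known variance, then P(|e| ≥ t) ≤ var / t². This sketch assumes a receiver that decides between signal levels spaced 2t apart, so an error occurs only if |e| ≥ t; the function name and the numbers are illustrative, not results from the thesis.

```python
def chebyshev_error_bound(error_variance, half_spacing):
    """Upper bound on P(|e| >= half_spacing) for a zero-mean error e with known
    variance, from Chebyshev's inequality (capped at 1)."""
    return min(1.0, error_variance / half_spacing ** 2)


print(chebyshev_error_bound(0.01, 0.5))  # 0.04
```

The bound is loose compared with a Gaussian tail, but it holds for any noise distribution with the given variance, which is what makes the linear-code results useful beyond the Gaussian case.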