8 results for Binary Coding
in CaltechTHESIS
Abstract:
Life is the result of the execution of molecular programs: like how an embryo is fated to become a human or a whale, or how a person’s appearance is inherited from their parents, many biological phenomena are governed by genetic programs written in DNA molecules. At the core of such programs is the highly reliable base pairing interaction between nucleic acids. DNA nanotechnology exploits the programming power of DNA to build artificial nanostructures, molecular computers, and nanomachines. In particular, DNA origami—which is a simple yet versatile technique that allows one to create various nanoscale shapes and patterns—is at the heart of the technology. In this thesis, I describe the development of programmable self-assembly and reconfiguration of DNA origami nanostructures based on a unique strategy: rather than relying on Watson-Crick base pairing, we developed programmable bonds via the geometric arrangement of stacking interactions, which we termed stacking bonds. We further demonstrated that such bonds can be dynamically reconfigurable.
The first part of this thesis describes the design and implementation of stacking bonds. Our work addresses the fundamental question of whether one can create diverse bond types out of a single kind of attractive interaction—a question first posed implicitly by Francis Crick while seeking a deeper understanding of the origin of life and the primitive genetic code. For the creation of multiple specific bonds, we used two different approaches: binary coding and shape coding of the geometric arrangement of stacking interaction units, which are called blunt ends. To construct a bond space for each approach, we performed a systematic search using a computer algorithm. We used orthogonal bonds to experimentally implement the connection of five distinct DNA origami nanostructures. We also programmed the bonds to control the cis/trans configuration between asymmetric nanostructures.
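As a purely illustrative sketch of what a systematic search for orthogonal bonds might look like, the snippet below assumes a toy model that is not the thesis's actual design rules: each bond type is a length-8 binary pattern of active blunt ends, two edges interact with strength equal to the number of aligned active positions, and a pattern set is "orthogonal" if self-interactions are strong while cross-interactions stay weak.

```python
from itertools import product

def interaction(a, b):
    """Toy binding rule (an assumption): strength = number of aligned active blunt ends."""
    return sum(x & y for x, y in zip(a, b))

def greedy_orthogonal_set(n=8, self_min=4, cross_max=2):
    """First-fit greedy scan over all length-n binary patterns, keeping a pattern only
    if it binds itself strongly and binds every previously kept pattern weakly."""
    chosen = []
    for cand in product((0, 1), repeat=n):
        if interaction(cand, cand) < self_min:
            continue
        if all(interaction(cand, kept) <= cross_max for kept in chosen):
            chosen.append(cand)
    return chosen

for bond in greedy_orthogonal_set():
    print("".join(map(str, bond)))
```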
The second part of this thesis describes the large-scale self-assembly of DNA origami into two-dimensional checkerboard-pattern crystals via surface diffusion. We developed a protocol where the diffusion of DNA origami occurs on a substrate and is dynamically controlled by changing the cationic condition of the system. We used stacking interactions to mediate connections between the origami, because of their potential for reconfiguring during the assembly process. Assembling DNA nanostructures directly on substrate surfaces can benefit nano/microfabrication processes by eliminating a pattern transfer step. At the same time, the use of DNA origami allows high complexity and unique addressability with six-nanometer resolution within each structural unit.
The third part of this thesis describes the use of stacking bonds as dynamically breakable bonds. To break the bonds, we used biological machinery called the ParMRC system, extracted from bacteria. The system ensures that, when a cell divides, each daughter cell gets one copy of the cell's DNA by actively pushing each copy toward opposite poles of the cell. We demonstrated dynamically expandable nanostructures, making stacking bonds a promising candidate for reconfigurable connectors between nanoscale machine parts.
Abstract:
Storage systems are widely used and have played a crucial role in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, such systems face issues of cost, restricted lifetime, and reliability with the emergence of new systems and devices, such as distributed storage and flash memory, respectively. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O, and network bandwidth. This thesis bridges these two topics and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.
We consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols, an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: What is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we show that the lower bound of 1/2 is achievable and that the result can be generalized to codes with an arbitrary number of parities and optimal rebuilding.
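As a toy illustration of this rebuilding-ratio question (a minimal sketch, not the array-code construction developed in the thesis), the snippet below builds a small binary array code with two data columns and two parity columns, chosen so that any two column erasures remain correctable, and rebuilds a lost data column by reading only half of the surviving bits:

```python
from itertools import product

def encode(a, b):
    """a, b: data columns of two bits each.  Returns a row-parity column and a
    second 'zigzag'-style parity column (an illustrative choice)."""
    h = [a[0] ^ b[0], a[1] ^ b[1]]
    q = [a[0] ^ b[1], a[1] ^ b[0] ^ b[1]]
    return h, q

def rebuild_a(b1, h1, q0):
    """Rebuild data column a from only 3 of the 6 surviving bits
    (b[1], h[1], q[0]), i.e. half of each surviving column."""
    return [q0 ^ b1, h1 ^ b1]

for a0, a1, b0, b1 in product((0, 1), repeat=4):
    a, b = [a0, a1], [b0, b1]
    h, q = encode(a, b)
    assert rebuild_a(b[1], h[1], q[0]) == a   # reads 3 bits instead of 4
print("column a rebuilt from half of the surviving information in every case")
```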
We consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. In order to decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation, where only small sliding windows of cells are sorted to generate permutations, and partial rank modulation, where only some of the n cells are used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows will increase capacity. We present Gray codes spanning all possible partial-rank states and using only "push-to-the-top" operations. These Gray codes turn out to solve an open combinatorial problem of finding a universal cycle, a sequence of integers generating all possible partial permutations.
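The core rank-modulation idea can be sketched in a few lines (the variable names and the charge-injection margin below are illustrative assumptions): the stored symbol is simply the permutation of cell indices induced by their analog charge levels, and a "push-to-the-top" operation raises one cell above all the others.

```python
import numpy as np

def read_permutation(charges):
    """Rank-modulation readout: the stored symbol is the permutation of cell indices
    obtained by sorting charge levels in decreasing order."""
    return tuple(int(i) for i in np.argsort(charges)[::-1])

def push_to_top(charges, i, margin=1.0):
    """'Push-to-the-top' operation: inject charge into cell i until it exceeds all others."""
    charges = charges.copy()
    charges[i] = charges.max() + margin
    return charges

levels = np.array([0.7, 2.1, 1.3, 0.2])            # analog charges of n = 4 cells
print(read_permutation(levels))                     # (1, 2, 0, 3): cell 1 highest, cell 3 lowest
print(read_permutation(push_to_top(levels, 3)))     # (3, 1, 2, 0) after pushing cell 3 to the top
```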
Abstract:
The work presented in this thesis revolves around erasure correction coding, as applied to distributed data storage and real-time streaming communications.
First, we examine the problem of allocating a given storage budget over a set of nodes for maximum reliability. The objective is to find an allocation of the budget that maximizes the probability of successful recovery by a data collector accessing a random subset of the nodes. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models, and determine the optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) for a variety of cases. Although the optimal allocation can have nonintuitive structure and can be difficult to find in general, our results suggest that, as a simple heuristic, reliable storage can be achieved by spreading the budget maximally over all nodes when the budget is large, and spreading it minimally over a few nodes when it is small. Coding would therefore be beneficial in the former case, while uncoded replication would suffice in the latter case.
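A minimal sketch of the kind of trade-off described above, under simplifying assumptions (a symmetric allocation, a collector that reads a uniformly random subset of r nodes, and an ideal erasure code so that recovery succeeds once the accessed storage totals the unit object size; all parameter names are illustrative):

```python
from math import ceil, comb

def recovery_prob_symmetric(n, m, T, r):
    """Probability of recovering a unit-size object when a total budget T is spread
    evenly over m of the n nodes (each stores T/m, the rest store nothing) and the
    collector reads a uniformly random subset of r nodes.  With an ideal erasure
    code, recovery succeeds once at least ceil(m/T) nonempty nodes are accessed."""
    need = ceil(m / T)
    k_min, k_max = max(0, r - (n - m)), min(r, m)
    good = sum(comb(m, k) * comb(n - m, r - k)
               for k in range(k_min, k_max + 1) if k >= need)
    return good / comb(n, r)

# n = 10 nodes, collector reads r = 5 of them.
print(recovery_prob_symmetric(10, 10, 2.0, 5))  # large budget, spread maximally: 1.0
print(recovery_prob_symmetric(10, 2, 2.0, 5))   # large budget, concentrated:     ~0.78
print(recovery_prob_symmetric(10, 10, 1.2, 5))  # small budget, spread maximally: 0.0
print(recovery_prob_symmetric(10, 1, 1.2, 5))   # small budget, one replica:      0.5
```

Consistent with the heuristic above, spreading the budget widely wins when the budget is large, while concentrating it wins when the budget is small.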
Second, we study how distributed storage allocations affect the recovery delay in a mobile setting. Specifically, two recovery delay optimization problems are considered for a network of mobile storage nodes: the maximization of the probability of successful recovery by a given deadline, and the minimization of the expected recovery delay. We show that the first problem is closely related to the earlier allocation problem, and solve the second problem completely for the case of symmetric allocations. It turns out that the optimal allocations for the two problems can be quite different. In a simulation study, we evaluated the performance of a simple data dissemination and storage protocol for mobile delay-tolerant networks, and observed that the choice of allocation can have a significant impact on the recovery delay under a variety of scenarios.
Third, we consider a real-time streaming system where messages created at regular time intervals at a source are encoded for transmission to a receiver over a packet erasure link; the receiver must subsequently decode each message within a given delay from its creation time. For erasure models with a limited number of erasures per coding window or per sliding window, and for models with erasure bursts whose maximum length is sufficiently short or long, we show that a time-invariant intrasession code asymptotically achieves the maximum message size among all codes that allow decoding under all admissible erasure patterns. For the bursty erasure model, we also show that diagonally interleaved codes derived from specific systematic block codes are asymptotically optimal over all codes in certain cases. We also study an i.i.d. erasure model in which each transmitted packet is erased independently with the same probability; the objective is to maximize the decoding probability for a given message size. We derive an upper bound on the decoding probability for any time-invariant code, and show that the gap between this bound and the performance of a family of time-invariant intrasession codes is small when the message size and packet erasure probability are small. In a simulation study, these codes performed well against a family of random time-invariant convolutional codes under a number of scenarios.
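A minimal Monte-Carlo sketch of the i.i.d. erasure setting (the parameter names and the MDS-style decoding rule are assumptions, not the intrasession codes analyzed in the thesis): a message of k symbols is spread over the d packet slots between its creation time and its deadline, and it decodes if enough coded symbols survive.

```python
import random

def decoding_prob(k, d, per_packet, p_erase, trials=100_000):
    """Monte-Carlo estimate of the probability that a k-symbol message, carried as
    `per_packet` coded symbols in each of its d allowed packet slots, is decodable
    when packets are erased i.i.d. with probability p_erase and decoding (as with
    an ideal/MDS code) requires k surviving coded symbols."""
    ok = 0
    for _ in range(trials):
        received = sum(per_packet for _ in range(d) if random.random() > p_erase)
        ok += received >= k
    return ok / trials

print(decoding_prob(k=4, d=8, per_packet=1, p_erase=0.1))  # needs >= 4 of 8 packets to arrive
```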
Finally, we consider the joint problems of routing and caching for named data networking. We propose a backpressure-based policy that employs virtual interest packets to make routing and caching decisions. In a packet-level simulation, the proposed policy outperformed a basic protocol that combines shortest-path routing with least-recently-used (LRU) cache replacement.
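For context, the generic backpressure forwarding rule (shown below in a sketch with illustrative names; this is not the virtual-interest-packet policy proposed in the thesis) routes each commodity over the neighbor with the largest positive queue-backlog differential:

```python
def backpressure_next_hop(node, commodity, queues, neighbors):
    """Generic backpressure rule: for the given commodity, pick the neighbor with the
    largest positive backlog differential; return None to hold the packet."""
    best, best_diff = None, 0
    for nbr in neighbors[node]:
        diff = queues[node][commodity] - queues[nbr][commodity]
        if diff > best_diff:
            best, best_diff = nbr, diff
    return best

queues = {"A": {"video": 7}, "B": {"video": 3}, "C": {"video": 5}}
neighbors = {"A": ["B", "C"]}
print(backpressure_next_hop("A", "video", queues, neighbors))  # "B" (differential 4 beats 2)
```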
Abstract:
The LIGO and Virgo gravitational-wave observatories are complex and extremely sensitive strain detectors that can be used to search for a wide variety of gravitational waves from astrophysical and cosmological sources. In this thesis, I motivate the search for the gravitational wave signals from coalescing black hole binary systems with total mass between 25 and 100 solar masses. The mechanisms for formation of such systems are not well understood, and we do not have many observational constraints on the parameters that guide the formation scenarios. Detection of gravitational waves from such systems — or, in the absence of detection, the tightening of upper limits on the rate of such coalescences — will provide valuable information that can inform the astrophysics of the formation of these systems. I review the search for these systems and place upper limits on the rate of black hole binary coalescences with total mass between 25 and 100 solar masses. I then show how the sensitivity of this search can be improved by up to 40% by the application of the multivariate statistical classifier known as a random forest of bagged decision trees to more effectively discriminate between signal and non-Gaussian instrumental noise. I also discuss the use of this classifier in the search for the ringdown signal from the merger of two black holes with total mass between 50 and 450 solar masses and present upper limits. I further apply multivariate statistical classifiers to the problem of quantifying the non-Gaussianity of LIGO data. Despite these improvements, no gravitational-wave signals have been detected in LIGO data so far. However, the use of multivariate statistical classification can significantly improve the sensitivity of the Advanced LIGO detectors to such signals.
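To illustrate the classification step mentioned above, the following sketch trains a random forest (bagged decision trees) on synthetic trigger features; the feature names, the simulated data, and the use of scikit-learn are assumptions for illustration, not the pipeline or data used in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins: "noise" triggers with lower SNR and larger chi^2, "signal"
# triggers with higher SNR and smaller chi^2, plus an uninformative duration feature.
noise = np.column_stack([rng.gamma(2.0, 2.0, n), rng.gamma(4.0, 2.0, n), rng.uniform(0.01, 1.0, n)])
signal = np.column_stack([rng.gamma(6.0, 2.0, n), rng.gamma(2.0, 1.0, n), rng.uniform(0.01, 1.0, n)])
X = np.vstack([noise, signal])
y = np.concatenate([np.zeros(n), np.ones(n)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
# clf.predict_proba(...) could then serve as a ranking statistic for candidate events.
```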
Abstract:
The Advanced LIGO and Virgo experiments are poised to detect gravitational waves (GWs) directly for the first time this decade. The ultimate prize will be joint observation of a compact binary merger in both gravitational and electromagnetic channels. However, GW sky locations that are uncertain by hundreds of square degrees will pose a challenge. I describe a real-time detection pipeline and a rapid Bayesian parameter estimation code that will make it possible to search promptly for optical counterparts in Advanced LIGO. Having analyzed a comprehensive population of simulated GW sources, we describe the sky localization accuracy that the GW detector network will achieve as each detector comes online and progresses toward design sensitivity. Next, in preparation for the optical search with the intermediate Palomar Transient Factory (iPTF), we have developed a unique capability to detect optical afterglows of gamma-ray bursts (GRBs) detected by the Fermi Gamma-ray Burst Monitor (GBM). Its comparably sized error regions offer a close parallel to the Advanced LIGO problem, but Fermi's unique access to MeV-GeV photons and its near all-sky coverage may allow us to look at optical afterglows in a relatively unexplored part of the GRB parameter space. We present the discovery and broadband follow-up observations (X-ray, UV, optical, millimeter, and radio) of eight GBM-iPTF afterglows. Two of the bursts (GRB 130702A / iPTF13bxl and GRB 140606B / iPTF14bfu) are at low redshift (z = 0.145 and z = 0.384, respectively), are sub-luminous with respect to "standard" cosmological bursts, and have spectroscopically confirmed broad-line Type Ic supernovae. These two bursts are possibly consistent with mildly relativistic shocks breaking out from the progenitor envelopes rather than the standard mechanism of internal shocks within an ultra-relativistic jet. On a technical level, the GBM-iPTF effort is a prototype for locating and observing optical counterparts of GW events in Advanced LIGO with the Zwicky Transient Facility.
Abstract:
The study of codes, classically motivated by the need to communicate information reliably in the presence of error, has found new life in fields as diverse as network communication and the distributed storage of data, and even has connections to the design of linear measurements used in compressive sensing. But in all contexts, a code typically involves exploiting the algebraic or geometric structure underlying an application. In this thesis, we examine several problems in coding theory and try to gain some insight into the algebraic structure behind them.
The first is the study of the entropy region, the space of all possible vectors of joint entropies which can arise from a set of discrete random variables. Understanding this region is essentially the key to optimizing network codes for a given network. To this end, we employ a group-theoretic method of constructing random variables producing so-called "group-characterizable" entropy vectors, which are capable of approximating any point in the entropy region. We show how small groups can be used to produce entropy vectors which violate the Ingleton inequality, a fundamental bound on entropy vectors arising from the random variables involved in linear network codes. We discuss the suitability of these groups for designing network codes that could potentially outperform linear coding.
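To make the group-characterizable construction concrete, here is a small sketch (the choice of group and subgroups is illustrative and, being a simple example, satisfies rather than violates the Ingleton inequality; finding violations requires the carefully chosen groups studied in the thesis). For a group G with subgroups G_1, ..., G_4, the joint entropy of any subset S of the induced random variables is log(|G| / |∩_{i∈S} G_i|):

```python
from itertools import permutations
from math import log2

# Illustrative example: G = S4 (all permutations of 4 points), with G_i the
# stabilizer of point i.  X_i is the coset g*G_i for a uniformly random g in G.
G = list(permutations(range(4)))
subgroups = [[p for p in G if p[i] == i] for i in range(4)]

def H(S):
    """Joint entropy of {X_i : i in S}: log2(|G| / |intersection of the G_i|)."""
    common = [p for p in G if all(p in subgroups[i] for i in S)]
    return log2(len(G) / len(common))

# Ingleton inequality: I(X1;X2) <= I(X1;X2|X3) + I(X1;X2|X4) + I(X3;X4),
# written out in terms of joint entropies (indices 0..3 here).
lhs = H([0]) + H([1]) - H([0, 1])
rhs = (H([0, 2]) + H([1, 2]) - H([0, 1, 2]) - H([2])
       + H([0, 3]) + H([1, 3]) - H([0, 1, 3]) - H([3])
       + H([2]) + H([3]) - H([2, 3]))
print(lhs, "<=", rhs)   # satisfied for this simple example
```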
The second topic we discuss is the design of frames with low coherence, closely related to finding spherical codes in which the codewords are unit vectors spaced out around the unit sphere so as to minimize the magnitudes of their mutual inner products. We show how to build frames by selecting a cleverly chosen set of representations of a finite group to produce a "group code" as described by Slepian decades ago. We go on to reinterpret our method as selecting a subset of rows of a group Fourier matrix, allowing us to study and bound our frames' coherences using character theory. We discuss the usefulness of our frames in sparse signal recovery using linear measurements.
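A minimal sketch of this idea for the simplest possible group, the cyclic group Z_7 (so the "group Fourier matrix" is just the DFT matrix; the particular rows kept below, the quadratic residues mod 7, are an illustrative choice): keeping m rows of the n x n matrix and normalizing the columns yields n unit vectors in C^m whose coherence can be measured directly.

```python
import numpy as np

n, rows = 7, [1, 2, 4]                       # keep 3 rows of the 7 x 7 DFT matrix
F = np.exp(-2j * np.pi * np.outer(rows, np.arange(n)) / n)
frame = F / np.sqrt(len(rows))               # 7 unit-norm frame vectors in C^3 (columns)

gram = np.abs(frame.conj().T @ frame)        # magnitudes of all pairwise inner products
coherence = np.max(gram - np.eye(n))         # largest off-diagonal magnitude
print(coherence)                             # ~0.4714 = sqrt(2)/3, the Welch bound for 7 vectors in C^3
```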
The final problem we investigate is that of coding with constraints, most recently motivated by the demand for ways to encode large amounts of data using error-correcting codes so that any small loss can be recovered from a small set of surviving data. Most often, this involves using a systematic linear error-correcting code in which each parity symbol is constrained to be a function of some subset of the message symbols. We derive bounds on the minimum distance of such a code based on its constraints, and characterize when these bounds can be achieved using subcodes of Reed-Solomon codes.
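A toy instance of coding with constraints (illustrative only, not a construction from the thesis): a systematic binary code with three message bits in which each parity bit is only allowed to depend on a prescribed pair of message bits, with its minimum distance found by brute force.

```python
from itertools import product

constraints = [(0, 1), (1, 2), (0, 2)]       # parity j may only use these message positions

def encode(msg):
    """Systematic codeword: message bits followed by the constrained parity bits."""
    return list(msg) + [sum(msg[i] for i in c) % 2 for c in constraints]

# For a linear code, the minimum distance equals the smallest nonzero codeword weight.
d_min = min(sum(encode(m)) for m in product((0, 1), repeat=3) if any(m))
print("minimum distance:", d_min)            # 3 for this choice of constraints
```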
Abstract:
The feedback coding problem for Gaussian systems in which the noise is neither white nor statistically independent between channels is formulated in terms of arbitrary linear codes at the transmitter and at the receiver. This new formulation is used to determine a number of feedback communication systems. In particular, the optimum linear code that satisfies an average power constraint on the transmitted signals is derived for a system with noiseless feedback and forward noise of arbitrary covariance. The noisy feedback problem is considered and signal sets for the forward and feedback channels are obtained with an average power constraint on each. The general formulation and results are valid for non-Gaussian systems in which the second order statistics are known, the results being applicable to the determination of error bounds via the Chebychev inequality.
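For the simplest special case (white forward noise and noiseless feedback), a linear feedback code can be sketched as a Schalkwijk-Kailath-style iteration, shown below; the variable names and parameters are illustrative assumptions, and this is not the correlated-noise construction derived in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
P, N, steps = 1.0, 1.0, 10                   # average power constraint and noise variance
theta = rng.normal(0.0, 1.0)                 # message with unit prior variance
est, var = 0.0, 1.0                          # receiver's estimate and its error variance

for _ in range(steps):
    x = np.sqrt(P / var) * (theta - est)     # transmit scaled estimation error, E[x^2] = P
    y = x + rng.normal(0.0, np.sqrt(N))      # forward additive Gaussian noise channel
    est += np.sqrt(P * var) / (P + N) * y    # MMSE update (estimate fed back noiselessly)
    var *= N / (P + N)                       # error variance shrinks by N/(P+N) each use

print(abs(theta - est), var)                 # final error and its predicted variance
```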
Abstract:
Part 1. Many interesting visual and mechanical phenomena occur in the critical region of fluids, both for the gas-liquid and liquid-liquid transitions. The precise thermodynamic and transport behavior here has some broad consequences for the molecular theory of liquids. Previous studies in this laboratory on a liquid-liquid critical mixture via ultrasonics supported a basically classical analysis of fluid behavior by M. Fixman (e.g., the free energy is assumed analytic in intensive variables in the thermodynamics), at least when the fluid is not too close to critical. A breakdown in classical concepts is evidenced close to critical, in some well-defined ways. We have studied herein a liquid-liquid critical system of complementary nature (possessing a lower critical mixing or consolute temperature) to all previous mixtures, to look for new qualitative critical behavior. We did not find such new behavior in the ultrasonic absorption ascribable to the critical fluctuations, but we did find extra absorption due to chemical processes (yet these are related to the mixing behavior generating the lower consolute point). We rederived, corrected, and extended Fixman's analysis to interpret our experimental results in these more complex circumstances. The entire account of theory and experiment is prefaced by an extensive introduction recounting the general status of liquid state theory. The introduction provides a context for our present work, and also points out problems deserving attention. Interest in these problems was stimulated by this work but also by work in Part 3.
Part 2. Among variational theories of electronic structure, the Hartree-Fock theory has proved particularly valuable for a practical understanding of such properties as chemical binding, electric multipole moments, and X-ray scattering intensity. It also provides the most tractable method of calculating first-order properties under external or internal one-electron perturbations, either developed explicitly in orders of perturbation theory or in the fully self-consistent method. The accuracy and consistency of first-order properties are poorer than those of zero-order properties, but this is most often due to the use of explicit approximations in solving the perturbed equations, or to inadequacy of the variational basis in size or composition. We have calculated the electric polarizabilities of H2, He, Li, Be, LiH, and N2 by Hartree-Fock theory, using exact perturbation theory or the fully self-consistent method, as dictated by convenience. By careful studies on total basis set composition, we obtained good approximations to limiting Hartree-Fock values of polarizabilities with bases of reasonable size. The values for all species, and for each direction in the molecular cases, are within 8% of experiment, or of best theoretical values in the absence of the former. Our results support the use of unadorned Hartree-Fock theory for static polarizabilities needed in interpreting electron-molecule scattering data, collision-induced light scattering experiments, and other phenomena involving experimentally inaccessible polarizabilities.
Part 3. Numerical integration of the close-coupled scattering equations has been carried out to obtain vibrational transition probabilities for some models of the electronically adiabatic H2-H2 collision. All the models use a Lennard-Jones interaction potential between nearest atoms in the collision partners. We have analyzed the results for some insight into the vibrational excitation process in its dependence on the energy of collision, the nature of the vibrational binding potential, and other factors. We conclude also that replacement of earlier, simpler models of the interaction potential by the Lennard-Jones form adds very little realism for all the complication it introduces. A brief introduction precedes the presentation of our work and places it in the context of attempts to understand the collisional activation process in chemical reactions as well as some other chemical dynamics.
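For reference, the Lennard-Jones pair potential used in these model interactions has the standard form shown below; the parameter values are illustrative placeholders, not those from the thesis.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard Lennard-Jones pair potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6);
    epsilon is the well depth and sigma the distance at which V crosses zero
    (illustrative placeholder values, not parameters from the thesis)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

print(lennard_jones(2 ** (1 / 6)))   # well minimum: -epsilon at r = 2^(1/6) * sigma
```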