905 results for Data storage


Relevance: 30.00%

Abstract:

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is feasible using suffix trees. However, a suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of the suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on the suffix tree of the human genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in sequencing technologies.
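The gap between O(N log N) and O(N log σ) bits can be made concrete with a rough calculation. The sketch below ignores the constant factors hidden by the O-notation, and the values N = 3,000,000,000 and σ = 4 are illustrative assumptions for a human-genome-sized DNA text, not figures taken from the abstract.

```python
import math

def suffix_tree_bits(N):
    """Order-of-magnitude cost of a pointer-based index: N * ceil(log2 N) bits."""
    return N * math.ceil(math.log2(N))

def self_index_bits(N, sigma):
    """Order-of-magnitude cost of a compressed self-index: N * ceil(log2 sigma) bits."""
    return N * math.ceil(math.log2(sigma))

def reduction_factor(N, sigma):
    """Ratio of the two estimates; hidden constants are ignored."""
    return suffix_tree_bits(N) / self_index_bits(N, sigma)

# Assumed example values: about 3 billion symbols over a 4-letter alphabet.
print(reduction_factor(3_000_000_000, 4))  # 16.0
```

Under these assumptions the estimate lands in the same range as the "more than 10-fold" reduction quoted above.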

Relevance: 30.00%

Abstract:

The reversible chemical reaction of Ca(OH)2/CaO appears to be attractive for storage of solar thermal energy, in view of the nonpolluting and nontoxic nature of the reactants. This paper presents some data on thermal decomposition of calcium hydroxide pellets with additives of aluminum, aluminum hydroxide, zinc, and copper. The addition of aluminum and zinc powder enhanced the rate of decomposition considerably at 450°C, but copper had no effect. The effect of the additives is also discussed in some detail, though it is not established with certainty. There is some evidence that heat transfer into the pellet, and the number of potential nucleation sites due to thermal stresses, influence the kinetics and mechanism of decomposition.

Relevance: 30.00%

Abstract:

Erasure coding techniques are used to increase the reliability of distributed storage systems while minimizing storage overhead. Also of interest is minimization of the bandwidth required to repair the system following a node failure. In a recent paper, Wu et al. characterize the tradeoff between the repair bandwidth and the amount of data stored per node. They also prove the existence of regenerating codes that achieve this tradeoff. In this paper, we introduce Exact Regenerating Codes, which are regenerating codes possessing the additional property of being able to duplicate the data stored at a failed node. Such codes require low processing and communication overheads, making the system practical and easy to maintain. An explicit construction of exact regenerating codes is provided for the minimum bandwidth point on the storage-repair bandwidth tradeoff, relevant to distributed-mail-server applications. A subspace-based approach is provided and shown to yield necessary and sufficient conditions for a linear code to possess the exact regeneration property, and is also used to prove the uniqueness of our construction. Also included in the paper is an explicit construction of regenerating codes for the minimum storage point for parameters relevant to storage in peer-to-peer systems. This construction supports a variable number of nodes and can handle multiple, simultaneous node failures. All constructions given in the paper are of low complexity, requiring low field size in particular.

Relevance: 30.00%

Abstract:

In a storage system where individual storage nodes are prone to failure, the redundant storage of data in a distributed manner across multiple nodes is a must to ensure reliability. Reed-Solomon codes possess the reconstruction property under which the stored data can be recovered by connecting to any k of the n nodes in the network across which data is dispersed. This property can be shown to lead to vastly improved network reliability over simple replication schemes. Also of interest in such storage systems is the minimization of the repair bandwidth, i.e., the amount of data needed to be downloaded from the network in order to repair a single failed node. Reed-Solomon codes perform poorly here as they require the entire data to be downloaded. Regenerating codes are a new class of codes which minimize the repair bandwidth while retaining the reconstruction property. This paper provides an overview of regenerating codes including a discussion on the explicit construction of optimum codes.
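The reconstruction property described above (any k of the n nodes suffice) can be illustrated with a toy Reed-Solomon-style code based on polynomial evaluation. This is a minimal sketch rather than a production codec: the prime field size `P = 257`, the function names `encode` and `decode`, and the sample data are all assumptions made for illustration.

```python
from itertools import combinations

P = 257  # prime modulus for the field GF(P); an illustrative assumption

def encode(data, n):
    """Treat the k data symbols as polynomial coefficients; node x stores (x, f(x))."""
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(shares, k):
    """Recover the k coefficients from any k distinct shares by Lagrange interpolation."""
    shares = shares[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(shares):
        basis = [1]   # coefficients of prod_{j != i} (x - xj), lowest degree first
        denom = 1     # prod_{j != i} (xi - xj) mod P
        for j, (xj, _) in enumerate(shares):
            if j == i:
                continue
            nxt = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):
                nxt[d] = (nxt[d] - xj * b) % P      # contribution of -xj * b * x^d
                nxt[d + 1] = (nxt[d + 1] + b) % P   # contribution of b * x^(d+1)
            basis = nxt
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse (Python 3.8+)
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs

data = [72, 101, 108]          # k = 3 symbols, each smaller than P
shares = encode(data, n=6)     # one share per storage node
# Every choice of 3 out of the 6 nodes recovers the original data.
assert all(decode(list(subset), 3) == data for subset in combinations(shares, 3))
```

Note that repairing a single lost share with this scheme means decoding from k shares, i.e. downloading as much as the whole file, which is the weakness in repair bandwidth that the abstract attributes to Reed-Solomon codes.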

Relevance: 30.00%

Abstract:

Nearly one fourth of new medicinal molecules are biopharmaceuticals (based on a protein, antibody or nucleic acid derivative). However, administering these compounds is not always straightforward because of their fragile nature in the GI tract. In addition, these molecules often exhibit poor bioavailability when administered orally. As a result, parenteral administration is commonly preferred. The shelf life of these molecules in aqueous environments is also poor unless they are stored at low temperatures. Another approach is to bring these molecules to an anhydrous form via lyophilization, which enhances stability during storage. Proteins usually cannot be freeze-dried by themselves, so excipients are nearly always necessary. Disaccharides are commonly used excipients in freeze-dried formulations since they provide a rigid glassy matrix that maintains the native conformation of the protein domain. They also act as "sink" agents: they can absorb some moisture from the environment and still help the API retain its activity, and therefore offer a route to a robust formulation. The aim of the present study was to investigate how four amorphous disaccharides (cellobiose, melibiose, sucrose and trehalose) behave when they are brought to different relative humidity levels. First, solutions of each disaccharide were prepared, filled into scintillation vials and freeze-dried. To obtain initial information on how the moisture-induced transformations take place, the lyophilized amorphous disaccharide cakes were placed in vacuum desiccators at different relative humidity levels for a defined period, after which selected analytical methods were used to examine the transformations that had occurred. The tendency to crystallize, the water sorption of the disaccharides, and the effect of moisture on glass transition and crystallization temperature were studied.
In addition, FT-IR microscopy was used to map the moisture distribution on a piece of lyophilized cake. Observations made during the experiments supported the data reported in a previous study: melibiose and trehalose were shown to be superior to sucrose and cellobiose in their ability to withstand elevated humidity and temperature and to avoid crystallization at pharmaceutically relevant moisture contents. The difference was evident with every analytical method used. In addition, melibiose showed interesting anomalies during DVS runs, which were absent with the other amorphous disaccharides. Particularly fascinating was the observation made with a polarized light microscope, which revealed a possible small-scale crystallization that cannot be observed with XRPD. As a result, it can safely be suggested that a robust formulation is most likely obtained by using either melibiose or trehalose as a stabilizing agent for biopharmaceutical freeze-dried formulations. However, more experiments should be conducted to obtain more accurate information on why these disaccharides tolerate elevated humidity better than the others.

Relevance: 30.00%

Abstract:

The study of soil microbiota and their activities is central to the understanding of many ecosystem processes such as decomposition and nutrient cycling. The collection of microbiological data from soils generally involves several sequential steps of sampling, pretreatment and laboratory measurements. The reliability of results is dependent on reliable methods in every step. The aim of this thesis was to critically evaluate some central methods and procedures used in soil microbiological studies in order to increase our understanding of the factors that affect the measurement results and to provide guidance and new approaches for the design of experiments. The thesis focuses on four major themes: 1) soil microbiological heterogeneity and sampling, 2) storage of soil samples, 3) DNA extraction from soil, and 4) quantification of specific microbial groups by the most-probable-number (MPN) procedure. Soil heterogeneity and sampling are discussed as a single theme because knowledge on spatial (horizontal and vertical) and temporal variation is crucial when designing sampling procedures. Comparison of adjacent forest, meadow and cropped field plots showed that land use has a strong impact on the degree of horizontal variation of soil enzyme activities and bacterial community structure. However, regardless of the land use, the variation of microbiological characteristics appeared not to have predictable spatial structure at 0.5-10 m. Temporal and soil depth-related patterns were studied in relation to plant growth in cropped soil. The results showed that most enzyme activities and microbial biomass have a clear decreasing trend in the top 40 cm soil profile and a temporal pattern during the growing season. A new procedure for sampling of soil microbiological characteristics based on stratified sampling and pre-characterisation of samples was developed. 
A practical example demonstrated the potential of the new procedure to reduce the analysis efforts involved in laborious microbiological measurements without loss of precision. The investigation of storage of soil samples revealed that freezing (-20 °C) of small sample aliquots retains the activity of hydrolytic enzymes and the structure of the bacterial community in different soil matrices relatively well whereas air-drying cannot be recommended as a storage method for soil microbiological properties due to large reductions in activity. Freezing below -70 °C was the preferred method of storage for samples with high organic matter content. Comparison of different direct DNA extraction methods showed that the cell lysis treatment has a strong impact on the molecular size of DNA obtained and on the bacterial community structure detected. An improved MPN method for the enumeration of soil naphthalene degraders was introduced as an alternative to more complex MPN protocols or the DNA-based quantification approach. The main advantage of the new method is the simple protocol and the possibility to analyse a large number of samples and replicates simultaneously.

Relevance: 30.00%

Abstract:

In the distributed storage setting that we consider, data is stored across n nodes in the network such that the data can be recovered by connecting to any subset of k nodes. Additionally, one can repair a failed node by connecting to any d nodes while downloading beta units of data from each. Dimakis et al. show that the repair bandwidth d beta can be considerably reduced if each node stores slightly more than the minimum required and characterize the tradeoff between the amount of storage per node and the repair bandwidth. In the exact regeneration variation, unlike the functional regeneration, the replacement for a failed node is required to store data identical to that in the failed node. This greatly reduces the complexity of system maintenance. The main result of this paper is an explicit construction of codes for all values of the system parameters at one of the two most important and extreme points of the tradeoff - the Minimum Bandwidth Regenerating point, which performs optimal exact regeneration of any failed node. A second result is a non-existence proof showing that with one possible exception, no other point on the tradeoff can be achieved for exact regeneration.
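The two extreme points of the storage versus repair-bandwidth tradeoff can be computed directly. The closed-form expressions used below are the ones commonly quoted for the Dimakis et al. tradeoff and are not stated in this abstract, so treat them as an assumption of the sketch; the function names and the example parameters (B = 9, k = 3, d = 4) are likewise illustrative.

```python
from fractions import Fraction

def msr_point(B, k, d):
    """Minimum Storage Regenerating point: (storage per node, total repair bandwidth)."""
    alpha = Fraction(B, k)
    gamma = Fraction(B * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(B, k, d):
    """Minimum Bandwidth Regenerating point: storage per node equals repair bandwidth."""
    gamma = Fraction(2 * B * d, k * (2 * d - k + 1))
    return gamma, gamma

# Assumed example: file of B = 9 units, any k = 3 nodes reconstruct, d = 4 helpers per repair.
print(msr_point(9, 3, 4))  # storage 3 per node, repair bandwidth 6
print(mbr_point(9, 3, 4))  # storage 4 per node, repair bandwidth 4
```

In this example the MBR point stores slightly more per node (4 instead of 3 units) but repairs a node by downloading 4 units in total (β = 1 from each of d = 4 helpers), compared with 6 units at the MSR point and the full 9 units when the whole file must be fetched.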

Relevance: 30.00%

Abstract:

In the distributed storage setting introduced by Dimakis et al., B units of data are stored across n nodes in the network in such a way that the data can be recovered by connecting to any k nodes. Additionally one can repair a failed node by connecting to any d nodes while downloading at most beta units of data from each node. In this paper, we introduce a flexible framework in which the data can be recovered by connecting to any number of nodes as long as the total amount of data downloaded is at least B. Similarly, regeneration of a failed node is possible if the new node connects to the network using links whose individual capacity is bounded above by beta(max) and whose sum capacity equals or exceeds a predetermined parameter gamma. In this flexible setting, we obtain the cut-set lower bound on the repair bandwidth along with a constructive proof for the existence of codes meeting this bound for all values of the parameters. An explicit code construction is provided which is optimal in certain parameter regimes.

Relevance: 30.00%

Abstract:

This paper considers the problem of power management and throughput maximization for energy neutral operation when using Energy Harvesting Sensors (EHS) to send data over wireless links. It is assumed that the EHS are designed to transmit data at a constant rate (using a fixed modulation and coding scheme) but are power-controlled. A framework under which the system designer can optimize the performance of EHS when the channel is Rayleigh fading is developed. For example, the highest average data rate that can be supported over a Rayleigh fading channel given the energy harvesting capability, the battery power storage efficiency and the maximum allowed transmit energy per slot is derived. Furthermore, the optimum transmission scheme that guarantees a particular data throughput is derived. The usefulness of the framework developed is illustrated through simulation results for specific examples.

Relevance: 30.00%

Abstract:

In this paper, power management algorithms for energy harvesting sensors (EHS) that operate purely based on energy harvested from the environment are proposed. To maintain energy neutrality, EHS nodes schedule their utilization of the harvested power so as to save/draw energy into/from an inefficient battery during peak/low energy harvesting periods, respectively. Under this constraint, one of the key system design goals is to transmit as much data as possible given the energy harvesting profile. For implementational simplicity, it is assumed that the EHS transmits at a constant data rate with power control, when the channel is sufficiently good. By converting the data rate maximization problem into a convex optimization problem, the optimal load scheduling (power management) algorithm that maximizes the average data rate subject to energy neutrality is derived. Also, the energy storage requirements on the battery for implementing the proposed algorithm are calculated. Further, robust schemes that account for the insufficiency of battery storage capacity, or errors in the prediction of the harvested power, are proposed. The superior performance of the proposed algorithms over conventional scheduling schemes is demonstrated through computations using numerical data from solar energy harvesting databases.
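The energy-neutrality constraint with an inefficient battery can be illustrated with a much simpler model than the convex program described above. The sketch below is not the paper's algorithm: it assumes a constant load per slot, a battery that stores only a fraction `efficiency` of any surplus, and finds the largest sustainable load by bisection. All names and the sample harvesting profile are hypothetical.

```python
def battery_trace(harvest, load, efficiency, initial=0.0):
    """Battery level after each slot, or None if the battery would run dry."""
    level = initial
    trace = []
    for h in harvest:
        if h >= load:
            level += efficiency * (h - load)  # only part of the surplus is stored
        else:
            level -= load - h                 # the deficit is drawn from the battery
        if level < 0:
            return None
        trace.append(level)
    return trace

def is_energy_neutral(harvest, load, efficiency, initial=0.0):
    """Neutral if the battery never empties and ends at or above its starting level."""
    trace = battery_trace(harvest, load, efficiency, initial)
    return trace is not None and trace[-1] >= initial

def max_constant_load(harvest, efficiency, initial=0.0, tol=1e-9):
    """Largest constant per-slot load that stays energy neutral (bisection search)."""
    lo, hi = 0.0, float(max(harvest))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_energy_neutral(harvest, mid, efficiency, initial):
            lo = mid
        else:
            hi = mid
    return lo

profile = [4, 4, 0, 0]  # assumed harvest per slot: two sunny slots, two dark ones
print(round(max_constant_load(profile, efficiency=1.0), 6))  # 2.0 with a lossless battery
print(round(max_constant_load(profile, efficiency=0.5), 6))  # 1.333333 with 50% storage loss
```

The comparison shows why battery inefficiency matters: with a lossless battery the node can spend the average harvested energy, while at 50% storage efficiency the sustainable load in this example drops from 2 to 4/3 units per slot.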

Relevance: 30.00%

Abstract:

A distributed storage setting is considered where a file of size B is to be stored across n storage nodes. A data collector should be able to reconstruct the entire data by downloading the symbols stored in any k nodes. When a node fails, it is replaced by a new node, which downloads data from some of the existing nodes. The amount of data downloaded is termed the repair bandwidth. One way to implement such a system is to store one fragment of an (n, k) MDS code in each node, in which case the repair bandwidth is B. Since repair of a failed node consumes network bandwidth, codes reducing repair bandwidth are of great interest. Most of the recent work in this area focuses on reducing the repair bandwidth of a set of k nodes which store the data in uncoded form, while the reduction in the repair bandwidth of the remaining nodes is only marginal. In this paper, we present an explicit code which reduces the repair bandwidth for all the nodes to approximately B/2. To the best of our knowledge, this is the first explicit code which reduces the repair bandwidth of all the nodes for all feasible values of the system parameters.

Relevance: 30.00%

Abstract:

We consider the problem of minimizing the bandwidth required to repair a failed node when data is stored across n nodes in a distributed manner, so as to facilitate reconstruction of the entire data by connecting to any k out of the n nodes. We provide explicit and optimal constructions which permit exact replication of a failed systematic node.

Relevance: 30.00%

Abstract:

In this paper, we outline an approach to the task of designing network codes in a non-multicast setting. Our approach makes use of the concept of interference alignment. As an example, we consider the distributed storage problem in which the data is stored across n nodes in the network, a data collector can recover the data by connecting to any k of the n nodes, and, upon failure of a node, a new node can replicate the data stored in the failed node while minimizing the repair bandwidth.
