905 results for Data storage


Relevance:

30.00%

Publisher:

Abstract:

In the distributed storage coding problem we consider, data is stored across n nodes in a network, each capable of storing α symbols. It is required that the complete data can be reconstructed by downloading data from any k nodes. There is also the key additional requirement that a failed node be regenerated by connecting to any d nodes and downloading β symbols from each of them. Our goal is to minimize the repair bandwidth dβ. In this paper we provide explicit constructions for several parameter sets of interest.
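
For context, the cut-set bound that governs this problem in the regenerating-codes literature can be written as follows, using n, k, d, α, β as above and writing B for the total file size (a symbol not named in the abstract):

```latex
% Cut-set bound for an (n, k, d) regenerating code storing a file of B
% symbols, \alpha per node, with repair downloading \beta symbols from
% each of d helper nodes:
B \le \sum_{i=0}^{k-1} \min\{\alpha,\; (d-i)\beta\}
% Minimizing the repair bandwidth \gamma = d\beta therefore trades off
% against the per-node storage \alpha.
```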

Relevance:

30.00%

Publisher:

Abstract:

Regenerating codes are a class of distributed storage codes that allow for efficient repair of failed nodes, as compared to traditional erasure codes. An [n, k, d] regenerating code permits the data to be recovered by connecting to any k of the n nodes in the network, while requiring that a failed node be repaired by connecting to any d nodes. The amount of data downloaded for repair is typically much smaller than the size of the source data. Previous constructions of exact-regenerating codes have been confined to the case n = d + 1. In this paper, we present optimal, explicit constructions of (a) Minimum Bandwidth Regenerating (MBR) codes for all values of [n, k, d] and (b) Minimum Storage Regenerating (MSR) codes for all [n, k, d >= 2k - 2], using a new product-matrix framework. The product-matrix framework is also shown to significantly simplify system operation. To the best of our knowledge, these are the first constructions of exact-regenerating codes that allow the number of nodes n in the network to be chosen independently of the other parameters. The paper also contains a simpler description, in the product-matrix framework, of a previously constructed MSR code with [n = d + 1, k, d >= 2k - 1].
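
For reference, the two operating points named here are conventionally written as follows (B is the file size; this is the standard normalization from the regenerating-codes literature, not notation taken from this paper):

```latex
% MSR point: minimize per-node storage first, then repair bandwidth
\alpha_{\mathrm{MSR}} = \frac{B}{k}, \qquad
\gamma_{\mathrm{MSR}} = d\beta = \frac{Bd}{k(d-k+1)}
% MBR point: minimize repair bandwidth first; the amount stored and the
% amount downloaded for repair coincide
\alpha_{\mathrm{MBR}} = \gamma_{\mathrm{MBR}} = \frac{2Bd}{k(2d-k+1)}
```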

Relevance:

30.00%

Publisher:

Abstract:

Regenerating codes are a class of recently developed codes for distributed storage that, like Reed-Solomon codes, permit data recovery from any subset of k nodes within the n-node network. However, regenerating codes possess, in addition, the ability to repair a failed node by connecting to an arbitrary subset of d nodes. It has been shown that for the case of functional repair, there is a tradeoff between the amount of data stored per node and the bandwidth required to repair a failed node. A special case of functional repair is exact repair, where the replacement node is required to store data identical to that in the failed node. Exact repair is of interest as it greatly simplifies system implementation. The first result of this paper is an explicit, exact-repair code for the point on the storage-bandwidth tradeoff corresponding to the minimum possible repair bandwidth, for the case when d = n-1. This code has a particularly simple graphical description, and most interestingly has the ability to carry out exact repair without any need to perform arithmetic operations. We term this ability of the code to perform repair through mere transfer of data ``repair by transfer.'' The second result of this paper shows that the interior points on the storage-bandwidth tradeoff cannot be achieved under exact repair, thus pointing to the existence of a separate tradeoff under exact repair. Specifically, we identify a set of scenarios which we term ``helper node pooling,'' and show that it is the necessity to satisfy such scenarios that overconstrains the system.
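
The repair-by-transfer idea for the d = n - 1 minimum-bandwidth point admits a very short sketch: place one coded symbol on each edge of the complete graph K_n and store it at both endpoints, so a replacement node is rebuilt by each surviving node forwarding the single symbol it shares with the failed node. The Python below (placeholder symbol values, hypothetical function names) only illustrates this transfer pattern, not the actual code construction:

```python
from itertools import combinations

def build_storage(n, symbols):
    """Assign one symbol per edge of K_n; each node stores the symbols on
    its incident edges (the d = n-1 MBR repair-by-transfer layout)."""
    edges = list(combinations(range(n), 2))
    assert len(symbols) == len(edges)
    store = {v: {} for v in range(n)}
    for (u, v), s in zip(edges, symbols):
        store[u][(u, v)] = s   # symbol on edge (u, v) kept at both endpoints
        store[v][(u, v)] = s
    return store

def repair_by_transfer(store, failed):
    """Each surviving node transfers exactly the one symbol it shares with
    the failed node; no arithmetic is performed anywhere."""
    replacement = {}
    for helper, contents in store.items():
        if helper == failed:
            continue
        for edge, s in contents.items():
            if failed in edge:
                replacement[edge] = s  # one symbol downloaded per helper
    return replacement

n = 5  # toy example: K_5 has 10 edges, so 10 stored symbols
store = build_storage(n, symbols=list(range(10)))
assert repair_by_transfer(store, failed=2) == store[2]
```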

Relevance:

30.00%

Publisher:

Abstract:

Regenerating codes are a class of recently developed codes for distributed storage that, like Reed-Solomon codes, permit data recovery from any arbitrary k of the n nodes. However, regenerating codes possess, in addition, the ability to repair a failed node by connecting to any arbitrary d nodes and downloading an amount of data that is typically far less than the size of the data file. This amount of download is termed the repair bandwidth. Minimum storage regenerating (MSR) codes are a subclass of regenerating codes that require the least amount of network storage; every such code is a maximum distance separable (MDS) code. Further, when a replacement node stores data identical to that in the failed node, the repair is termed exact. The four principal results of the paper are (a) the explicit construction of a class of MDS codes for d = n - 1 >= 2k - 1, termed the MISER code, that achieves the cut-set bound on the repair bandwidth for the exact repair of systematic nodes, (b) a proof of the necessity of interference alignment in exact-repair MSR codes, (c) a proof showing the impossibility of constructing linear, exact-repair MSR codes for d < 2k - 3 in the absence of symbol extension, and (d) the construction, also explicit, of high-rate MSR codes for d = k + 1. Interference alignment (IA) is a theme that runs throughout the paper: the MISER code is built on the principles of IA, and IA is also a crucial component of the nonexistence proof for d < 2k - 3. To the best of our knowledge, the constructions presented in this paper are the first explicit constructions of regenerating codes that achieve the cut-set bound.

Relevance:

30.00%

Publisher:

Abstract:

There are many applications, such as software for processing customer records in telecom, patient records in hospitals, or email software accessing a single message in a mailbox, that need to access a single record in a database consisting of millions of records. A basic feature of these applications is that they access data sets which are very large but simple. Cloud computing provides the computing infrastructure for this new generation of applications involving very large data sets, which cannot be handled efficiently using traditional computing infrastructure. In this paper, we describe the storage services provided by three well-known cloud service providers and compare their features, with a view to characterizing the storage requirements of very large data sets; we hope this will act as a catalyst for the design of storage services for very large data set requirements in the future. We also give a brief overview of other kinds of storage for cloud computing that have emerged in the recent past.

Relevance:

30.00%

Publisher:

Abstract:

Regenerating codes are a class of codes for distributed storage networks that provide reliability and availability of data, and also perform efficient node repair. Another important aspect of a distributed storage network is its security. In this paper, we consider a threat model where an eavesdropper may gain access to the data stored in a subset of the storage nodes, and possibly also to the data downloaded during the repair of some nodes. We provide explicit constructions of regenerating codes that achieve the information-theoretic secrecy capacity in this setting.
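
One commonly cited outer bound for this threat model, sketched here for orientation: if the eavesdropper observes ℓ < k nodes, the securely storable file size B^(s) is limited as below. This form follows the bound of Pawar et al. and is background we are supplying, not a formula taken from this paper:

```latex
% Secrecy outer bound with \ell < k eavesdropped nodes; at the MBR point,
% where stored and downloaded data coincide, this bound is known to be tight:
B^{(s)} \;\le\; \sum_{i=\ell}^{k-1} \min\{\alpha,\,(d-i)\beta\}
```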

Relevance:

30.00%

Publisher:

Abstract:

Erasure codes are an efficient means of storing data across a network in comparison to data replication, as they tend to reduce the amount of data stored in the network and offer increased resilience in the presence of node failures. The codes perform poorly, though, when repair of a failed node is called for, as they typically require the entire file to be downloaded to repair a single failed node. A new class of erasure codes, termed regenerating codes, was recently introduced that does much better in this respect. However, given the variety of efficient erasure codes available in the literature, there is considerable interest in the construction of coding schemes that would enable traditional erasure codes to be used, while retaining the feature that only a fraction of the data need be downloaded for node repair. In this paper, we present a simple, yet powerful, framework that does precisely this. Under this framework, the nodes are partitioned into two types and encoded using two codes in a manner that reduces the problem of node repair to that of erasure decoding of the constituent codes. Depending upon the choice of the two codes, the framework can be used to obtain one or more of the following advantages: simultaneous minimization of storage space and repair bandwidth, low complexity of operation, fewer disk reads at helper nodes during repair, and error detection and correction.
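
Since the framework reduces node repair to erasure decoding of the constituent codes, a minimal sketch of that primitive may be helpful: a toy [n, k] Reed-Solomon-style code over a small prime field, where any erased share is rebuilt from any k surviving shares by Lagrange interpolation. The field size, parameters, and function names are illustrative, not taken from the paper:

```python
P = 257  # arithmetic over the prime field GF(257); illustrative choice

def encode(message, n):
    """[n, k] toy Reed-Solomon: evaluate the message polynomial at x = 1..n."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(message)) % P
            for x in range(1, n + 1)]

def interpolate(points, x0):
    """Lagrange-interpolate the unique degree-(k-1) polynomial through
    `points` and evaluate it at x0 (all arithmetic mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x0 - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return total

# Toy run: k = 3 message symbols spread over n = 6 nodes; nodes 2 and 5 fail.
msg = [42, 7, 19]
shares = encode(msg, n=6)                        # share i lives at x = i + 1
alive = [(x + 1, shares[x]) for x in (0, 1, 3)]  # any k = 3 survivors suffice
for lost in (2, 5):
    assert interpolate(alive, lost + 1) == shares[lost]  # erased share rebuilt
```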

Relevance:

30.00%

Publisher:

Abstract:

Regenerating codes are a class of codes proposed for providing reliability of data and efficient repair of failed nodes in distributed storage systems. In this paper, we address the fundamental problem of handling errors and erasures at the nodes or on the links during the data-reconstruction and node-repair operations. We provide explicit regenerating codes that are resilient to errors and erasures, and show that these codes are optimal with respect to storage and bandwidth requirements. As a special case, we also establish the capacity of a class of distributed storage systems in the presence of malicious adversaries. While our code constructions are based on previously constructed Product-Matrix codes, we also provide necessary and sufficient conditions for introducing resilience into any regenerating code.
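
As a point of reference for these resilience claims, the classical errors-and-erasures condition for MDS codes reads as follows; the analogy to regenerating codes (contacting extra helper nodes to absorb corrupted responses) is our gloss, not a formula from the paper:

```latex
% An [n, k] MDS code, with minimum distance d_{\min} = n - k + 1, corrects
% e symbol errors together with f erasures if and only if
2e + f \;\le\; n - k
```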

Relevance:

30.00%

Publisher:

Abstract:

The amount of water stored in and moving through the surface water bodies of large river basins (rivers, floodplains, wetlands) plays a major role in the global water and biogeochemical cycles and is a critical parameter for water resources management. However, the spatiotemporal variations of these freshwater reservoirs are still widely unknown at the global scale. Here, we propose a hypsographic curve approach to estimate surface freshwater storage variations over the Amazon basin, combining surface water extent from a multisatellite technique with topographic data from the Global Digital Elevation Model (GDEM) of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). Monthly surface water storage variations for 1993-2007 are presented, showing a strong seasonal and interannual variability, and are evaluated against in situ river discharge and precipitation. The basin-scale mean annual amplitude of ~1200 km³ is in the range of previous estimates and contributes about half of the Gravity Recovery And Climate Experiment (GRACE) total water storage variations. For the first time, we map the surface water volume anomaly during the extreme droughts of 1997 (October-November) and 2005 (September-October) and find that during these dry events the water stored in the rivers and floodplains of the Amazon basin was, respectively, ~230 km³ (~40%) and ~210 km³ (~50%) below the 1993-2007 average. This new 15-year data set of surface water volume represents an unprecedented source of information for future hydrological or climate modeling of the Amazon. It is also a first step toward the development of such a database at the global scale.
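
A minimal numerical sketch of the hypsographic-curve idea as described in the abstract: the DEM supplies a curve A(h) of flooded area versus water level, the satellite-observed extent is inverted through this curve to an effective level, and storage is the integral of area over elevation. All values and function names below are illustrative, not from the study:

```python
import numpy as np

# Toy hypsographic curve: for each water level h (m a.s.l.), the area A(h)
# in km^2 that the DEM says would be flooded (monotone increasing by design).
levels = np.linspace(80.0, 95.0, 151)
area = 1.0e3 * (levels - levels[0]) ** 1.5

def volume_from_extent(a_obs_km2):
    """Invert A(h) = A_obs for an effective level, then integrate area over
    elevation up to that level: V = int A(h) dh (1 km^2 * 1 m = 1e-3 km^3)."""
    h_eff = np.interp(a_obs_km2, area, levels)
    m = levels <= h_eff
    a, h = area[m], levels[m]
    return np.sum(0.5 * (a[1:] + a[:-1]) * np.diff(h)) * 1e-3  # trapezoid rule

monthly_extent = [9.0e3, 2.1e4, 3.5e4]          # km^2, toy monthly series
storage = [volume_from_extent(a) for a in monthly_extent]
anomaly = np.array(storage) - np.mean(storage)  # storage variations, km^3
```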

Relevance:

30.00%

Publisher:

Abstract:

The development of a viable adsorbed natural gas onboard fuel system involves synthesizing materials that meet specific storage target requirements. We assess the impact on natural gas storage of the intermediate processes involved in taking a laboratory powder sample to an onboard packed adsorbent-bed module. We illustrate that reporting V/V (volume of gas/volume of container) capacities based on powder adsorption data, without accounting for the losses due to pelletization and bed porosity, grossly overestimates the working storage capacity of a given material. Using data typical of carbon- and MOF-based adsorbent materials, we show that in order to meet the Department of Energy target of 180 V/V (equivalent STP) loading at 3.5 MPa and 298 K at the onboard packed-bed level, the volumetric capacity of the pelletized sample should be at least 245 V/V, and the corresponding gravimetric loading varies from 0.175 to 0.38 kg/kg for pellet densities ranging from 461.5 to 1,000 kg/m³. With the recent revision of the DOE target to 263 V/V at the onboard packed-bed level, the volumetric loading of the pelletized sample should be about 373 V/V.
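
The arithmetic behind these figures can be checked directly. The methane STP density and the bed packing fraction implied by the 180-versus-245 V/V figures below are our assumptions, chosen to reproduce the quoted numbers, not values stated in the paper:

```python
# Back-of-envelope check of the adsorbed-natural-gas storage figures.
RHO_CH4_STP = 0.717          # kg/m^3, methane at 0 degC, 1 atm (assumed)
bed_target_vv = 180.0        # DOE target at the packed-bed level
pellet_vv = 245.0            # required pellet-level capacity (from the paper)

packing_fraction = bed_target_vv / pellet_vv
print(f"implied bed packing fraction: {packing_fraction:.3f}")  # ~0.735

def gravimetric_loading(vv, pellet_density):
    """kg of gas per kg of adsorbent: V/V capacity times the gas density at
    STP, divided by the pellet density (kg/m^3)."""
    return vv * RHO_CH4_STP / pellet_density

for rho in (461.5, 1000.0):
    print(rho, round(gravimetric_loading(pellet_vv, rho), 3))
# -> ~0.381 kg/kg at 461.5 kg/m^3 and ~0.176 kg/kg at 1000 kg/m^3,
#    matching the 0.175-0.38 kg/kg range quoted in the abstract.
```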

Relevance:

30.00%

Publisher:

Abstract:

In this study, we applied the integration methodology developed in the companion paper by Aires (2014), using real satellite observations over the Mississippi Basin. The methodology provides basin-scale estimates of the four water budget components (precipitation P, evapotranspiration E, water storage change ΔS, and runoff R) in a two-step process: the Simple Weighting (SW) integration and a Postprocessing Filtering (PF) that imposes the water budget closure. A comparison with in situ observations of P and E demonstrated that PF improved the estimation of both components. A Closure Correction Model (CCM) has been derived from the integrated product (SW+PF) that allows each observation data set to be corrected independently, unlike the SW+PF method, which requires simultaneous estimates of the four components. The CCM makes it possible to standardize the various data sets for each component and greatly decreases the budget residual (P - E - ΔS - R). As a direct application, the CCM was combined with the water budget equation to reconstruct missing values in any component. Results of a Monte Carlo experiment with synthetic gaps demonstrated the good performance of the method, except for the runoff data, whose variability is of the same order of magnitude as the budget residual. Similarly, we propose a reconstruction of ΔS between 1990 and 2002, where no Gravity Recovery and Climate Experiment data are available. Unlike most studies dealing with water budget closure at the basin scale, only satellite observations and in situ runoff measurements are used. Consequently, the integrated data sets are model independent and can be used for model calibration or validation.
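
A minimal sketch of the two-step process as described here; the inverse-variance weighting and the single-constraint closure projection below are generic choices standing in for the actual SW and PF operators of Aires (2014):

```python
import numpy as np

# State x = [P, E, dS, R]; closure constraint g.x = P - E - dS - R = 0.
g = np.array([1.0, -1.0, -1.0, -1.0])

def simple_weighting(estimates, variances):
    """SW step: inverse-variance average of the available estimates of one
    water-budget component."""
    w = 1.0 / np.asarray(variances)
    return np.sum(w * np.asarray(estimates)) / np.sum(w)

def closure_filter(x, cov):
    """PF step: minimum-variance update of x so that g.x = 0 exactly
    (a single-constraint Kalman-type projection)."""
    k = cov @ g / (g @ cov @ g)          # gain vector
    return x - k * (g @ x)

# Toy month over a basin, in mm: two precipitation products merged by SW,
# then the four SW estimates closed by PF.
P = simple_weighting([210.0, 190.0], [15.0**2, 25.0**2])
x = np.array([P, 95.0, 60.0, 55.0])      # SW estimates of [P, E, dS, R]
cov = np.diag([12.0, 10.0, 8.0, 5.0]) ** 2
x_closed = closure_filter(x, cov)
assert abs(g @ x_closed) < 1e-9          # budget residual driven to zero
```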

Relevance:

30.00%

Publisher:

Abstract:

While the tradeoff between the amount of data stored and the repair bandwidth of an (n, k, d) regenerating code has been characterized under functional repair (FR), the case of exact repair (ER) remains unresolved. It is known that there do not exist ER codes which lie on the FR tradeoff at most of the points. The question as to whether one can asymptotically approach the FR tradeoff was settled recently by Tian, who showed that in the (4, 3, 3) case, the ER region is bounded away from the FR region. The FR tradeoff serves as a trivial outer bound on the ER tradeoff. In this paper, we extend Tian's results by establishing an improved outer bound on the ER tradeoff which shows that the ER region is bounded away from the FR region for any (n, k, d). Our approach is analytical and builds upon the framework introduced earlier by Shah et al. Interestingly, a recently constructed, layered regenerating code is shown to achieve a point on this outer bound for the (5, 4, 4) case. This represents the first known instance of an optimal ER code that does not correspond to a point on the FR tradeoff.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a comprehensive and robust strategy for the estimation of battery model parameters from noise-corrupted data. The deficiencies of existing methods for parameter estimation are studied, and the proposed strategy improves on earlier methods by working optimally for both low and high discharge currents, providing accurate estimates even under high levels of noise, and doing so over a wide range of initial values. Testing on different data sets confirms the performance of the proposed parameter estimation strategy.
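
The abstract does not name the battery model, so the following is a generic illustration only: a first-order equivalent-circuit model (open-circuit voltage, series resistance, one RC pair) fit to noisy pulse-discharge data with robust nonlinear least squares. The model, parameter values, and names are hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

def simulate(params, current, dt):
    """First-order equivalent-circuit model: v = OCV - R0*i - v_rc, with the
    RC-pair voltage following v_rc[k+1] = a*v_rc[k] + R1*(1-a)*i[k]."""
    ocv, r0, r1, tau = params
    a = np.exp(-dt / tau)
    v_rc = np.zeros_like(current)
    for k in range(1, len(current)):
        v_rc[k] = a * v_rc[k - 1] + r1 * (1.0 - a) * current[k - 1]
    return ocv - r0 * current - v_rc

rng = np.random.default_rng(0)
dt, true = 1.0, np.array([3.7, 0.05, 0.03, 120.0])  # OCV, R0, R1, tau
i_meas = np.where((np.arange(600) // 120) % 2 == 0, 2.0, 0.0)  # pulse train
v_meas = simulate(true, i_meas, dt) + rng.normal(0.0, 5e-3, 600)  # noisy data

fit = least_squares(
    lambda p: simulate(p, i_meas, dt) - v_meas,
    x0=[3.5, 0.1, 0.1, 60.0],                # deliberately rough initial guess
    bounds=([2.5, 0.0, 0.0, 1.0], [4.5, 1.0, 1.0, 1e4]),
    loss="soft_l1",                          # robust loss to temper the noise
)
print(np.round(fit.x, 4))   # should land near [3.7, 0.05, 0.03, 120]
```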

Relevance:

30.00%

Publisher:

Abstract:

Scientific research revolves around the production, analysis, storage, management, and re-use of data. Data sharing offers important benefits for scientific progress and the advancement of knowledge. However, several limitations and barriers to the general adoption of data sharing are still in place. Probably the most important challenge is that data sharing is not yet common practice among scholars and is not yet seen as a regular scientific activity, although important efforts are being invested in promoting it. In addition, the commitment of scholars to cite data remains relatively low. The most important problems and challenges regarding data metrics are closely tied to these more general problems with data sharing. The development of data metrics depends on the growth of data sharing practices; after all, data metrics are nothing more than the registration of researchers' behaviour. At the same time, the availability of proper metrics can help researchers make their data work more visible. This may subsequently act as an incentive for more data sharing, and in this way a virtuous circle may be set in motion. This report seeks to further explore the possibilities of metrics for datasets (i.e., the creation of reliable data metrics) and of an effective reward system that aligns the interests of the main stakeholders involved in the process. The report reviews the current literature on data sharing and data metrics. It presents interviews with the main stakeholders on data sharing and data metrics. It also analyses the existing repositories and tools in the field of data sharing that have special relevance for the promotion and development of data metrics. On the basis of these three pillars, the report presents a number of solutions and necessary developments, as well as a set of recommendations regarding data metrics. The most important recommendations include the general adoption of data sharing and data publication among scholars; the development of a reward system for scientists that includes data metrics; reducing the costs of data publication; reducing existing negative cultural perceptions of researchers regarding data publication; developing standards for the preservation, publication, identification, and citation of datasets; more coordination of data repository initiatives; and further development of interoperability protocols across different actors.