952 resultados para DATA-STORAGE


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The geographic location of cloud data storage centres is an important issue for many organisations and individuals due to various regulations that require data and operations to reside in specific geographic locations. Thus, cloud users may want to be sure that their stored data have not been relocated into unknown geographic regions that may compromise the security of their stored data. Albeshri et al. (2012) combined proof of storage (POS) protocols with distance-bounding protocols to address this problem. However, their scheme involves unnecessary delay when utilising typical POS schemes due to computational overhead at the server side. The aim of this paper is to improve the basic GeoProof protocol by reducing the computation overhead at the server side. We show how this can maintain the same level of security while achieving more accurate geographic assurance.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

QUT Library Research Support has simplified and streamlined the process of research data management planning, storage, discovery and reuse through collaboration and the use of integrated and tailored online tools, and a simplification of the metadata schema. This poster presents the integrated data management services a QUT, including QUT’s Data Management Planning Tool, Research Data Finder, Spatial Data Finder and Software Finder, and information on the simplified Registry Interchange Format – Collections and Services (RIF-CS) Schema. The QUT Data Management Planning (DMP) Tool was built using the Digital Curation Centre’s DMP Online Tool and modified to QUT’s needs and policies. The tool allows researchers and Higher Degree Research students to plan how to handle research data throughout the active phase of their research. The plan is promoted as a ‘live’ document’ and researchers are encouraged to update it as required. The information entered into the plan can be made private or shared with supervisors, project members and external examiners. A plan is mandatory when requesting storage space on the QUT Research Data Storage Service. QUT’s Research Data Finder is integrated with QUT’s Academic Profiles and the Data Management Planning Tool to create a seamless data management process. This process aims to encourage the creation of high quality rich records which facilitate discovery and reuse of quality data. The Registry Interchange Format – Collections and Services (RIF-CS) Schema that is used in the QUT Research Data Finder was simplified to “RIF-CS lite” to reflect mandatory and optional metadata requirements. RIF-CS lite removed schema fields that were underused or extra to the needs of the users and system. This has reduced the amount of metadata fields required from users and made integration of systems a far more simple process where field content is easily shared across services making the process of collecting metadata as transparent as possible.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The concept of big data has already outperformed traditional data management efforts in almost all industries. Other instances it has succeeded in obtaining promising results that provide value from large-scale integration and analysis of heterogeneous data sources for example Genomic and proteomic information. Big data analytics have become increasingly important in describing the data sets and analytical techniques in software applications that are so large and complex due to its significant advantages including better business decisions, cost reduction and delivery of new product and services [1]. In a similar context, the health community has experienced not only more complex and large data content, but also information systems that contain a large number of data sources with interrelated and interconnected data attributes. That have resulted in challenging, and highly dynamic environments leading to creation of big data with its enumerate complexities, for instant sharing of information with the expected security requirements of stakeholders. When comparing big data analysis with other sectors, the health sector is still in its early stages. Key challenges include accommodating the volume, velocity and variety of healthcare data with the current deluge of exponential growth. Given the complexity of big data, it is understood that while data storage and accessibility are technically manageable, the implementation of Information Accountability measures to healthcare big data might be a practical solution in support of information security, privacy and traceability measures. Transparency is one important measure that can demonstrate integrity which is a vital factor in the healthcare service. Clarity about performance expectations is considered to be another Information Accountability measure which is necessary to avoid data ambiguity and controversy about interpretation and finally, liability [2]. According to current studies [3] Electronic Health Records (EHR) are key information resources for big data analysis and is also composed of varied co-created values [3]. Common healthcare information originates from and is used by different actors and groups that facilitate understanding of the relationship for other data sources. Consequently, healthcare services often serve as an integrated service bundle. Although a critical requirement in healthcare services and analytics, it is difficult to find a comprehensive set of guidelines to adopt EHR to fulfil the big data analysis requirements. Therefore as a remedy, this research work focus on a systematic approach containing comprehensive guidelines with the accurate data that must be provided to apply and evaluate big data analysis until the necessary decision making requirements are fulfilled to improve quality of healthcare services. Hence, we believe that this approach would subsequently improve quality of life.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The world has experienced a large increase in the amount of available data. Therefore, it requires better and more specialized tools for data storage and retrieval and information privacy. Recently Electronic Health Record (EHR) Systems have emerged to fulfill this need in health systems. They play an important role in medicine by granting access to information that can be used in medical diagnosis. Traditional systems have a focus on the storage and retrieval of this information, usually leaving issues related to privacy in the background. Doctors and patients may have different objectives when using an EHR system: patients try to restrict sensible information in their medical records to avoid misuse information while doctors want to see as much information as possible to ensure a correct diagnosis. One solution to this dilemma is the Accountable e-Health model, an access protocol model based in the Information Accountability Protocol. In this model patients are warned when doctors access their restricted data. They also enable a non-restrictive access for authenticated doctors. In this work we use FluxMED, an EHR system, and augment it with aspects of the Information Accountability Protocol to address these issues. The Implementation of the Information Accountability Framework (IAF) in FluxMED provides ways for both patients and physicians to have their privacy and access needs achieved. Issues related to storage and data security are secured by FluxMED, which contains mechanisms to ensure security and data integrity. The effort required to develop a platform for the management of medical information is mitigated by the FluxMED's workflow-based architecture: the system is flexible enough to allow the type and amount of information being altered without the need to change in your source code.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus enhancing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A computer-controlled laser writing system for optical integrated circuits and data storage is described. The system is characterized by holographic (649F) and high-resolution plates. A minimum linewidth of 2.5 mum is obtained by controlling the system parameters. We show that this system can also be used for data storage applications.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Storage systems are widely used and have played a crucial rule in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, such system suffers from issues of cost, restricted-lifetime, and reliability with the emergence of new systems and devices, such as distributed storage and flash memory, respectively. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O and network bandwidth. This thesis bridges these two topics, and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.

We consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: What is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we will show that the lower bound of 1/2 is achievable and that the result can be generalized to codes with arbitrary number of parities and optimal rebuilding.

We consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. In order to decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation where only small sliding windows of cells are sorted to generated permutations, and partial rank modulation where only part of the n cells are used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows will increase capacity. We present Gray codes spanning all possible partial-rank states and using only ``push-to-the-top'' operations. These Gray codes turn out to solve an open combinatorial problem called universal cycle, which is a sequence of integers generating all possible partial permutations.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The work presented in this thesis revolves around erasure correction coding, as applied to distributed data storage and real-time streaming communications.

First, we examine the problem of allocating a given storage budget over a set of nodes for maximum reliability. The objective is to find an allocation of the budget that maximizes the probability of successful recovery by a data collector accessing a random subset of the nodes. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models, and determine the optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) for a variety of cases. Although the optimal allocation can have nonintuitive structure and can be difficult to find in general, our results suggest that, as a simple heuristic, reliable storage can be achieved by spreading the budget maximally over all nodes when the budget is large, and spreading it minimally over a few nodes when it is small. Coding would therefore be beneficial in the former case, while uncoded replication would suffice in the latter case.

Second, we study how distributed storage allocations affect the recovery delay in a mobile setting. Specifically, two recovery delay optimization problems are considered for a network of mobile storage nodes: the maximization of the probability of successful recovery by a given deadline, and the minimization of the expected recovery delay. We show that the first problem is closely related to the earlier allocation problem, and solve the second problem completely for the case of symmetric allocations. It turns out that the optimal allocations for the two problems can be quite different. In a simulation study, we evaluated the performance of a simple data dissemination and storage protocol for mobile delay-tolerant networks, and observed that the choice of allocation can have a significant impact on the recovery delay under a variety of scenarios.

Third, we consider a real-time streaming system where messages created at regular time intervals at a source are encoded for transmission to a receiver over a packet erasure link; the receiver must subsequently decode each message within a given delay from its creation time. For erasure models containing a limited number of erasures per coding window, per sliding window, and containing erasure bursts whose maximum length is sufficiently short or long, we show that a time-invariant intrasession code asymptotically achieves the maximum message size among all codes that allow decoding under all admissible erasure patterns. For the bursty erasure model, we also show that diagonally interleaved codes derived from specific systematic block codes are asymptotically optimal over all codes in certain cases. We also study an i.i.d. erasure model in which each transmitted packet is erased independently with the same probability; the objective is to maximize the decoding probability for a given message size. We derive an upper bound on the decoding probability for any time-invariant code, and show that the gap between this bound and the performance of a family of time-invariant intrasession codes is small when the message size and packet erasure probability are small. In a simulation study, these codes performed well against a family of random time-invariant convolutional codes under a number of scenarios.

Finally, we consider the joint problems of routing and caching for named data networking. We propose a backpressure-based policy that employs virtual interest packets to make routing and caching decisions. In a packet-level simulation, the proposed policy outperformed a basic protocol that combines shortest-path routing with least-recently-used (LRU) cache replacement.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This short position paper considers issues in developing Data Architecture for the Internet of Things (IoT) through the medium of an exemplar project, Domain Expertise Capture in Authoring and Development ­Environments (DECADE). A brief discussion sets the background for IoT, and the development of the ­distinction between things and computers. The paper makes a strong argument to avoid reinvention of the wheel, and to reuse approaches to distributed heterogeneous data architectures and the lessons learned from that work, and apply them to this situation. DECADE requires an autonomous recording system, ­local data storage, semi-autonomous verification model, sign-off mechanism, qualitative and ­quantitative ­analysis ­carried out when and where required through web-service architecture, based on ontology and analytic agents, with a self-maintaining ontology model. To develop this, we describe a web-service ­architecture, ­combining a distributed data warehouse, web services for analysis agents, ontology agents and a ­verification engine, with a centrally verified outcome database maintained by certifying body for qualification/­professional status.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Remote Data acquisition and analysing systems developed for fisheries and related environmental studies have been reported. It consists of three units. The first one namely multichannel remote data acquisition system is installed at the remote place powered by a rechargeable battery. It acquires and stores the 16 channel environmental data on a battery backed up RAM. The second unit called the Field data analyser is used for insitue display and analysis of the data stored in the backed up RAM. The third unit namely Laboratory data analyser is an IBM compatible PC based unit for detailed analysis and interpretation of the data after bringing the RAM unit to the laboratory. The data collected using the system has been analysed and presented in the form of a graph. The system timer operated at negligibly low current, switches on the power to the entire remote operated system at prefixed time interval of 2 hours.Data storage at remote site on low power battery backedupRAM and retrieval and analysis of data using PC are the special i ty of the system. The remote operated system takes about 7 seconds including the 5 second stabilization time to acquire and store data and is very ideal for remote operation on rechargeable bat tery. The system can store 16 channel data scanned at 2 hour interval for 10 days on 2K backed up RAM with memory expansion facility for 8K RAM.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data is one of the domains in grid research that deals with the storage, replication, and management of large data sets in a distributed environment. The all-data-to-all sites replication scheme such as read-one write-all and tree grid structure (TGS) are the popular techniques being used for replication and management of data in this domain. However, these techniques have its weaknesses in terms of data storage capacity and also data access times due to some number of sites must ‘agree’ in common to execute certain transactions. In this paper, we propose the all-data-to-some-sites scheme called the neighbor replication on triangular grid (NRTG) technique by considering only neighbors have the replicated data, and thus, minimizes the storage capacity as well as high update availability. Also, the technique tolerates failures such as server failures, site failure or even network partitioning using remote procedure call (RPC).

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data in grid research deals with storage, replication, and management of large data sets in a distributed environment. The all-data-to-all-sites replication schemes, like Read-One Write-All (ROWA) and Tree Grid Structure (TGS), are the popular techniques in grid. However, these techniques have a weakness in data storage capacity and data access times. In this paper, we propose the all-data-to-some-sites scheme called the 'Neighbour Replication on Triangular Grid' (NRTG) technique. The proposed scheme minimises the storage capacity as well as data access time with high update availability. It also tolerates failures such as server and site failures.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Since Licklider in the 1960s [27] influential proponents of networked computing have envisioned electronic information in terms of a relatively small (even singular) number of 'sources', distributed through technologies such as the Internet. Most recently, Levy writes, in Becoming Virtual, that "in cyberspace, since any point is directly accessible from any other point, there is an increasing tendency to replace copies of documents with hypertext links. Ultimately, there will only need to be a single physical exemplar of the text" [13 p.61]. Hypertext implies, in theory, the end of 'the copy', and the multiplication of access points to the original. But, in practice, the Internet abounds with copying, both large and small scale, both as conscious human practice, and also as autonomous computer function. Effective and cheap data storage that encourages computer users to keep anything of use they have downloaded, lest the links they have found, 'break'; while browsers don't 'browse' the Internet - they download copies of everything to client machines. Not surprisingly, there is significant regulation against 'copying' - regulation that constrains our understanding of 'copying' to maintain a legal fiction of the 'original' for the purposes of intellectual property protection. In this paper, I will firstly demonstrate, by a series of examples, how 'copying' is more than just copyright infringement of music and software, but is a defining, multi-faceted feature of Internet behaviour. I will then argue that the Internet produces an interaction between dematerialised, digital data and human subjectivity and desire that fundamentally challenges notions of originality and copy. Walter Benjamin noted about photography: "one can make any number of prints [from a negative]; to ask for the 'authentic' print makes no sense" [4 p.224]. In cyberspace, I conclude, it makes no sense to ask which one is the copy.