987 results for data replication


Relevance: 100.00%

Abstract:

Honig and Samuelsson (2014) and Delmar (2015) recently had an exchange in this journal related to a replication-and-extension attempt of two papers that originally arrived at different conclusions based on the same data set. This commentary provides further clarification of the issues and links the debate to broader issues of scholarly culture and practices in entrepreneurship research.

Relevance: 100.00%

Abstract:

Mobile computing has enabled users to seamlessly access databases even when they are on the move. Mobile computing environments require data management approaches that can provide complete and highly available access to shared data at any time, from anywhere. In this paper, we propose a novel replicated data protocol for achieving this goal. The proposed scheme replicates data synchronously over stationary sites based on a three-dimensional grid structure, while objects at mobile sites are replicated asynchronously based on the sites most commonly visited by each user. This combination allows the proposed protocol to operate with less than full connectivity and to adapt easily to changes in group membership, without requiring all sites to agree before a data object is updated, giving the technique flexibility in mobile environments. The proposed replication technique is compared with a baseline replication technique and shown to exhibit high availability, fault tolerance and minimal access times for data and services, which are very important in an environment with low-quality communication links.
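
The abstract does not give the grid protocol's quorum rules, so the sketch below shows one way a three-dimensional grid can guarantee that synchronous reads and writes intersect: sites are indexed as an n × n × n grid and quorums are built from "pillars" (all sites sharing an (x, y) position). The pillar construction is an assumption for illustration, not the paper's protocol.

```python
import itertools
import random

def pillars(n):
    """Index the stationary sites as an n x n x n grid; a 'pillar' is
    the set of sites sharing (x, y) across all z layers."""
    return {(x, y): [(x, y, z) for z in range(n)]
            for x, y in itertools.product(range(n), repeat=2)}

def read_quorum(n, rng=random):
    """One site from every pillar."""
    return {rng.choice(sites) for sites in pillars(n).values()}

def write_quorum(n, rng=random):
    """One full pillar plus one site from every other pillar; the full
    pillar guarantees overlap with every read quorum's pick there."""
    ps = pillars(n)
    full = rng.choice(list(ps))
    q = set(ps[full])
    q |= {rng.choice(sites) for key, sites in ps.items() if key != full}
    return q

# Every synchronous write then intersects every read at >= 1 site:
assert read_quorum(3) & write_quorum(3)
```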

Relevance: 100.00%

Abstract:

Data replication is one of the key components of data grid architecture, as it enhances data access and reliability and minimises the cost of data transmission. In this paper, we address the problem of reducing the overheads of the replication mechanisms that drive the data management components of a data grid. We propose an approach that extends the resource broker with policies that factor in user quality of service as well as service costs when replicating and transferring data. A realistic model of the data grid was created to simulate and explore the performance of the proposed policy. The policy proved an effective means of improving grid network traffic performance, as indicated by improvements in the speed and cost of transfers carried out by brokers.
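
As an illustration of how such a broker policy might trade quality of service against cost, here is a minimal sketch; the candidate fields (est_time, cost), the deadline check, and the linear weighting are all assumptions, not the paper's policy.

```python
def pick_replica(candidates, deadline, cost_weight=0.5):
    """Choose a source replica for a transfer. Candidates that meet the
    user's deadline (QoS) are preferred; among those, trade transfer
    time against monetary cost with a tunable weight."""
    feasible = [c for c in candidates if c["est_time"] <= deadline]
    pool = feasible or candidates          # degrade gracefully if none fit
    return min(pool, key=lambda c: (1 - cost_weight) * c["est_time"]
                                   + cost_weight * c["cost"])

sites = [{"name": "s1", "est_time": 12.0, "cost": 3.0},
         {"name": "s2", "est_time": 30.0, "cost": 1.0}]
print(pick_replica(sites, deadline=20.0)["name"])   # -> "s1"
```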

Relevance: 100.00%

Abstract:

Failures are normal rather than exceptional in cloud computing environments. To improve system availability, replicating popular data to multiple suitable locations is an advisable choice, as users can then access the data from a nearby site. This is not the case, however, with static replication, where a fixed number of copies is kept at preset locations. How to decide a reasonable number of replicas and the right locations for them has become a challenge in cloud computing. In this paper, a dynamic data replication strategy is put forward, together with a brief survey of replication strategies suitable for distributed computing environments. It comprises: 1) analyzing and modeling the relationship between system availability and the number of replicas; 2) evaluating and identifying popular data and triggering a replication operation when the data's popularity passes a dynamic threshold; 3) calculating a suitable number of copies to meet a reasonable system byte effective rate requirement and placing replicas among data nodes in a balanced way; 4) designing the dynamic data replication algorithm for a cloud. Experimental results demonstrate the efficiency and effectiveness of the improvements the proposed strategy brings to a cloud system.
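
As a worked example of step 1: under the common simplifying assumption that replicas fail independently, each with availability p, system availability is A(n) = 1 − (1 − p)^n, which inverts to the smallest replica count meeting a target (the paper's own availability model may differ).

```python
import math

def min_replicas(per_replica_avail, target_avail):
    """Smallest n with 1 - (1 - p)**n >= target, assuming replicas fail
    independently (a textbook simplification, not the paper's model)."""
    p, a = per_replica_avail, target_avail
    return math.ceil(math.log(1 - a) / math.log(1 - p))

# e.g. 90% availability per replica, 99.99% target -> 4 copies
print(min_replicas(0.9, 0.9999))
```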

Relevance: 100.00%

Abstract:

Data grids have been adopted by many scientific communities that need to share, access, transport, process, and manage geographically distributed large data collections. Data replication is one of the main mechanisms used in data grids, whereby identical copies of data are generated and stored at various distributed sites to improve data access performance, reliability, or both. However, when data updates are allowed, it is a great challenge to simultaneously improve performance and reliability while ensuring the consistency of such huge and widely distributed data. In this paper, we address this problem. We propose a new quorum-based data replication protocol with the objectives of minimizing data update cost while providing high availability and data consistency. We compare the proposed approach with two existing approaches in terms of response time, data consistency, data availability, and communication costs. The results show that the proposed approach performs substantially better than the benchmark approaches.
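
For reference, the textbook majority-quorum baseline that quorum protocols refine looks like the sketch below: reads and writes each contact a majority of sites, and version numbers let a reader pick the newest overlapping copy. This is a generic baseline, not the paper's protocol, which presumably uses smaller structured quorums to cut update cost.

```python
import random

class MajorityQuorum:
    """Classic majority-quorum replication over n sites."""
    def __init__(self, n):
        self.stores = [dict() for _ in range(n)]   # site -> {key: (version, value)}
        self.q = n // 2 + 1                        # quorum size

    def write(self, key, value):
        # A full protocol would learn the latest version from a read
        # quorum; peeking at every site keeps the sketch short.
        version = max(s.get(key, (0, None))[0] for s in self.stores) + 1
        for site in random.sample(self.stores, self.q):
            site[key] = (version, value)

    def read(self, key):
        replies = [s.get(key, (0, None))
                   for s in random.sample(self.stores, self.q)]
        return max(replies, key=lambda r: r[0])[1]  # newest version wins

mq = MajorityQuorum(5)
mq.write("x", "v1"); mq.write("x", "v2")
assert mq.read("x") == "v2"   # any read quorum overlaps the last write
```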

Relevance: 80.00%

Abstract:

Data grid research deals with the storage, replication, and management of large data sets in a distributed environment. The all-data-to-all-sites replication schemes, such as Read-One Write-All (ROWA) and Tree Grid Structure (TGS), are popular techniques in grids. However, these techniques are weak in terms of data storage capacity and data access times. In this paper, we propose an all-data-to-some-sites scheme called the 'Neighbour Replication on Triangular Grid' (NRTG) technique. The proposed scheme minimises storage capacity as well as data access time while providing high update availability. It also tolerates failures such as server and site failures.
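
A minimal sketch of neighbour-based placement on a triangular layout follows; the row-indexed adjacency is an assumed layout for illustration, since the abstract does not define NRTG's exact grid.

```python
def neighbours(r, c):
    """Adjacent sites of (r, c) in a triangular layout where row r
    holds r + 1 sites (an assumed layout; the grid is taken as
    unbounded downward for simplicity)."""
    cand = [(r, c - 1), (r, c + 1),        # same row
            (r - 1, c - 1), (r - 1, c),    # row above
            (r + 1, c), (r + 1, c + 1)]    # row below
    return [(rr, cc) for rr, cc in cand if rr >= 0 and 0 <= cc <= rr]

def replica_set(r, c):
    """All-data-to-some-sites placement: a primary plus its neighbours."""
    return [(r, c)] + neighbours(r, c)

print(replica_set(2, 1))
# [(2, 1), (2, 0), (2, 2), (1, 0), (1, 1), (3, 1), (3, 2)]
```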

Relevance: 80.00%

Abstract:

Online social networks make it easier for people to find and communicate with other people based on shared interests, values, membership in particular groups, etc. Common social networks such as Facebook and Twitter have hundreds of millions or even billions of users scattered all around the world sharing interconnected data. Users demand low-latency access not only to their own data but also to their friends’ data, which is often very large (e.g., videos and pictures). However, social network service providers have limited monetary capital and cannot store every piece of data everywhere to minimise users’ data access latency. Geo-distributed cloud services with virtually unlimited capabilities are suitable for storing large-scale social network data in different geographical locations. This paper addresses the key problems of how to optimally store and replicate these huge datasets and how to distribute requests across different datacenters. A novel genetic algorithm-based approach is used to find a near-optimal number of replicas for every user’s data and a near-optimal placement of those replicas, minimising monetary cost while satisfying latency requirements for all users. Experiments on a large Facebook dataset demonstrate our technique’s effectiveness in outperforming other representative placement and replication strategies.
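
A compact sketch of the genetic-algorithm idea is shown below; the chromosome encoding (a set of datacenters per user), the costs, latencies, SLO, and penalty are all invented for illustration and will differ from the paper's formulation.

```python
import random

# Illustrative problem instance; every number here is made up.
N_DC, N_USERS, LAT_SLO = 4, 10, 60.0
random.seed(1)
STORE_COST = [random.uniform(1, 3) for _ in range(N_DC)]    # $ per replica
LATENCY = [[random.uniform(10, 120) for _ in range(N_DC)]
           for _ in range(N_USERS)]                          # ms, user -> DC

def fitness(chrom):
    """chrom[u] is the set of datacenters holding user u's replicas.
    Lower is better: storage cost plus a heavy penalty whenever no
    replica meets the latency SLO."""
    cost = sum(STORE_COST[dc] for dcs in chrom for dc in dcs)
    for u, dcs in enumerate(chrom):
        if min(LATENCY[u][dc] for dc in dcs) > LAT_SLO:
            cost += 1000.0
    return cost

def random_chrom():
    return [frozenset(random.sample(range(N_DC), random.randint(1, N_DC)))
            for _ in range(N_USERS)]

def crossover(a, b):
    cut = random.randrange(N_USERS)
    return a[:cut] + b[cut:]

def mutate(chrom):
    chrom = chrom[:]
    u = random.randrange(N_USERS)
    chrom[u] = frozenset(random.sample(range(N_DC), random.randint(1, N_DC)))
    return chrom

def evolve(pop_size=30, gens=100):
    pop = [random_chrom() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)                  # elitist selection
        elite = pop[:pop_size // 2]
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=fitness)

print(fitness(evolve()))   # total cost of the best placement found
```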

Relevance: 70.00%

Abstract:

The main theme of this thesis is to allow the users of cloud services to outsource their data without the need to trust the cloud provider. The method is based on combining existing proof-of-storage schemes with distance-bounding protocols. Specifically, cloud customers will be able to verify the confidentiality, integrity, availability, fairness (or mutual non-repudiation), data freshness, geographic assurance and replication of their stored data directly, without having to rely on the word of the cloud provider.
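
Of these guarantees, geographic assurance is the least familiar; the toy round below illustrates the distance-bounding idea (an illustrative sketch, not the thesis's protocol): the verifier times challenges for randomly chosen blocks, and a response slower than the light-speed round trip to the claimed region suggests the data is stored farther away. The fetch_block callback and digests list are assumed interfaces.

```python
import hashlib, os, time

SPEED_OF_LIGHT_KM_S = 299_792

def verify_region(fetch_block, digests, max_km, rounds=10):
    """Challenge random blocks; check both integrity (digest match) and
    timing (response within the physical round-trip budget for max_km)."""
    budget = 2 * max_km / SPEED_OF_LIGHT_KM_S   # seconds; real links add slack
    for _ in range(rounds):
        idx = int.from_bytes(os.urandom(4), "big") % len(digests)
        t0 = time.perf_counter()
        block = fetch_block(idx)                 # round trip to the provider
        rtt = time.perf_counter() - t0
        if hashlib.sha256(block).hexdigest() != digests[idx] or rtt > budget:
            return False
    return True
```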

Relevance: 70.00%

Abstract:

Dispersing a data object into a set of data shares is an elementary step in distributed communication and storage systems. In comparison to data replication, data dispersal with redundancy saves space and bandwidth. Moreover, dispersing a data object across distinct communication links or storage sites limits adversarial access to the whole data and tolerates the loss of some data shares. Existing data dispersal schemes have mostly been based on various mathematical transformations of the data, which induce high computation overhead. This paper presents a novel data dispersal scheme where each part of a data object is replicated, without encoding, into a subset of data shares according to combinatorial design theory. In particular, data parts are mapped to points and data shares to lines of a projective plane. Data parts are then distributed to data shares using the point-line incidence relations of the plane, so that certain subsets of data shares collectively possess all data parts. The presented scheme combines combinatorial design theory with an inseparability transformation to achieve secure data dispersal at reduced computation, communication and storage costs. Rigorous formal analysis and an experimental study demonstrate significant cost benefits of the presented scheme in comparison to existing methods.
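
A concrete miniature using the smallest projective plane (the Fano plane, of order 2, with 7 points and 7 lines): each share replicates, without encoding, exactly the parts on its line, and because the three lines through any one point cover all seven points, that pencil of shares reconstructs the whole object. The 7-part split is illustrative; the paper's plane order and its inseparability transformation are not reproduced here.

```python
# The Fano plane: 7 points, 7 lines, each line containing 3 points.
FANO_LINES = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7},
              {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

def disperse(parts):
    """Share i replicates, unencoded, exactly the parts whose point
    lies on line i."""
    assert len(parts) == 7
    return [{p: parts[p - 1] for p in line} for line in FANO_LINES]

def reassemble(shares):
    got = {}
    for share in shares:
        got.update(share)
    return [got[p] for p in sorted(got)] if len(got) == 7 else None

shares = disperse(list(b"ABCDEFG"))
# The three lines through point 1 cover all seven points, so that
# pencil of shares reconstructs the whole object:
pencil = [shares[i] for i, line in enumerate(FANO_LINES) if 1 in line]
assert reassemble(pencil) == list(b"ABCDEFG")
```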

Relevance: 70.00%

Abstract:

The availability of critical services and their data can be significantly increased by replicating them on multiple systems connected with each other, even in the face of system and network failures. In some platforms, such as peer-to-peer (P2P) systems, inherent characteristics mandate the employment of some form of replication to provide acceptable service to users. However, how best to replicate data to build highly available peer-to-peer systems remains an open problem. In this paper, we propose an approach to address the data replication problem on P2P systems. The proposed scheme is compared with other techniques and is shown to require less communication cost per operation as well as to provide a higher degree of data availability.

Relevance: 70.00%

Abstract:

The widespread adoption of cluster computing as a high performance computing platform has seen the growth of data-intensive scientific, engineering and commercial applications such as digital libraries, climate modeling, computational chemistry, computational fluid dynamics and image repositories. However, I/O subsystem performance has not been keeping pace with processor and memory performance, and is fast becoming the dominant factor in overall system performance. Thus, parallel I/O has become a necessity in the face of performance improvements in other areas of computing systems. This paper addresses the problem of parallel I/O scheduling on cluster computing systems in the presence of data replication. We propose two new I/O scheduling algorithms and evaluate the relative performance of the proposed policies against two existing approaches. Simulation results show that the proposed policies perform substantially better than the baseline policies.
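
The abstract does not name its two algorithms, so the sketch below shows only the general shape of replica-aware I/O scheduling: a greedy policy that routes each request to the least-loaded server holding a replica of the requested block.

```python
def schedule(requests, replica_map, n_servers):
    """Replica-aware I/O scheduling sketch (an illustrative greedy
    policy, not the paper's algorithms): route each request to the
    least-loaded server holding a replica of the requested block."""
    load = [0.0] * n_servers
    plan = []
    for block, cost in requests:
        server = min(replica_map[block], key=lambda s: load[s])
        load[server] += cost
        plan.append((block, server))
    return plan, max(load)      # makespan = finish time of busiest server

reqs = [("a", 2.0), ("b", 1.0), ("a", 2.0), ("c", 3.0)]
replicas = {"a": [0, 1], "b": [1, 2], "c": [0, 2]}
print(schedule(reqs, replicas, 3))
```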

Relevance: 70.00%

Abstract:

In data-intensive distributed systems, replication is the most widely used approach to offer high data availability, low bandwidth consumption, increased fault-tolerance and improved scalability of the overall system. Replication-based systems implement replica control protocols that enforce a specified semantics of accessing the data. Performance also depends on a number of factors, chief among them the protocol used to maintain consistency among object replicas. In this paper, we propose a new low-cost and high-data-availability protocol, called the box-shaped grid structure, for maintaining the consistency of replicated data on networked distributed computing systems. We show that the proposed protocol provides high data availability, low communication costs, and increased fault-tolerance compared with baseline replica control protocols.
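
The box-shaped construction itself is not given in the abstract; the sketch below shows the classic flat-grid quorum idea such protocols build on: arrange replicas in a logical grid, let a read quorum be a full row and a write quorum a full row plus a full column, so every pair of quorums intersects.

```python
def make_grid(rows, cols):
    """Logical placement: site IDs laid out row by row."""
    return [[r * cols + c for c in range(cols)] for r in range(rows)]

def read_quorum(grid, r):
    return set(grid[r])                        # one full row

def write_quorum(grid, r, c):
    # A full row plus a full column: the column meets every row, so any
    # read quorum overlaps it; two write quorums also always intersect.
    return set(grid[r]) | {row[c] for row in grid}

g = make_grid(3, 4)
assert read_quorum(g, 0) & write_quorum(g, 2, 1)   # overlap at site 1
```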

Relevance: 70.00%

Abstract:

Providing reliable and efficient services is a primary goal in designing a web server system. Data replication can be used to improve the reliability of the system. However, the mapping mechanism is one of the primary concerns in data replication. In this paper, we propose a mapping mechanism model called the enhanced domain name server (E-DNS), which dispatches user requests by resolving a URL name to an IP address under the Neighbor Replica Distribution Technique (NRDT), to improve the reliability of the system.
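
A toy illustration of name-to-replica dispatch follows; the names, addresses, and liveness callback are hypothetical, and the real E-DNS resolution and NRDT neighbour sets are defined in the paper.

```python
REPLICAS = {  # hypothetical URL name -> replica IPs placed by NRDT
    "www.example.org": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
}

def resolve(name, alive):
    """E-DNS-style dispatch sketch: answer the query with the first
    live replica for the name, so a failed primary is masked by a
    neighbour replica."""
    for ip in REPLICAS.get(name, []):
        if alive(ip):
            return ip
    return None                  # no live replica: resolution fails

# e.g. primary down, first neighbour answers instead
print(resolve("www.example.org", alive=lambda ip: ip != "10.0.0.1"))
```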

Relevance: 70.00%

Abstract:

Currently, many museums, botanic gardens and herbaria keep data on biological collections; using computational tools, researchers digitise these data and provide access to them through data portals. The replication of databases into portals can be accomplished through the use of protocols and data schemas. However, implementing this solution demands a large amount of time, both for transferring fragments of data and for processing data within the portal. As data digitisation in institutions grows, this scenario tends to become increasingly exacerbated, making it hard to keep the records on the portals up to date. As an original contribution, this research proposes analysing the data replication process to evaluate the performance of portals. The Inter-American Biodiversity Information Network (IABIN) biodiversity data portal of pollinators was used as a case study; it supports both conventional data replication of specimen occurrence records and replication of interactions between them. With the results of this research, it is possible to simulate a situation before its implementation, thus predicting the performance of replication operations. Additionally, these results may contribute to future improvements of this process, in order to decrease the time required to make data available in portals.

Relevance: 70.00%

Abstract:

Current scientific applications produce large amounts of data, and the processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. To achieve this goal, distributed storage systems have employed techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take application behavior into account when performing data access optimization. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems without requiring any information on past executions. To accomplish this goal, the approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Knowing these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. The new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm the new approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
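
A simplified stand-in for the behavior-classification step, to make the idea concrete: classify the recent window of an access-rate series as roughly stationary or trending, then apply the matching predictor. The thresholds and models are assumptions; the paper selects among richer time-series models.

```python
import statistics

def predict_next(series, window=8):
    """Behavior-prediction sketch: label the recent window as roughly
    stationary or trending, then pick the matching predictor."""
    recent = series[-window:]
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    trend = statistics.mean(diffs)
    if abs(trend) < 0.1 * statistics.pstdev(recent):  # roughly stationary
        return statistics.mean(recent)                # mean model
    return recent[-1] + trend                         # linear-drift model

# e.g. bytes read per interval by a data-intensive application
print(predict_next([10, 12, 11, 13, 12, 14, 13, 15]))
```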