104 results for Hadoop distributed file system (HDFS)

in Deakin Research Online - Australia


Relevance: 100.00%

Abstract:

The Hadoop framework provides a powerful way to handle Big Data. Because Hadoop suffers from high memory overhead and low computing performance when processing massive numbers of small files, in this paper we implement three methods and propose two strategies for solving the small-files problem. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for Hadoop's existing defects. Moreover, we propose two strategies to meet the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies can improve the efficiency of processing massive numbers of small files, thereby enhancing the overall performance of Hadoop.
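
The common thread behind HAR, SF and CFIF is consolidation: many small files are packed into, or read through, one large container, so the NameNode tracks far fewer objects and map tasks receive larger inputs. As a rough illustration of that packing idea (a minimal pure-Python sketch, not Hadoop code; the container layout and function names are invented here), consider:

```python
# Sketch of the "pack small files into one container" idea behind HAR
# and SequenceFiles: many small files become one large file plus an
# index, so a NameNode-like catalog stores one entry instead of
# thousands. Layout and names are illustrative only.
import os

def pack(small_files, container_path):
    """Concatenate small files into one container; return an index
    mapping each original name to its (offset, length)."""
    index = {}
    with open(container_path, "wb") as out:
        for path in small_files:
            data = open(path, "rb").read()
            index[os.path.basename(path)] = (out.tell(), len(data))
            out.write(data)
    return index

def read_packed(container_path, index, name):
    """Random-access read of one logical small file from the container."""
    offset, length = index[name]
    with open(container_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```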

Relevance: 100.00%

Abstract:

In the last few years, a number of highly publicized Distributed Denial of Service (DDoS) attacks against high-profile government and commercial websites have made people aware of the importance of providing data and service security to users. A DDoS attack is an availability attack, characterized by an explicit attempt by an attacker to prevent legitimate users of a service from using the desired resources. This paper introduces the vulnerability of web applications to DDoS attacks and presents an active distributed defense system that deploys a mixture of sub-systems to protect web applications from DDoS attacks. Simulation experiments show that the system is effective: it can defend web applications against attacks, avoid overall network congestion, and provide more resources to legitimate web users.
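
One building block such a defense sub-system might apply at the network edge is per-source rate limiting. The sketch below is a generic token-bucket filter with arbitrary example parameters; it is not the paper's system, merely an illustration of how a deployed component could throttle flooding sources while passing legitimate traffic:

```python
# Illustrative sketch: a token-bucket rate limiter of the kind an edge
# sub-system in a distributed defense might apply per source IP.
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate=10.0, burst=20.0):
        self.rate, self.burst = rate, burst    # tokens/sec, max tokens
        self.tokens, self.last = burst, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed interval, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets = defaultdict(TokenBucket)

def filter_request(src_ip):
    """Drop requests from sources exceeding their allowed rate."""
    return buckets[src_ip].allow()
```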

Relevance: 100.00%

Abstract:

This paper introduces a new design method for a distributed power system stabiliser for interconnected power systems. The stabiliser is of low order, dynamic, and robust. To generate the required local control signals, each local stabiliser requires information about either the rotor speed or the load angle of the other subsystems. A simple MATLAB-based design algorithm is given and applied to a three-machine unstable power system. The resulting stabiliser is simulated and sample results are presented.
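
In the same spirit as the paper's MATLAB-based algorithm (though not reproducing it), the following Python/SciPy sketch shows the generic workflow of stabiliser design by state feedback: take a small linear model, place the closed-loop poles in the stable half-plane, and check the resulting eigenvalues. The two-state model and pole locations are toy assumptions, not the three-machine system:

```python
# Hedged sketch of pole-placement state-feedback design on a toy
# unstable 2-state linear model (not the paper's three-machine system).
import numpy as np
from scipy.signal import place_poles

A = np.array([[0.0, 1.0],     # toy swing-equation-like dynamics
              [2.0, -0.5]])   # unstable open loop (positive eigenvalue)
B = np.array([[0.0],
              [1.0]])

K = place_poles(A, B, [-1.0, -2.0]).gain_matrix  # desired stable poles
A_cl = A - B @ K                                 # closed-loop dynamics
print("closed-loop eigenvalues:", np.linalg.eigvals(A_cl))
```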

Relevance: 100.00%

Abstract:

Distributed Shared Memory (DSM) provides programmers with a shared-memory environment in systems where memory is not physically shared. Clusters of Workstations (COWs), an often untapped source of computing power, are characterised by a very low cost/performance ratio. Combining COWs with DSM provides an environment in which programmers can use the well-known approaches and methods of programming for physically shared memory systems, and in which parallel processing can make full use of the computing power and cost advantages of the COW. The aim of this research is to synthesise and develop a distributed shared memory system as an integral part of an operating system, in order to provide application programmers with a convenient environment in which parallel applications can be developed and executed easily, efficiently, and transparently. Furthermore, to satisfy our challenging design requirements, we demonstrate that the operating system into which the DSM system is integrated should be a distributed operating system.

This thesis reports a study into the synthesis of a DSM system within a microkernel- and client-server-based distributed operating system, using both strict and weak consistency models with write-invalidate and write-update approaches to consistency maintenance. It also reports a unique automatic initialisation system that allows the programmer to start the parallel execution of a group of processes with a single library call; the number and location of these processes are determined by the operating system based on system load information. The proposed DSM system takes a novel approach in providing programmers with a complete programming environment in which they can easily develop and run their own code, or run existing shared-memory code. A set of demanding DSM design requirements is presented, together with the incentives for placing the DSM system within a distributed operating system, and in particular within the memory management server. The new DSM system is built around an event-driven set of cooperating, distributed entities; a detailed description of the events, and of the reactions to them that make up the operation of the DSM system, is presented, followed by a pseudocode-level design of the main modules and primitives of the proposed DSM system.

Quantitative results of performance tests, and qualitative results showing the ease of programming and use of the RHODOS DSM system, are reported. Five different applications are studied, and the results of tests on these applications are given and discussed. A discussion of how RHODOS' DSM allows programmers to write shared-memory code in an easy-to-use and familiar environment, and a comparative evaluation of RHODOS DSM against other DSM systems, is presented. In particular, the ease of use and transparency of the DSM system are demonstrated by the ease with which a moderately inexperienced undergraduate programmer was able to convert, write and run applications for testing the DSM system. Furthermore, tests performed using physically shared memory show that it is indistinguishable from distributed shared memory, further evidence that the DSM system is fully transparent.

This study clearly demonstrates that the aim of the research has been achieved: it is possible to develop a programmer-friendly and efficient DSM system fully integrated within a distributed operating system. It is clear from this research that DSM integrated into a client-server and microkernel-based distributed operating system makes shared-memory operations transparent and almost completely removes programmer involvement beyond the classical activities needed to deal with shared memory. The conclusion can be drawn that DSM, when implemented within a client-server and microkernel-based distributed operating system, is one of the most encouraging approaches to parallel processing, since it guarantees performance improvements with minimal programmer involvement.
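
To make the write-invalidate approach mentioned above concrete, here is a minimal sketch of the protocol idea (illustrative Python, not RHODOS source; the class and method names are invented): before a node's write proceeds, every other cached copy of the page is invalidated, so subsequent reads must fetch the fresh data.

```python
# Minimal sketch of write-invalidate consistency maintenance for DSM.
class DSMPage:
    def __init__(self, data=b""):
        self.data = data
        self.holders = set()          # nodes currently caching this page

class DSMManager:
    def __init__(self):
        self.pages = {}

    def read(self, node, page_id):
        page = self.pages.setdefault(page_id, DSMPage())
        page.holders.add(node)        # node now holds a read copy
        return page.data

    def write(self, node, page_id, data):
        page = self.pages.setdefault(page_id, DSMPage())
        for holder in page.holders - {node}:
            self.invalidate(holder, page_id)   # write-invalidate step
        page.holders = {node}
        page.data = data

    def invalidate(self, node, page_id):
        # In a real system this would be a message to the remote node.
        print(f"invalidate page {page_id} on node {node}")
```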

Relevance: 100.00%

Abstract:

In autonomously managed distributed systems for collaboration, provenance can facilitate the reuse of interchanged information, the repetition of successful experiments, and evidence for trust mechanisms that certain information existed at a certain point during the collaboration. In this paper, we propose a domain-independent information provenance architecture for open collaborative distributed systems. The proposed system uses XML for interchanging information and RDF to track information provenance. The use of XML and RDF also ensures that information is universally acceptable even among heterogeneous nodes. Our proposed information provenance model can work with any operating system or workflow.
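
As an illustration of tracking provenance with RDF (a sketch using the rdflib library; the vocabulary and URIs below are invented, not the paper's schema), each interchanged XML document can be described by triples recording who produced it, what it was derived from, and when:

```python
# Sketch of recording information provenance as RDF triples.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/provenance#")   # invented vocabulary
g = Graph()

doc = URIRef("http://example.org/data/report-42.xml")
g.add((doc, EX.producedBy, URIRef("http://example.org/nodes/alice")))
g.add((doc, EX.derivedFrom, URIRef("http://example.org/data/raw-7.xml")))
g.add((doc, EX.createdAt, Literal("2009-05-01T10:00:00Z")))

print(g.serialize(format="turtle"))   # provenance graph, exchangeable
```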

Relevance: 100.00%

Abstract:

The goal of this paper is to present the experiences gained over 15 years of research into the design and development of a services-based distributed operating system. We hope the lessons learnt over this period will be of value to researchers designing and developing operating systems intended to harness the collective resources of ever-expanding distributed systems.

Relevance: 100.00%

Abstract:

In this paper, we discuss the design aspects of a dynamic distributed directory scheme (DDS) to facilitate efficient and transparent access to information files in mobile environments. The proposed directory interface enables users of mobile computers to view a distributed file system on a network of computers as a globally shared file system. To counter some of the limitations of wireless communications, we propose improvised invalidation schemes that avoid false sharing and ensure uninterrupted usage under disconnected and low-bandwidth conditions.
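
The sketch below illustrates the general invalidation idea in a simplified form (versioned revalidation on reconnection); the names and the version-counter mechanism are assumptions for illustration, not the paper's DDS protocol:

```python
# Illustrative sketch: a mobile client caches directory entries with
# version numbers; on reconnection it revalidates against the server
# and invalidates only entries that actually changed, avoiding false
# sharing of unchanged files.
class DirectoryServer:
    def __init__(self):
        self.versions = {}            # path -> version counter

    def update(self, path):
        self.versions[path] = self.versions.get(path, 0) + 1

    def current_versions(self, paths):
        return {p: self.versions.get(p, 0) for p in paths}

class MobileClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}               # path -> (version, cached entry)

    def reconnect(self):
        fresh = self.server.current_versions(list(self.cache))
        for path, (version, _) in list(self.cache.items()):
            if fresh[path] != version:
                del self.cache[path]  # invalidate only stale entries
```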

Relevance: 100.00%

Abstract:

Peer-to-peer (P2P) networks are gaining increased attention from both the scientific community and the larger Internet user community. Data retrieval algorithms lie at the center of P2P networks, and this paper addresses the problem of efficiently searching for files in unstructured P2P systems. We propose an Improved Adaptive Probabilistic Search (IAPS) algorithm that is fully distributed and bandwidth efficient. IAPS uses ant-colony optimization and takes file types into consideration in order to search for file container nodes with a high probability of success. We have performed extensive simulations to study the performance of IAPS, and we compare it with the Random Walk and Adaptive Probabilistic Search algorithms. Our experimental results show that IAPS achieves high success rates, high response rates, and significant message reduction.
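
The core mechanics of an ant-colony-style probabilistic search can be sketched in a few lines: forward queries to neighbours with probability proportional to a pheromone weight kept per file type, and reinforce the weights along successful paths. The fragment below is an illustrative reading of that idea, with invented names and parameters, not the authors' IAPS implementation:

```python
# Sketch of ant-colony-flavoured probabilistic query forwarding.
import random
from collections import defaultdict

pheromone = defaultdict(lambda: 1.0)   # (node, neighbour, ftype) -> weight

def choose_neighbour(node, neighbours, ftype):
    """Forward a query with probability proportional to pheromone."""
    weights = [pheromone[(node, n, ftype)] for n in neighbours]
    return random.choices(neighbours, weights=weights, k=1)[0]

def reinforce(path, ftype, deposit=0.5, evaporation=0.05):
    """After a successful search, strengthen each hop on the path;
    evaporation lets weights on unused paths decay over time."""
    for node, neighbour in zip(path, path[1:]):
        key = (node, neighbour, ftype)
        pheromone[key] = (1 - evaporation) * pheromone[key] + deposit
```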

Relevance: 100.00%

Abstract:

Data deduplication is a technique for eliminating duplicate copies of data, and has been widely used in cloud storage to reduce storage space and upload bandwidth. However, only one copy of each file is stored in the cloud, even if the file is owned by a huge number of users. As a result, a deduplication system improves storage utilization while reducing reliability. Furthermore, the challenge of privacy also arises for sensitive data outsourced by users to the cloud. Aiming to address these security challenges, this paper makes the first attempt to formalize the notion of a distributed reliable deduplication system. We propose new distributed deduplication systems with higher reliability, in which the data chunks are distributed across multiple cloud servers. The security requirements of data confidentiality and tag consistency are achieved by introducing a deterministic secret sharing scheme in distributed storage systems, instead of using convergent encryption as in previous deduplication systems. Security analysis demonstrates that our deduplication systems are secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement the proposed systems and demonstrate that the incurred overhead is very limited in realistic environments.
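
To illustrate the chunk-distribution idea in the simplest possible form, the toy sketch below splits a chunk into n shares using XOR, so no single server holds the plaintext chunk. Note this is an n-of-n split chosen only for brevity; the paper's deterministic (k, n) secret sharing is more elaborate, since any k of the n shares suffice to reconstruct:

```python
# Toy XOR-based n-of-n split of a data chunk across servers (n >= 2).
# All shares are needed to reconstruct; a (k, n) scheme would also
# tolerate losing up to n - k shares.
import os

def xor_all(shares):
    acc = bytes(len(shares[0]))                  # all-zero accumulator
    for s in shares:
        acc = bytes(a ^ b for a, b in zip(acc, s))
    return acc

def split(chunk: bytes, n: int):
    shares = [os.urandom(len(chunk)) for _ in range(n - 1)]
    last = bytes(b ^ x for b, x in zip(chunk, xor_all(shares)))
    return shares + [last]                       # one share per server

def reconstruct(shares):
    return xor_all(shares)                       # XOR of all shares
```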

Relevance: 100.00%

Abstract:

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on the widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means, and then employ the MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identification method to connect the borders of the clustering results for each cluster. Finally, we divide the traffic subareas of Beijing using the proposed approach, based on real-world trajectory data sets generated by 12,000 taxis over a period of one month. Experimental results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, greater accuracy, and better scalability, and can effectively divide traffic subareas from big taxi trajectory data.
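
The overall structure that Par3PKM parallelises can be seen in one MapReduce-style K-Means iteration: the map phase assigns each point to its nearest centroid, and the reduce phase averages each cluster to produce new centroids. The pure-Python sketch below shows this skeleton only; the paper's modified distance metric, initialization strategy, and Hadoop-specific optimizations are not reproduced:

```python
# Pure-Python sketch of one MapReduce-style K-Means iteration.
from collections import defaultdict
import math

def mapper(point, centroids):
    """Map phase: emit (nearest-centroid-id, point)."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists)), point

def reducer(cluster_points):
    """Reduce phase: the new centroid is the mean of assigned points."""
    dim = len(cluster_points[0])
    return tuple(sum(p[d] for p in cluster_points) / len(cluster_points)
                 for d in range(dim))

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)
    for p in points:                      # map + shuffle by cluster id
        cid, pt = mapper(p, centroids)
        groups[cid].append(pt)
    return [reducer(pts) for cid, pts in sorted(groups.items())]
```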

Relevance: 100.00%

Abstract:

Present operating systems are not built to support parallel computing: they do not provide services to manage parallelism, i.e., to globally manage parallel processes and computational resources. The cluster operating environments used to assist the execution of parallel applications do not support both programming paradigms, message passing (MP) and distributed shared memory (DSM); they are mainly offered as separate components implemented at the user level as libraries and independent server processes. Because of these operating system shortcomings, users must deal with clusters as a set of independent computers rather than seeing the cluster as a single powerful computer; a single system image (SSI) of the cluster is not offered to users. There is a need for an operating system for clusters. We claim and demonstrate in this paper that it is possible to develop a cluster operating system that is able to efficiently manage parallelism; use cluster resources efficiently; support MP in the form of standard MP and PVM, as well as DSM; offer SSI; and remain easy to use. We show that to achieve these aims this operating system should inherit many features of a distributed operating system and provide new services that address the needs of parallel processes, cluster resources, and application developers. To substantiate this claim, the first version of a cluster operating system that manages parallelism and offers SSI, called GENESIS, has been developed.

Relevance: 100.00%

Abstract:

This paper analyses update ordering and its impact on the performance of a distributed replication system. We propose a model for update orderings and constraints, and develop a number of algorithms for implementing different ordering constraints. A performance study is then carried out to analyse the update-ordering model. We show that our model allows an ordering constraint to be defined on each update operation, and that the ordering implementation takes account of detailed inter-operation semantics, denoted by commutative and causal operations, to reduce unnecessary delay, resulting in better response times for update requests.
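
One standard way to implement a causal ordering constraint of the kind such a model supports is with vector clocks. The sketch below is an illustration of that general technique, not the paper's algorithms, and it does not capture the commutativity optimization the model also exploits:

```python
# Sketch of causal ordering via vector clocks.
def happens_before(vc_a, vc_b):
    """True if update A (clock vc_a) causally precedes update B."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def deliverable(vc_msg, sender, vc_local):
    """Causal delivery test at a replica: the message is the next
    event from its sender and no causally earlier update is missing."""
    return all(
        m == vc_local[i] + 1 if i == sender else m <= vc_local[i]
        for i, m in enumerate(vc_msg)
    )
```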

Relevance: 100.00%

Abstract:

Distributed defense is a promising way to neutralize distributed Denial-of-Service attacks by detecting and responding to attacking sources spread across the Internet. Components of a distributed defense system cooperate with each other to combat the attacks. Compared with centralized defense systems, distributed defense systems can discover attacks earlier, from both the source end and the victim end, fight the attacks with more resources, and take advantage of more flexible strategies. This paper investigates seven distributed defense systems that use various strategies to mitigate DDoS attacks. These seven systems adopt different architectures to provide distributed DDoS defense solutions. We evaluate the systems in terms of deployment, detection, response, security, robustness, and implementation. For each criterion, based on the analysis results, we give a recommendation on which technologies are best suited for a successful distributed defense system. Finally, we propose our idea for the design of an effective distributed defense system.