977 results for Hadoop distributed file system (HDFS)


Relevance:

100.00%

Publisher:

Abstract:

This paper describes a prototype implementation of a Distributed File System (DFS) based on the Adaptive Information Dispersal Algorithm (AIDA). Using AIDA, a file block is encoded and dispersed into smaller blocks stored on a number of DFS nodes distributed over a network. The implementation provides file creation, read, and write operations. In particular, when reading a file, the DFS accepts an optional timing constraint, which it uses to determine the level of redundancy needed for the read operation: the tighter the timing constraint, the more DFS nodes are queried for encoded blocks. Write operations update all blocks on all DFS nodes; future implementations may add read and write quorums. This work was conducted under the supervision of Professor Azer Bestavros (best@cs.bu.edu) in the Computer Science Department as part of Mohammad Makarechian's Master's project.
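As a rough illustration of the deadline-driven read path described above, the following Python sketch picks a redundancy level from an optional timing constraint. The node count, the thresholds, and the K-of-N reconstruction assumption are all illustrative, and the AIDA encoding itself is reduced to a placeholder; this is a sketch of the idea, not the paper's code.

    # Minimal sketch of a deadline-aware DFS read, assuming a K-of-N dispersal.
    N_NODES = 10       # assumed number of DFS nodes holding encoded pieces
    K_REQUIRED = 6     # assumed reconstruction threshold: any K pieces suffice

    # Toy stand-in for the dispersed store: node -> encoded piece of the block.
    STORE = {node: f"piece-{node}".encode() for node in range(N_NODES)}

    def nodes_to_query(deadline_ms):
        """Tighter deadline -> more redundancy, so K answers arrive sooner."""
        if deadline_ms is None:
            return K_REQUIRED                    # no constraint: minimal fan-out
        if deadline_ms < 50:
            return N_NODES                       # very tight: query every node
        return min(N_NODES, K_REQUIRED + 2)      # moderate: small safety margin

    def read_block(deadline_ms=None):
        pieces = []
        for node in range(nodes_to_query(deadline_ms)):
            piece = STORE.get(node)              # stands in for an RPC to the node
            if piece is not None:
                pieces.append(piece)
            if len(pieces) >= K_REQUIRED:
                break                            # enough pieces to decode
        return b"|".join(pieces)                 # placeholder for AIDA decoding

    print(read_block(deadline_ms=30))            # tight deadline: wide fan-out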

Relevance:

100.00%

Publisher:

Abstract:

The Hadoop framework provides a powerful way to handle Big Data. Since Hadoop suffers from high memory overhead and low computing performance when processing massive numbers of small files, we implement three methods and propose two strategies for solving the small files problem in this paper. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for these defects of Hadoop. Moreover, we propose two strategies to meet the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies can improve the efficiency of processing massive numbers of small files, thereby enhancing the overall performance of Hadoop. © 2014 ISSN 1881-803X.
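The common idea behind all three methods is to pack many small files into a few large, HDFS-block-sized containers, so the NameNode tracks one object instead of thousands. Below is a minimal pure-Python sketch of that packing idea with an illustrative record format; it is not Hadoop's actual HAR or SequenceFile code.

    import os
    import struct

    def pack_small_files(paths, container):
        """Pack many small files into one container file, SequenceFile-style:
        each record is <name_len><name><data_len><data>. Illustrative only."""
        with open(container, "wb") as out:
            for path in paths:
                name = os.path.basename(path).encode()
                with open(path, "rb") as f:
                    data = f.read()
                out.write(struct.pack(">I", len(name)) + name)
                out.write(struct.pack(">I", len(data)) + data)

    def unpack(container):
        """Yield (name, data) records back out of the container."""
        with open(container, "rb") as f:
            while header := f.read(4):
                name = f.read(struct.unpack(">I", header)[0]).decode()
                data = f.read(struct.unpack(">I", f.read(4))[0])
                yield name, data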

Relevance:

100.00%

Publisher:

Abstract:

Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage day by day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when storage space is needed. CMS data is stored using HDFS (Hadoop Distributed File System), and HDFS logs give information about file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time taken to classify data sets with this method depends on the size of the input HDFS log file, since the MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion along with their respective sizes. It was also compared with a heuristic called Retention Cost, calculated from the size of a data set and the time since its last access, to help decide how useful a data set is. The accuracies of both were compared by calculating the percentage of data sets predicted for deletion that were accessed at a later time. Our SVM methodology proved more accurate than the Retention Cost heuristic, and it could be applied to similar problems involving other large data sets.
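A minimal sketch of the two approaches being compared, with assumptions stated up front: the per-data-set features are illustrative (the thesis derives its features from HDFS logs via MapReduce), scikit-learn's SVC stands in for the SVM training pipeline, and the exact form of the Retention Cost heuristic is not given in the abstract, so a simple size-times-idle-time product stands in for it.

    import numpy as np
    from sklearn.svm import SVC

    # Illustrative features per data set, as might be derived from HDFS logs:
    # [size_gb, days_since_last_access, accesses_in_last_90_days]
    X = np.array([[500, 400, 0], [120, 10, 35], [800, 200, 2], [60, 5, 50]])
    y = np.array([1, 0, 1, 0])           # 1 = candidate for deletion, 0 = keep

    clf = SVC(kernel="rbf").fit(X, y)    # train the SVM classifier
    print(clf.predict([[300, 150, 1]]))  # classify an unseen data set

    # The Retention Cost baseline combines size and time since last access;
    # a simple product is one plausible form of such a heuristic.
    def retention_cost(size_gb, days_idle):
        return size_gb * days_idle       # larger and colder -> delete first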

Relevance:

100.00%

Publisher:

Abstract:

In order to simplify computer management, many system administrators are adopting advanced techniques to manage the software configuration of enterprise computer networks, but the tight coupling between hardware and software makes every PC an individually managed entity, lowering scalability and increasing the cost of managing hundreds or thousands of PCs. Virtualization is an established technology; however, its use has focused more on server consolidation and virtual desktop infrastructure than on managing distributed computers over a network. This paper discusses the feasibility of the Distributed Virtual Machine Environment, a new approach to enterprise computer management that combines virtualization and a distributed system architecture as the basis of the management architecture. © 2008 IEEE.
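A toy sketch of the underlying management idea, with entirely hypothetical names and structure: once software images rather than individual PCs are the managed entities, updating one image effectively updates every machine assigned to it.

    # Software images are the managed entities, decoupled from physical PCs.
    images = {"office-desktop": "img-v12", "dev-workstation": "img-v7"}
    hosts = {"pc-001": "office-desktop", "pc-002": "office-desktop",
             "pc-103": "dev-workstation"}

    def upgrade(role, new_image):
        """Updating one image upgrades every PC with that role at next boot."""
        images[role] = new_image
        return [h for h, r in hosts.items() if r == role]

    print(upgrade("office-desktop", "img-v13"))   # -> ['pc-001', 'pc-002']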

Relevance:

100.00%

Publisher:

Abstract:

Over the last decades we have witnessed what is called the “information explosion”. With the advent of new technologies and new contexts, the volume, velocity and variety of data have increased exponentially, becoming what is known today as big data. Telecommunications operators are a prime example: using network monitoring equipment, they gather millions of network event records, the Call Detail Records (CDRs) and the Event Detail Records (EDRs), commonly known as xDRs. These records are stored and later processed to compute network performance and quality of service metrics. With the ever increasing number of collected xDRs, the volume that needs to be stored has grown exponentially, making current solutions based on relational databases unsuitable. To tackle this problem, the relational data store can be replaced by the Hadoop Distributed File System (HDFS). However, HDFS is simply a distributed file system and therefore supports no aspect of the relational paradigm. To overcome this difficulty, this paper presents a framework that enables systems that currently insert data into relational databases to keep doing so transparently after migrating to Hadoop. As a proof of concept, the developed platform was integrated with Altaia, a performance and QoS management system for telecommunications networks and services.
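A minimal sketch of the kind of shim such a framework implies, with hypothetical names: relational-style inserts are accepted unchanged and appended as delimited records, with a local file standing in for an HDFS path to keep the example self-contained.

    import csv
    import io

    class HdfsInsertShim:
        """Accepts relational-style inserts and appends delimited records to
        one file per table; a local file stands in for an HDFS path here."""
        def __init__(self, base_dir="."):
            self.base_dir = base_dir

        def insert(self, table, row):
            # In a real framework this would append to an HDFS file for the
            # table; here we append CSV lines locally to stay self-contained.
            buf = io.StringIO()
            csv.writer(buf).writerow(row)
            with open(f"{self.base_dir}/{table}.csv", "a", newline="") as f:
                f.write(buf.getvalue())

    db = HdfsInsertShim()
    db.insert("xdr_calls", ["2024-01-01T00:00:00", "cell-17", 42, "OK"])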

Relevance:

100.00%

Publisher:

Abstract:

In this paper we present BitWorker, a platform for community distributed computing based on BitTorrent. Any splittable task can be specified by a user in a meta-information task file, so that it can be downloaded and performed by other volunteers. Peers find each other using Distributed Hash Tables, download existing results, and compute the missing ones. Unlike existing distributed computing schemes that rely on centralized coordination points, our scheme is fully distributed and therefore highly robust. We evaluate the performance of BitWorker using mathematical models and real tests, showing processing and robustness gains. BitWorker is available for download and use by the community.
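A toy sketch of the coordination-free work loop this implies; the task-file format and names are illustrative, not BitWorker's actual code. Each volunteer sees the results already shared by peers and picks a missing part at random, so no central coordinator assigns work.

    import json
    import random

    # Illustrative meta-information task file: a splittable job with N parts.
    task = json.loads('{"job": "render", "parts": 6}')
    done = {0: "r0", 3: "r3"}           # results already shared by other peers

    def work(task, done):
        """Download existing results, then compute one missing part, chosen
        at random to reduce collisions between uncoordinated peers."""
        missing = [p for p in range(task["parts"]) if p not in done]
        if not missing:
            return None                 # job complete
        part = random.choice(missing)
        done[part] = f"r{part}"         # stands in for the actual computation
        return part

    while (p := work(task, done)) is not None:
        print("computed part", p)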

Relevance:

100.00%

Publisher:

Abstract:

The paper proposes a solution for testing a physical distributed generation (DG) system together with a computer-simulated network, referred to here as the virtual grid. Integrating a DG with the virtual grid enables broad testing of the power-supplying capability and dynamic performance of the DG. It is shown that a DG can supply a part of the load power while keeping the voltage magnitude at the Point of Common Coupling (PCC) constant. To represent the actual load, a universal load with power-regenerative capability is built around a voltage source converter (VSC) that mimics the load characteristic. The overall performance of the proposed scheme is verified through computer simulation studies.
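The load-sharing claim can be illustrated with basic phasor arithmetic: with the PCC voltage held constant, the current each source must inject to deliver its share of the load power follows from S = V I*. The voltage, load, and sharing figures below are illustrative, not taken from the paper.

    import cmath

    V_PCC = 230 + 0j              # PCC voltage phasor, held constant (volts)
    P_LOAD, Q_LOAD = 10e3, 2e3    # assumed total load demand (W, var)
    SHARE = 0.4                   # assumed fraction supplied by the DG

    def injection_current(p, q, v=V_PCC):
        """Current a source must inject to deliver S = p + jq at voltage v:
        from S = V * conj(I), I = conj(S / V)."""
        return ((p + 1j * q) / v).conjugate()

    i_dg = injection_current(SHARE * P_LOAD, SHARE * Q_LOAD)
    i_grid = injection_current((1 - SHARE) * P_LOAD, (1 - SHARE) * Q_LOAD)
    print(abs(i_dg), abs(i_grid), cmath.phase(i_dg))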

Relevance:

100.00%

Publisher:

Abstract:

Distributed generation (DG) systems are usually connected to the grid through power electronic converters. The power delivered by such DG sources depends on factors like energy availability and load demand, so the converters used in power conversion do not operate at full capacity all the time. The unused, or remaining, capacity of the converters can be used to provide ancillary functions such as harmonic and unbalance mitigation in the power distribution system. Because some of these DG sources have wide operating ranges, they need special power converters for grid interfacing. Being a single-stage buck-boost inverter, the recently proposed Z-source inverter (ZSI) is a good candidate for future DG systems. This paper presents a controller design for a ZSI-based DG system to improve the power quality of distribution systems. The proposed control method is tested in simulations using Matlab/Simulink/PLECS and subsequently validated experimentally on a laboratory prototype.
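The remaining-capacity argument reduces to simple arithmetic on the converter rating: whatever apparent power is not used for power delivery is headroom for compensation duties. A short sketch with assumed ratings:

    import math

    S_RATED = 10e3            # assumed converter rating (VA)
    p_dg = 6e3                # real power currently delivered by the DG (W)
    q_dg = 1e3                # reactive power currently delivered (var)

    # Apparent power in use, and the headroom left for ancillary functions
    # such as harmonic and unbalance compensation:
    s_used = math.hypot(p_dg, q_dg)
    s_spare = math.sqrt(max(S_RATED**2 - s_used**2, 0.0))
    print(f"in use: {s_used:.0f} VA, spare for compensation: {s_spare:.0f} VA")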

Relevance:

100.00%

Publisher:

Abstract:

An onboard spacecraft computing system is a case of a functionally distributed system that requires continuous interaction among the nodes to control the operations at different nodes. A simple and reliable protocol is desired for such an application. This paper discusses a formal approach to specifying the computing system with respect to some important issues encountered in the design and development of a protocol for the onboard distributed system. The issues considered are concurrency, exclusiveness and sequencing relationships among the various processes at different nodes. A 6-tuple model is developed for the precise specification of the system. The model also enables us to check the consistency of a specification and detect deadlock caused by improper specification. An example illustrates the use of the proposed methodology for a typical spacecraft configuration. Although the theory is motivated by a specific application, it may equally be applied to other distributed computing systems, such as those encountered in process control industries, power plant control and similar environments.
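The 6-tuple model itself is not reproduced in the abstract, so the sketch below only illustrates the kind of consistency check such a specification enables: sequencing relations among processes that form a cycle indicate a deadlock. The process names are hypothetical.

    def has_deadlock(precedes):
        """precedes: dict mapping a process to the processes it must wait for.
        Detects a circular wait via depth-first search for a back edge."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = {p: WHITE for p in precedes}

        def visit(p):
            color[p] = GREY
            for q in precedes.get(p, []):
                if color.get(q) == GREY or (color.get(q) == WHITE and visit(q)):
                    return True          # back edge: circular wait
            color[p] = BLACK
            return False

        return any(color[p] == WHITE and visit(p) for p in list(color))

    spec = {"telemetry": ["attitude"], "attitude": ["power"],
            "power": ["telemetry"]}      # circular sequencing constraint
    print(has_deadlock(spec))            # -> True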

Relevance:

100.00%

Publisher:

Abstract:

The effect of antenna separation in a 3×3 MIMO system using radio-over-fibre distributed antenna system (RoF DAS) technology is investigated. Larger antenna separation is found to improve throughput, owing to reduced channel correlation and improved SNR. © 2011 Optical Society of America.
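A small numpy sketch of why correlation hurts: for one channel draw, the capacity C = log2 det(I + (SNR/Nt) H H^H) drops when the columns of H are correlated. The exponential correlation model and the SNR value are assumptions for illustration, not the paper's measured channel.

    import numpy as np

    rng = np.random.default_rng(0)

    def capacity(h, snr):
        """MIMO capacity of one channel draw (bits/s/Hz):
        C = log2 det(I + (snr/Nt) * H H^H)."""
        nt = h.shape[1]
        g = np.eye(h.shape[0]) + (snr / nt) * (h @ h.conj().T)
        return float(np.log2(np.abs(np.linalg.det(g))))

    # 3x3 Rayleigh channel with unit-variance complex Gaussian entries.
    h_iid = (rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))) / np.sqrt(2)

    # Small antenna separation modelled as receive-side correlation (r assumed).
    r = 0.9
    corr = np.array([[1, r, r**2], [r, 1, r], [r**2, r, 1]])
    h_corr = np.linalg.cholesky(corr) @ h_iid

    print("well separated:", capacity(h_iid, 10.0), "bits/s/Hz")
    print("closely spaced:", capacity(h_corr, 10.0), "bits/s/Hz")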