1 resultado para MapReduce
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
With the development of electronic devices, more and more mobile clients are connected to the Internet and they generate massive data every day. We live in an age of “Big Data”, and every day we generate hundreds of million magnitude data. By analyzing the data and making prediction, we can carry out better development plan. Unfortunately, traditional computation framework cannot meet the demand, so the Hadoop would be put forward. First the paper introduces the background and development status of Hadoop, compares the MapReduce in Hadoop 1.0 and YARN in Hadoop 2.0, and analyzes the advantages and disadvantages of them. Because the resource management module is the core role of YARN, so next the paper would research about the resource allocation module including the resource management, resource allocation algorithm, resource preemption model and the whole resource scheduling process from applying resource to finishing allocation. Also it would introduce the FIFO Scheduler, Capacity Scheduler, and Fair Scheduler and compare them. The main work has been done in this paper is researching and analyzing the Dominant Resource Fair algorithm of YARN, putting forward a maximum resource utilization algorithm based on Dominant Resource Fair algorithm. The paper also provides a suggestion to improve the unreasonable facts in resource preemption model. Emphasizing “fairness” during resource allocation is the core concept of Dominant Resource Fair algorithm of YARM. Because the cluster is multiple users and multiple resources, so the user’s resource request is multiple too. The DRF algorithm would divide the user’s resources into dominant resource and normal resource. For a user, the dominant resource is the one whose share is highest among all the request resources, others are normal resource. The DRF algorithm requires the dominant resource share of each user being equal. But for these cases where different users’ dominant resource amount differs greatly, emphasizing “fairness” is not suitable and can’t promote the resource utilization of the cluster. By analyzing these cases, this thesis puts forward a new allocation algorithm based on DRF. The new algorithm takes the “fairness” into consideration but not the main principle. Maximizing the resource utilization is the main principle and goal of the new algorithm. According to comparing the result of the DRF and new algorithm based on DRF, we found that the new algorithm has more high resource utilization than DRF. The last part of the thesis is to install the environment of YARN and use the Scheduler Load Simulator (SLS) to simulate the cluster environment.