Effective methods and strategies for massive small files processing based on Hadoop
Date(s) | 01/01/2014
Abstract | The Hadoop framework provides a powerful way to handle Big Data. However, Hadoop suffers from inherent defects, namely high memory overhead and low computing performance, when processing massive numbers of small files. In this paper we implement three methods and propose two strategies to address the small files problem. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for these defects of Hadoop. Moreover, we propose two strategies to meet the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies improve the efficiency of massive small files processing, thereby enhancing the overall performance of Hadoop. © 2014 ISSN 1881-803X.
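The abstract's Sequence Files (SF) method rests on one idea: pack many small files into a single container keyed by filename, so that one large HDFS file replaces thousands of per-file NameNode entries. A minimal sketch of that packing step follows, in plain Python rather than Hadoop's actual SequenceFile API; the `pack_small_files` helper is hypothetical and stands in for a `SequenceFile.Writer` that would append each (filename, contents) pair.

```python
# Sketch of the small-file packing idea behind the SF method.
# Plain Python illustration, NOT Hadoop's SequenceFile format:
# each small file becomes a (filename, bytes) record in one container.
import os
import tempfile

def pack_small_files(paths):
    """Merge small files into a list of (filename, contents) records."""
    records = []
    for p in paths:
        with open(p, "rb") as f:
            records.append((os.path.basename(p), f.read()))
    return records

# Demo: create a few tiny files, then pack them into one record list.
tmp = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp, "part-%d.txt" % i), "w") as f:
        f.write("record %d" % i)

packed = pack_small_files(sorted(os.path.join(tmp, n) for n in os.listdir(tmp)))
print(packed[0])
```

In Hadoop itself the analogous HAR method uses the real `hadoop archive` command-line tool, and a MapReduce job would read the packed records back via a key/value reader instead of opening each small file individually.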
Identifier | |
Language(s) | eng
Publisher | ICIC International
Relation | http://dro.deakin.edu.au/eserv/DU:30072604/xia-effectivemethodsand-2014.pdf http://dro.deakin.edu.au/eserv/DU:30072604/zhang-effectivemethods-evid-2014.pdf
Rights | 2014, ICIC International
Keywords | #Big Data #Hadoop distributed file system (HDFS) #Hadoop MapReduce #Small files problem
Type | Journal Article