Effective methods and strategies for massive small files processing based on Hadoop


Author(s): Xia, D.; Wang, B.; Rong, Z.; Li, Y.; Zhang, Zili
Date(s)

01/01/2014

Abstract

The Hadoop framework provides a powerful way to handle Big Data. However, Hadoop suffers from high memory overhead and low computing performance when processing massive numbers of small files. In this paper, we address the small files problem with three implemented methods and two proposed strategies. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for these defects of Hadoop. Moreover, we propose two strategies to meet the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies can improve the efficiency of massive small files processing, thereby enhancing the overall performance of Hadoop. © 2014 ISSN 1881-803X.
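For context on one of the three methods named in the abstract, the following is a minimal sketch (not code from the paper) of the Sequence Files approach: it packs all files in a local directory into a single Hadoop SequenceFile, with file names as keys and raw file bytes as values, so the HDFS NameNode tracks one large file instead of thousands of small ones. The class name SmallFilePacker and its command-line argument layout are illustrative assumptions.

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file in a local directory into one SequenceFile so HDFS
// stores a single large file (one block of NameNode metadata) rather
// than one entry per small file. Keys: file names; values: raw bytes.
public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        File inputDir = new File(args[0]);   // assumed: an existing local directory of small files
        Path output = new Path(args[1]);     // e.g. an hdfs:// destination path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : inputDir.listFiles(File::isFile)) {
                byte[] bytes = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        }
    }
}

For comparison, the HAR method is normally driven by Hadoop's built-in archive tool (hadoop archive -archiveName name.har -p <parent> <sources> <destination>), while CombineFileInputFormat operates at read time, merging many small input files into larger MapReduce splits without rewriting them.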

Identifier

http://hdl.handle.net/10536/DRO/DU:30072604

Language(s)

eng

Publisher

ICIC International

Relation

http://dro.deakin.edu.au/eserv/DU:30072604/xia-effectivemethodsand-2014.pdf

http://dro.deakin.edu.au/eserv/DU:30072604/zhang-effectivemethods-evid-2014.pdf

Rights

2014, ICIC International

Keywords

#Big Data #Hadoop distributed file system (HDFS) #Hadoop MapReduce #Small files problem

Type

Journal Article