Effective methods and strategies for massive small files processing based on Hadoop


Author(s): Xia, D.; Wang, B.; Rong, Z.; Li, Y.; Zhang, Zili
Date(s)

01/01/2014

Abstract

The Hadoop framework provides a powerful way to handle Big Data. However, Hadoop suffers from high memory overhead and low computing performance when processing massive numbers of small files. In this paper, we address the small files problem with three implemented methods and two proposed strategies. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for these defects of Hadoop. Moreover, we propose two strategies to meet the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies can improve the efficiency of massive small files processing, thereby enhancing the overall performance of Hadoop. © 2014 ISSN 1881-803X.
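For context on one of the three methods named in the abstract, the following is a minimal sketch (not code from the paper) of the Sequence Files approach: it packs all files in a local directory into a single Hadoop SequenceFile, with file names as keys and raw file bytes as values, so the HDFS NameNode tracks one large file instead of thousands of small ones. The class name SmallFilePacker and its command-line argument layout are illustrative assumptions.

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file in a local directory into one SequenceFile so HDFS
// stores a single large file (one block of NameNode metadata) rather
// than one entry per small file. Keys: file names; values: raw bytes.
public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        File inputDir = new File(args[0]);   // assumed: an existing local directory of small files
        Path output = new Path(args[1]);     // e.g. an hdfs:// destination path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : inputDir.listFiles(File::isFile)) {
                byte[] bytes = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        }
    }
}

For comparison, the HAR method is normally driven by Hadoop's built-in archive tool (hadoop archive -archiveName name.har -p <parent> <sources> <destination>), while CombineFileInputFormat operates at read time, merging many small input files into larger MapReduce splits without rewriting them.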

Identifier

http://hdl.handle.net/10536/DRO/DU:30072604

Language(s)

eng

Publisher

ICIC International

Relation

http://dro.deakin.edu.au/eserv/DU:30072604/xia-effectivemethodsand-2014.pdf

http://dro.deakin.edu.au/eserv/DU:30072604/zhang-effectivemethods-evid-2014.pdf

Rights

2014, ICIC International

Keywords

#Big Data #Hadoop distributed file system (HDFS) #Hadoop MapReduce #Small files problem

Type

Journal Article