A MapReduce-based nearest neighbor approach for big-data-driven traffic flow prediction


Author(s): Xia, Dawen; Li, Huaqing; Wang, Binfeng; Li, Yantao; Zhang, Zili
Date(s)

01/01/2016

Abstract

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system including two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow predictions in real time. The empirical study on real-world traffic flow big data using the leave-one-out cross validation method shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which improves by 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
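
The abstract names two ideas that are easy to illustrate in a few lines: a nearest neighbor prediction that takes correlation between traffic flows into account, and the mean absolute percent error used to report accuracy. The sketch below is not the authors' TFPC implementation and does not use Hadoop or MapReduce. The function names (`pearson`, `knn_predict`, `mape`), the data layout, and the choice to weight neighbors by their Pearson correlation with the current window are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def knn_predict(history, current, k=3):
    """Predict the next flow value from the k historical windows closest to `current`.

    history: list of (window, next_value) pairs built from past observations.
    current: the most recent window of observed flow values.
    Assumption: neighbors are weighted by their non-negative correlation with
    the current window; this weighting is illustrative, not the paper's method.
    """
    # Select the k nearest historical windows by Euclidean distance.
    neighbors = sorted(history, key=lambda item: math.dist(item[0], current))[:k]
    weights = [max(pearson(window, current), 0.0) for window, _ in neighbors]
    total = sum(weights)
    if total == 0:
        # Fall back to a plain average when no neighbor is positively correlated.
        return sum(nxt for _, nxt in neighbors) / len(neighbors)
    return sum(w * nxt for w, (_, nxt) in zip(weights, neighbors)) / total


def mape(actual, predicted):
    """Mean absolute percent error, in percent. Assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up toy data: each entry is (past window of flow counts, value that followed).
    history = [([10, 20, 30], 40), ([30, 20, 10], 5), ([11, 21, 31], 42)]
    print(knn_predict(history, [10, 20, 30], k=2))
    print(mape([100, 200], [110, 180]))
```

In a setup like the one the abstract describes, the historical windows would come from the offline training stage and the current window from the online stage; here both are plain in-memory lists so the example stays self-contained.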

Identifier

http://hdl.handle.net/10536/DRO/DU:30085586

Language(s)

eng

Publisher

IEEE

Relation

http://dro.deakin.edu.au/eserv/DU:30085586/zhang-amapreducebased-2016.pdf

http://www.dx.doi.org/10.1109/ACCESS.2016.2570021

Rights

2016, IEEE

Keywords #Big data analytics #traffic flow prediction #correlation analysis #parallel classifier #Hadoop MapReduce
Type

Journal Article