Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems


Author(s): Zhang, Yi-Fan; Tian, Yu-Chu; Fidge, Colin; Kelly, Wayne
Date(s)

16/04/2016

Abstract

Solving large-scale all-to-all comparison problems using distributed computing is increasingly important for a range of applications. Previous distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands and poor data locality for the comparison tasks, which in turn forces data to be redistributed at runtime. Furthermore, most previous methods were developed for homogeneous computing environments, so their performance degrades even further in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Metaheuristic data pre-scheduling and dynamic task scheduling strategies are then developed, together with an algorithmic implementation, to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus reducing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
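The abstract couples two requirements: every comparison task should find both of its input files on the node that runs it (data locality), and faster nodes should receive proportionally more tasks (load balancing). The following is a minimal sketch of those two constraints only, assuming a data layout is already given. It uses a simple greedy rule and is not the metaheuristic pre-scheduling or dynamic scheduling strategy developed in the paper. The function names `comparison_tasks` and `schedule`, the `placement` and `speeds` inputs, and the example layout are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def comparison_tasks(n_files):
    """All-to-all comparison: one task per unordered file pair (i, j), i < j."""
    return list(combinations(range(n_files), 2))


def schedule(n_files, placement, speeds):
    """Hypothetical greedy data-aware assignment (not the paper's algorithm).

    placement: dict mapping node -> set of file indices stored on that node
    speeds:    dict mapping node -> relative processing speed (tasks per time unit)

    A task (i, j) may only run on a node that already stores BOTH files, so
    no data moves at runtime. Among those nodes, pick the one whose projected
    finish time (load / speed) is smallest after taking the task, so faster
    nodes absorb more work.
    """
    load = {node: 0 for node in placement}
    assignment = {}
    for i, j in comparison_tasks(n_files):
        candidates = [nd for nd, files in placement.items() if i in files and j in files]
        if not candidates:
            # The data layout itself breaks locality for this pair.
            raise ValueError(f"no node stores both files {i} and {j}")
        best = min(candidates, key=lambda nd: (load[nd] + 1) / speeds[nd])
        assignment[(i, j)] = best
        load[best] += 1
    return assignment, load


if __name__ == "__main__":
    # Hypothetical layout: 4 files on 3 nodes; node "A" is twice as fast.
    placement = {"A": {0, 1, 2}, "B": {0, 1, 3}, "C": {1, 2, 3}}
    speeds = {"A": 2.0, "B": 1.0, "C": 1.0}
    assignment, load = schedule(4, placement, speeds)
    print(assignment)
    print(load)
```

In this toy layout each node stores three of the four files (9 file copies rather than 12 under full replication), yet every pair is still held together by at least one node, so all six tasks run locally. A greedy rule like this can still leave imbalance on larger inputs, which is why the paper instead treats placement and scheduling jointly as a constrained optimization problem.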

Format

application/pdf

Identifier

http://eprints.qut.edu.au/94975/

Publisher

Elsevier

Relation

http://eprints.qut.edu.au/94975/1/hetero_improve_v25_Colin_withPubInfo.pdf

http://www.journals.elsevier.com/journal-of-parallel-and-distributed-computing/

DOI:10.1016/j.jpdc.2016.04.008

Zhang, Yi-Fan, Tian, Yu-Chu, Fidge, Colin, & Kelly, Wayne (2016) Data-aware task scheduling for all-to-all comparison problems in heterogeneous distributed systems. Journal of Parallel and Distributed Computing. (In Press)

Rights

Copyright 2016 Elsevier

This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords #080501 Distributed and Grid Systems #080599 Distributed Computing not elsewhere classified #Distributed computing #all-to-all comparison #data distribution #task scheduling #big data

Type

Journal Article