A general communication cost optimization framework for big data stream processing in geo-distributed data centers


Author(s): Gu, Lin; Zeng, Deze; Guo, Song; Xiang, Yong; Hu, Jiankun
Date(s)

01/01/2016

Abstract

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs incur different inter-datacenter network costs, which are charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As the datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises the opportunity, but also a challenge, to exploit the diversity of inter-datacenter network costs to optimize both VM placement and load balancing towards network cost minimization with guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on our novel framework, we then formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation-based studies.
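As a rough illustration of the kind of placement problem the abstract describes, the sketch below enumerates every assignment of stream processing tasks to datacenters and keeps the cheapest feasible one. It is a toy example under stated assumptions, not the paper's MILP formulation or its proposed algorithm: the function name `min_cost_placement`, the task names, the traffic rates, the per-unit costs and the per-datacenter VM capacities are all hypothetical, and SLA constraints and load balancing are left out.

```python
from itertools import product


def min_cost_placement(tasks, edges, dcs, unit_cost, capacity):
    """Exhaustively search task-to-datacenter assignments (toy sizes only).

    tasks:     list of task names (one VM per task, by assumption)
    edges:     list of (upstream, downstream, traffic_rate) tuples
    dcs:       list of datacenter names
    unit_cost: unit_cost[a][b] = cost per unit of traffic sent from a to b
    capacity:  capacity[d] = maximum number of VMs datacenter d can host
    """
    best_cost, best_placement = float("inf"), None
    for assignment in product(dcs, repeat=len(tasks)):
        # Skip assignments that put too many VMs in one datacenter.
        if any(assignment.count(d) > capacity[d] for d in dcs):
            continue
        placement = dict(zip(tasks, assignment))
        # Communication cost: traffic rate times the cost of the datacenter pair.
        cost = sum(
            rate * unit_cost[placement[u]][placement[v]]
            for u, v, rate in edges
        )
        if cost < best_cost:
            best_cost, best_placement = cost, placement
    return best_cost, best_placement


# Hypothetical three-task pipeline and two datacenters.
tasks = ["src", "filter", "sink"]
edges = [("src", "filter", 10), ("filter", "sink", 2)]
dcs = ["A", "B"]
unit_cost = {"A": {"A": 0, "B": 3}, "B": {"A": 3, "B": 0}}
capacity = {"A": 2, "B": 2}

cost, placement = min_cost_placement(tasks, edges, dcs, unit_cost, capacity)
print(cost, placement)
```

In this toy instance the heavy `src` to `filter` stream stays inside one datacenter and only the light `filter` to `sink` stream crosses the costly link. The search space grows as the number of datacenters raised to the number of tasks, which is consistent with the abstract's point that the exact problem is NP-hard and needs a computation-efficient solution.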

Identifier

http://hdl.handle.net/10536/DRO/DU:30080752

Language(s)

eng

Publisher

IEEE

Relation

http://dro.deakin.edu.au/eserv/DU:30080752/xiang-ageneralcommunication-2016.pdf

http://www.dx.doi.org/10.1109/TC.2015.2417566

Rights

2016, IEEE

Keywords

#Science & Technology #Technology #Computer Science, Hardware & Architecture #Engineering, Electrical & Electronic #Computer Science #Engineering #Big data #stream processing #network cost minimization #VM placement #geo-distributed data centers #VIRTUAL MACHINE PLACEMENT #EFFICIENCY

Type

Journal Article