62 results for stream restoration


Abstract:

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds such as Amazon's EC2 are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters. Different datacenter pairs incur different inter-datacenter network costs, charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. Because datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is met. This creates both an opportunity and a challenge: exploiting the diversity of inter-datacenter network costs to jointly optimize VM placement and load balancing so as to minimize network cost while guaranteeing the SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation-based studies.
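To make the cost objective concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: each task runs on exactly one VM, each datacenter can host at most a fixed number of VMs (a hypothetical capacity constraint standing in for SLA and resource limits), and traffic between two tasks is charged at the per-GB price of the datacenter pair hosting them. It enumerates placements exhaustively, which is feasible only for toy instances. It does not reproduce the authors' MILP formulation, their inter-task relationship semantics, or their load-balancing model; all names (`placement_cost`, `best_placement`, `traffic`, `cost`, `capacity`) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy model (not the paper's formulation):
#   traffic[(i, j)] = data volume (GB) sent from task i to task j
#   cost[a][b]      = per-GB network price between datacenters a and b
#   capacity[d]     = maximum number of task VMs datacenter d may host


def placement_cost(placement: Sequence[int],
                   traffic: Dict[Tuple[int, int], float],
                   cost: List[List[float]]) -> float:
    """Total communication cost when task i runs in datacenter placement[i]."""
    return sum(vol * cost[placement[i]][placement[j]]
               for (i, j), vol in traffic.items())


def best_placement(num_tasks: int, num_dcs: int,
                   traffic: Dict[Tuple[int, int], float],
                   cost: List[List[float]],
                   capacity: List[int]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively search all task-to-datacenter mappings that respect capacity.

    The search space has num_dcs ** num_tasks mappings, so this only works for
    tiny instances; it illustrates the objective, not a scalable solver.
    Returns (None, inf) if no mapping satisfies the capacity constraint.
    """
    best, best_cost = None, float("inf")
    for placement in product(range(num_dcs), repeat=num_tasks):
        # Skip mappings that put more VMs in a datacenter than it can host.
        if any(placement.count(d) > capacity[d] for d in range(num_dcs)):
            continue
        c = placement_cost(placement, traffic, cost)
        if c < best_cost:
            best, best_cost = placement, c
    return best, best_cost


if __name__ == "__main__":
    traffic = {(0, 1): 10.0, (1, 2): 4.0}   # task 0 -> 1 is the heaviest flow
    cost = [[0.0, 0.05], [0.05, 0.0]]       # intra-datacenter traffic is free here
    print(best_placement(3, 2, traffic, cost, capacity=[2, 2]))
    # → ((0, 0, 1), 0.2)
```

In this toy instance the cheapest feasible mapping co-locates tasks 0 and 1, so only the lighter 1 -> 2 flow crosses datacenters. Because the number of candidate mappings grows as `num_dcs ** num_tasks`, practical approaches rely on a solver-based formulation such as the MILP the abstract describes rather than enumeration.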

Abstract:

Data is becoming the world’s new natural resource, and big data use is growing quickly. The trend in computing technology is that everything is merged into the Internet and ‘big data’ are integrated to form complete information for collective intelligence. As big data grows in size, refining the data itself to reduce its volume while keeping the critical data (or useful information) is a new research direction. In this paper, we provide a novel data consumption model, which separates the consumption of data from the raw data and thus enables cloud computing for big data applications. We define a new Data-as-a-Product (DaaP) concept: a data product is a small-sized summary of the original data that can directly answer users’ queries. Accordingly, we separate the mining of big data into two classes of processing modules: refine modules, which turn raw big data into small-sized data products, and application-oriented mining modules, which further discover the knowledge that applications need from well-defined data products. Our applications of this approach to mining big stream data, including medical sensor stream data, streams of text data and trajectory data, demonstrated the efficiency and precision of our DaaP model in answering users’ queries.
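The split between refine modules and mining modules can be illustrated with a minimal Python sketch. It is not the authors' DaaP implementation: the choice of a fixed-size (tumbling) window summary holding count, minimum, maximum and mean is our assumption of what a small data product for a sensor stream might contain, and the names `WindowSummary`, `refine` and `windows_above` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class WindowSummary:
    """Hypothetical data product: a compact summary of one time window of a stream."""
    start: int       # window start timestamp (inclusive)
    count: int       # number of raw readings summarized
    minimum: float
    maximum: float
    mean: float


def refine(readings: Iterable[Tuple[int, float]], window: int) -> List[WindowSummary]:
    """Refine module: collapse raw (timestamp, value) readings into per-window summaries."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in readings:
        # Align each reading to the start of its fixed-size (tumbling) window.
        buckets.setdefault(ts - ts % window, []).append(value)
    return [
        WindowSummary(start, len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


def windows_above(products: List[WindowSummary], threshold: float) -> List[int]:
    """Mining module: find windows whose peak exceeded threshold, using summaries only."""
    return [p.start for p in products if p.maximum > threshold]


if __name__ == "__main__":
    # Hypothetical heart-rate readings: (second, beats per minute).
    raw = [(0, 70.0), (5, 72.0), (10, 95.0), (12, 88.0), (25, 71.0)]
    products = refine(raw, window=10)
    print(len(products), products[1].mean)   # → 3 91.5
    print(windows_above(products, 90.0))     # → [10]
```

The mining step never touches the raw readings: once the summaries exist, the raw stream could be discarded, which is the size reduction the DaaP concept aims for. Which statistics a data product keeps has to follow from the queries it must answer; a query needing the median, for example, could not be answered from this particular summary.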