Robust Sketching and Aggregation of Distributed Data Streams


Autoria(s): Hadjieleftheriou, Marios; Byers, John W.; Kollios, John
Data(s)

20/10/2011

20/10/2011

16/03/2005

Resumo

The data streaming model provides an attractive framework for one-pass summarization of massive data sets at a single observation point. However, in an environment where multiple data streams arrive at a set of distributed observation points, sketches must be computed remotely and then must be aggregated through a hierarchy before queries may be conducted. As a result, many sketch-based methods for the single stream case do not apply directly, as either the error introduced becomes large, or because the methods assume that the streams are non-overlapping. These limitations hinder the application of these techniques to practical problems in network traffic monitoring and aggregation in sensor networks. To address this, we develop a general framework for evaluating and enabling robust computation of duplicate-sensitive aggregate functions (e.g., SUM and QUANTILE), over data produced by distributed sources. We instantiate our approach by augmenting the Count-Min and Quantile-Digest sketches to apply in this distributed setting, and analyze their performance. We conclude with experimental evaluation to validate our analysis.

Identificador

http://hdl.handle.net/2144/1838

Idioma(s)

en_US

Publicador

Boston University Computer Science Department

Relação

BUCS Technical Reports;BUCS-TR-2005-011

Tipo

Technical Report