Robust Sketching and Aggregation of Distributed Data Streams
Date(s) | 20/10/2011, 20/10/2011, 16/03/2005 |
---|---|
Abstract | The data streaming model provides an attractive framework for one-pass summarization of massive data sets at a single observation point. However, in an environment where multiple data streams arrive at a set of distributed observation points, sketches must be computed remotely and then aggregated through a hierarchy before queries can be conducted. As a result, many sketch-based methods for the single-stream case do not apply directly, either because the error introduced becomes large or because the methods assume that the streams are non-overlapping. These limitations hinder the application of these techniques to practical problems in network traffic monitoring and aggregation in sensor networks. To address this, we develop a general framework for evaluating and enabling robust computation of duplicate-sensitive aggregate functions (e.g., SUM and QUANTILE) over data produced by distributed sources. We instantiate our approach by augmenting the Count-Min and Quantile-Digest sketches to apply in this distributed setting, and analyze their performance. We conclude with an experimental evaluation to validate our analysis. |
Identifier | |
Language(s) | en_US |
Publisher | Boston University Computer Science Department |
Relation | BUCS Technical Reports; BUCS-TR-2005-011 |
Type | Technical Report |
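
The problem the abstract describes, that sketches built for non-overlapping streams overcount when observation points see the same items, can be illustrated with a small example. The following is a minimal sketch of a plain, textbook Count-Min structure, not the augmented version developed in the report; the class name `CountMinSketch`, its parameters, and the flow labels are hypothetical and chosen only for illustration.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min sketch (illustrative; not the report's augmented variant)."""

    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Count-Min never underestimates: take the smallest counter across rows.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def merge(self, other):
        # Naive aggregation: cell-wise addition, correct only for disjoint streams.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        merged = CountMinSketch(self.width, self.depth)
        for row in range(self.depth):
            merged.table[row] = [
                a + b for a, b in zip(self.table[row], other.table[row])
            ]
        return merged


# Two observation points that both record the SAME 10 occurrences of "flow-A".
site_a = CountMinSketch()
site_b = CountMinSketch()
site_a.add("flow-A", 10)
site_a.add("flow-B", 3)
site_b.add("flow-A", 10)  # duplicate observations of the same traffic

merged = site_a.merge(site_b)
# The merged estimate for "flow-A" is at least 20, although only 10 distinct
# occurrences exist: cell-wise addition counts the overlap twice.
double_counted = merged.estimate("flow-A")
```

Cell-wise addition is the usual way to combine Count-Min sketches, and it is exact with respect to the combined multiset. That is precisely why it is duplicate-sensitive: overlap between observation points is added in again at every level of the aggregation hierarchy.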