3 results for Zipf

in Boston University Digital Common


Relevance:

10.00%

Abstract:

The explosion of WWW traffic necessitates an accurate picture of WWW use, and in particular requires a good understanding of client requests for WWW documents. To address this need, we have collected traces of actual executions of NCSA Mosaic, reflecting over half a million user requests for WWW documents. In this paper we describe the methods we used to collect our traces, and the formats of the collected data. Next, we present a descriptive statistical summary of the traces we collected, which identifies a number of trends and reference patterns in WWW use. In particular, we show that many characteristics of WWW use can be modelled using power-law distributions, including the distribution of document sizes, the popularity of documents as a function of size, the distribution of user requests for documents, and the number of references to documents as a function of their overall rank in popularity (Zipf's law). Finally, we show how the power-law distributions derived from our traces can be used to guide system designers interested in caching WWW documents.
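The rank-popularity relationship mentioned in this abstract (Zipf's law) can be checked on any request trace with a short script. The following is a minimal sketch, not the authors' method: it assumes the trace is a plain list of document identifiers, and it estimates the exponent with an ordinary least-squares fit on log(rank) versus log(count), which is a crude estimator. The function names `rank_frequency` and `zipf_exponent` are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(requests):
    """Return reference counts sorted from most to least popular document."""
    return sorted(Counter(requests).values(), reverse=True)


def zipf_exponent(counts):
    """Estimate alpha in count ~ C / rank**alpha.

    Uses a least-squares line through (log rank, log count) and returns
    the negated slope. This is an illustrative estimator only.
    """
    counts = sorted((c for c in counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two documents with nonzero counts")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx
```

Calling `zipf_exponent(rank_frequency(trace))` on a trace gives a value near 1 when popularity follows the classic Zipf form; values well away from 1 suggest a more general power law or a different distribution altogether.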

Relevance:

10.00%

Abstract:

We present what we believe to be the first thorough characterization of live streaming media content delivered over the Internet. Our characterization of over five million requests spanning a 28-day period is done at three increasingly granular levels, corresponding to clients, sessions, and transfers. Our findings support two important conclusions. First, we show that the nature of interactions between users and objects is fundamentally different for live versus stored objects. Access to stored objects is user driven, whereas access to live objects is object driven. This reversal of active/passive roles of users and objects leads to interesting dualities. For instance, our analysis underscores a Zipf-like profile for user interest in a given object, which is to be contrasted to the classic Zipf-like popularity of objects for a given user. Also, our analysis reveals that transfer lengths are highly variable and that this variability is due to the stickiness of clients to a particular live object, as opposed to structural (size) properties of objects. Second, based on observations we make, we conjecture that the particular characteristics of live media access workloads are likely to be highly dependent on the nature of the live content being accessed. In our study, this dependence is clear from the strong temporal correlations we observed in the traces, which we attribute to the synchronizing impact of live content on access characteristics. Based on our analyses, we present a model for live media workload generation that incorporates many of our findings, and which we implement in GISMO [19].
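To make the "Zipf-like profile for user interest in a given object" concrete, the sketch below draws synthetic requests to a single live object from a population of clients whose interest decays with rank. It is only an illustration under stated assumptions: it is not GISMO and not the workload model described in the abstract, and the function names, the default exponent `alpha=1.0`, and the fixed seed are all hypothetical choices.

```python
import random
from collections import Counter


def zipf_weights(n, alpha=1.0):
    """Normalized weights proportional to 1 / rank**alpha for ranks 1..n."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_client_requests(n_clients, n_requests, alpha=1.0, seed=0):
    """Draw the client (identified by interest rank) behind each request.

    Clients are numbered 1..n_clients, with client 1 the most interested.
    Requests are drawn independently, so no temporal correlation is modeled.
    """
    rng = random.Random(seed)
    clients = list(range(1, n_clients + 1))
    weights = zipf_weights(n_clients, alpha)
    return rng.choices(clients, weights=weights, k=n_requests)


if __name__ == "__main__":
    requests = sample_client_requests(n_clients=10, n_requests=10000)
    for client, count in sorted(Counter(requests).items()):
        print(client, count)
```

Because the draws are independent, this sketch captures only the skew in interest across clients; the strong temporal correlations the abstract attributes to live content would need an additional mechanism.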

Relevance:

10.00%

Abstract:

Temporal locality of reference in Web request streams emerges from two distinct phenomena: the popularity of Web objects and the temporal correlation of requests. Capturing these two elements of temporal locality is important because it enables cache replacement policies to adjust how they capitalize on temporal locality based on the relative prevalence of these phenomena. In this paper, we show that temporal locality metrics proposed in the literature are unable to delineate between these two sources of temporal locality. In particular, we show that the commonly-used distribution of reference interarrival times is predominantly determined by the power law governing the popularity of documents in a request stream. To capture (and more importantly quantify) both sources of temporal locality in a request stream, we propose a new and robust metric that enables accurate delineation between locality due to popularity and that due to temporal correlation. Using this metric, we characterize the locality of reference in a number of representative proxy cache traces. Our findings show that there are measurable differences between the degrees (and sources) of temporal locality across these traces, and that these differences are effectively captured using our proposed metric. We illustrate the significance of our findings by summarizing the performance of a novel Web cache replacement policy, called GreedyDual*, which exploits both long-term popularity and short-term temporal correlation in an adaptive fashion. Our trace-driven simulation experiments (which are detailed in an accompanying Technical Report) show the superior performance of GreedyDual* when compared to other Web cache replacement policies.
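The distinction between the two sources of locality can be illustrated with a small experiment. The sketch below is not the metric proposed in the abstract; it only shows the underlying idea, under the assumption that a trace is a list of object identifiers. It measures reference interarrival distances (the number of requests between successive references to the same object) and builds a randomly shuffled copy of the trace, which keeps each object's popularity but removes any temporal correlation. The function names are hypothetical.

```python
import random


def interarrival_distances(trace):
    """Distances, in requests, between successive references to the same object."""
    last_seen = {}
    distances = []
    for position, obj in enumerate(trace):
        if obj in last_seen:
            distances.append(position - last_seen[obj])
        last_seen[obj] = position
    return distances


def shuffled_baseline(trace, seed=0):
    """Return a randomly permuted copy of the trace.

    Per-object reference counts (popularity) are unchanged, while the
    ordering, and with it any temporal correlation, is randomized.
    """
    rng = random.Random(seed)
    shuffled = list(trace)
    rng.shuffle(shuffled)
    return shuffled
```

If the distance distributions of a trace and of its shuffled baseline look alike, popularity alone accounts for most of the observed locality; a surplus of short distances in the original trace points to temporal correlation.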