6 resultados para File processing (Computer science)

em Boston University Digital Common


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a recent paper (Changes in Web Client Access Patterns: Characteristics and Caching Implications by Barford, Bestavros, Bradley, and Crovella) we performed a variety of analyses upon user traces collected in the Boston University Computer Science department in 1995 and 1998. A sanitized version of the 1995 trace has been publicly available for some time; the 1998 trace has now been sanitized, and is available from: http://www.cs.bu.edu/techreports/1999-011-usertrace-98.gz ftp://ftp.cs.bu.edu/techreports/1999-011-usertrace-98.gz This memo discusses the format of this public version of the log, and includes additional discussion of how the data was collected, how the log was sanitized, what this log is and is not useful for, and areas of potential future research interest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a prototype implementation of a Distributed File System (DFS) based on the Adaptive Information Dispersal Algorithm (AIDA). Using AIDA, a file block is encoded and dispersed into smaller blocks stored on a number of DFS nodes distributed over a network. The implementation devises file creation, read, and write operations. In particular, when reading a file, the DFS accepts an optional timing constraint, which it uses to determine the level of redundancy needed for the read operation. The tighter the timing constraint, the more nodes in the DFS are queried for encoded blocks. Write operations update all blocks in all DFS nodes--with future implementations possibly including the use of read and write quorums. This work was conducted under the supervision of Professor Azer Bestavros (best@cs.bu.edu) in the Computer Science Department as part of Mohammad Makarechian's Master's project.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain much smaller number of tuples than the tuples contained in the sliding windows. Therefore, a stream buffer management policy is needed in that case. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ) an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted to other recently proposed techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales self-similarity. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. We examine its effects through detailed transport-level simulations of multiple TCP streams in an internetwork. First, we show that in a "realistic" client/server network environment i.e., one with bounded resources and coupling among traffic sources competing for resources the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. We show that this causal relationship is not significantly affected by changes in network resources (bottleneck bandwidth and buffer capacity), network topology, the influence of cross-traffic, or the distribution of interarrival times. Second, we show that properties of the transport layer play an important role in preserving and modulating this relationship. In particular, the reliable transmission and flow control mechanisms of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range dependency structure induced by heavy-tailed file size distributions. In contrast, if a non-flow-controlled and unreliable (UDP-based) transport protocol is used, the resulting traffic shows little self-similar characteristics: although still bursty at short time scales, it has little long-range dependence. If flow-controlled, unreliable transport is employed, the degree of traffic self-similarity is positively correlated with the degree of throttling at the source. Third, in exploring the relationship between file sizes, transport protocols, and self-similarity, we are also able to show some of the performance implications of self-similarity. We present data on the relationship between traffic self-similarity and network performance as captured by performance measures including packet loss rate, retransmission rate, and queueing delay. Increased self-similarity, as expected, results in degradation of performance. Queueing delay, in particular, exhibits a drastic increase with increasing self-similarity. Throughput-related measures such as packet loss and retransmission rate, however, increase only gradually with increasing traffic self-similarity as long as reliable, flow-controlled transport protocol is used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A foundational issue underlying many overlay network applications ranging from routing to P2P file sharing is that of connectivity management, i.e., folding new arrivals into the existing mesh and re-wiring to cope with changing network conditions. Previous work has considered the problem from two perspectives: devising practical heuristics for specific applications designed to work well in real deployments, and providing abstractions for the underlying problem that are tractable to address via theoretical analyses, especially game-theoretic analysis. Our work unifies these two thrusts first by distilling insights gleaned from clean theoretical models, notably that under natural resource constraints, selfish players can select neighbors so as to efficiently reach near-equilibria that also provide high global performance. Using Egoist, a prototype overlay routing system we implemented on PlanetLab, we demonstrate that our neighbor selection primitives significantly outperform existing heuristics on a variety of performance metrics; that Egoist is competitive with an optimal, but unscalable full-mesh approach; and that it remains highly effective under significant churn. We also describe variants of Egoist's current design that would enable it to scale to overlays of much larger scale and allow it to cater effectively to applications, such as P2P file sharing in unstructured overlays, based on the use of primitives such as scoped-flooding rather than routing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Personal communication devices are increasingly equipped with sensors that are able to collect and locally store information from their environs. The mobility of users carrying such devices, and hence the mobility of sensor readings in space and time, opens new horizons for interesting applications. In particular, we envision a system in which the collective sensing, storage and communication resources, and mobility of these devices could be leveraged to query the state of (possibly remote) neighborhoods. Such queries would have spatio-temporal constraints which must be met for the query answers to be useful. Using a simplified mobility model, we analytically quantify the benefits from cooperation (in terms of the system's ability to satisfy spatio-temporal constraints), which we show to go beyond simple space-time tradeoffs. In managing the limited storage resources of such cooperative systems, the goal should be to minimize the number of unsatisfiable spatio-temporal constraints. We show that Data Centric Storage (DCS), or "directed placement", is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose, "amorphous placement", in which sensory samples are cached locally, and shuffling of cached samples is used to diffuse the sensory data throughout the whole network. We evaluate conditions under which directed versus amorphous placement strategies would be more efficient. These results lead us to propose a hybrid placement strategy, in which the spatio-temporal constraints associated with a sensory data type determine the most appropriate placement strategy for that data type. We perform an extensive simulation study to evaluate the performance of directed, amorphous, and hybrid placement protocols when applied to queries that are subject to timing constraints. Our results show that, directed placement is better for queries with moderately tight deadlines, whereas amorphous placement is better for queries with looser deadlines, and that under most operational conditions, the hybrid technique gives the best compromise.