32 resultados para distributed system

em Boston University Digital Common


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We examine the question of whether to employ the first-come-first-served (FCFS) discipline or the processor-sharing (PS) discipline at the hosts in a distributed server system. We are interested in the case in which service times are drawn from a heavy-tailed distribution, and so have very high variability. Traditional wisdom when task sizes are highly variable would prefer the PS discipline, because it allows small tasks to avoid being delayed behind large tasks in a queue. However, we show that system performance can actually be significantly better under FCFS queueing, if each task is assigned to a host based on the task's size. By task assignment, we mean an algorithm that inspects incoming tasks and assigns them to hosts for service. The particular task assignment policy we propose is called SITA-E: Size Interval Task Assignment with Equal Load. Surprisingly, under SITA-E, FCFS queueing typically outperforms the PS discipline by a factor of about two, as measured by mean waiting time and mean slowdown (waiting time of task divided by its service time). We compare the FCFS/SITA-E policy to the processor-sharing case analytically; in addition we compare it to a number of other policies in simulation. We show that the benefits of SITA-E are present even in small-scale distributed systems (four or more hosts). Furthermore, SITA-E is a static policy that does not incorporate feedback knowledge of the state of the hosts, which allows for a simple and scalable implementation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the problem of task assignment in a distributed system (such as a distributed Web server) in which task sizes are drawn from a heavy-tailed distribution. Many task assignment algorithms are based on the heuristic that balancing the load at the server hosts will result in optimal performance. We show this conventional wisdom is less true when the task size distribution is heavy-tailed (as is the case for Web file sizes). We introduce a new task assignment policy, called Size Interval Task Assignment with Variable Load (SITA-V). SITA-V purposely operates the server hosts at different loads, and directs smaller tasks to the lighter-loaded hosts. The result is that SITA-V provably decreases the mean task slowdown by significant factors (up to 1000 or more) where the more heavy-tailed the workload, the greater the improvement factor. We evaluate the tradeoff between improvement in slowdown and increase in waiting time in a system using SITA-V, and show conditions under which SITA-V represents a particularly appealing policy. We conclude with a discussion of the use of SITA-V in a distributed Web server, and show that it is attractive because it has a simple implementation which requires no communication from the server hosts back to the task router.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Load balancing is often used to ensure that nodes in a distributed systems are equally loaded. In this paper, we show that for real-time systems, load balancing is not desirable. In particular, we propose a new load-profiling strategy that allows the nodes of a distributed system to be unequally loaded. Using load profiling, the system attempts to distribute the load amongst its nodes so as to maximize the chances of finding a node that would satisfy the computational needs of incoming real-time tasks. To that end, we describe and evaluate a distributed load-profiling protocol for dynamically scheduling time-constrained tasks in a loosely-coupled distributed environment. When a task is submitted to a node, the scheduling software tries to schedule the task locally so as to meet its deadline. If that is not feasible, it tries to locate another node where this could be done with a high probability of success, while attempting to maintain an overall load profile for the system. Nodes in the system inform each other about their state using a combination of multicasting and gossiping. The performance of the proposed protocol is evaluated via simulation, and is contrasted to other dynamic scheduling protocols for real-time distributed systems. Based on our findings, we argue that keeping a diverse availability profile and using passive bidding (through gossiping) are both advantageous to distributed scheduling for real-time systems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

To construct high performance Web servers, system builders are increasingly turning to distributed designs. An important challenge that arises in distributed Web servers is the need to direct incoming connections to individual hosts. Previous methods for connection routing have employed a centralized node which handles all incoming requests. In contrast, we propose a distributed approach, called Distributed Packet Rewriting (DPR), in which all hosts of the distributed system participate in connection routing. We argue that this approach promises better scalability and fault-tolerance than the centralized approach. We describe our implementation of four variants of DPR and compare their performance. We show that DPR provides performance comparable to centralized alternatives, measured in terms of throughput and delay under the SPECweb96 benchmark. Finally, we argue that DPR is particularly attractive both for small scale systems and for systems following the emerging trend toward increasingly intelligent I/O subsystems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

For communication-intensive parallel applications, the maximum degree of concurrency achievable is limited by the communication throughput made available by the network. In previous work [HPS94], we showed experimentally that the performance of certain parallel applications running on a workstation network can be improved significantly if a congestion control protocol is used to enhance network performance. In this paper, we characterize and analyze the communication requirements of a large class of supercomputing applications that fall under the category of fixed-point problems, amenable to solution by parallel iterative methods. This results in a set of interface and architectural features sufficient for the efficient implementation of the applications over a large-scale distributed system. In particular, we propose a direct link between the application and network layer, supporting congestion control actions at both ends. This in turn enhances the system's responsiveness to network congestion, improving performance. Measurements are given showing the efficacy of our scheme to support large-scale parallel computations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Current research on Internet-based distributed systems emphasizes the scalability of overlay topologies for efficient search and retrieval of data items, as well as routing amongst peers. However, most existing approaches fail to address the transport of data across these logical networks in accordance with quality of service (QoS) constraints. Consequently, this paper investigates the use of scalable overlay topologies for routing real-time media streams between publishers and potentially many thousands of subscribers. Specifically, we analyze the costs of using k-ary n-cubes for QoS-constrained routing. Given a number of nodes in a distributed system, we calculate the optimal k-ary n-cube structure for minimizing the average distance between any pair of nodes. Using this structure, we describe a greedy algorithm that selects paths between nodes in accordance with the real-time delays along physical links. We show this method improves the routing latencies by as much as 67%, compared to approaches that do not consider physical link costs. We are in the process of developing a method for adaptive node placement in the overlay topology, based upon the locations of publishers, subscribers, physical link costs and per-subscriber QoS constraints. One such method for repositioning nodes in logical space is discussed, to improve the likelihood of meeting service requirements on data routed between publishers and subscribers. Future work will evaluate the benefits of such techniques more thoroughly.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The exploding demand for services like the World Wide Web reflects the potential that is presented by globally distributed information systems. The number of WWW servers world-wide has doubled every 3 to 5 months since 1993, outstripping even the growth of the Internet. At each of these self-managed sites, the Common Gateway Interface (CGI) and Hypertext Transfer Protocol (HTTP) already constitute a rudimentary basis for contributing local resources to remote collaborations. However, the Web has serious deficiencies that make it unsuited for use as a true medium for metacomputing --- the process of bringing hardware, software, and expertise from many geographically dispersed sources to bear on large scale problems. These deficiencies are, paradoxically, the direct result of the very simple design principles that enabled its exponential growth. There are many symptoms of the problems exhibited by the Web: disk and network resources are consumed extravagantly; information search and discovery are difficult; protocols are aimed at data movement rather than task migration, and ignore the potential for distributing computation. However, all of these can be seen as aspects of a single problem: as a distributed system for metacomputing, the Web offers unpredictable performance and unreliable results. The goal of our project is to use the Web as a medium (within either the global Internet or an enterprise intranet) for metacomputing in a reliable way with performance guarantees. We attack this problem one four levels: (1) Resource Management Services: Globally distributed computing allows novel approaches to the old problems of performance guarantees and reliability. Our first set of ideas involve setting up a family of real-time resource management models organized by the Web Computing Framework with a standard Resource Management Interface (RMI), a Resource Registry, a Task Registry, and resource management protocols to allow resource needs and availability information be collected and disseminated so that a family of algorithms with varying computational precision and accuracy of representations can be chosen to meet realtime and reliability constraints. (2) Middleware Services: Complementary to techniques for allocating and scheduling available resources to serve application needs under realtime and reliability constraints, the second set of ideas aim at reduce communication latency, traffic congestion, server work load, etc. We develop customizable middleware services to exploit application characteristics in traffic analysis to drive new server/browser design strategies (e.g., exploit self-similarity of Web traffic), derive document access patterns via multiserver cooperation, and use them in speculative prefetching, document caching, and aggressive replication to reduce server load and bandwidth requirements. (3) Communication Infrastructure: Finally, to achieve any guarantee of quality of service or performance, one must get at the network layer that can provide the basic guarantees of bandwidth, latency, and reliability. Therefore, the third area is a set of new techniques in network service and protocol designs. (4) Object-Oriented Web Computing Framework A useful resource management system must deal with job priority, fault-tolerance, quality of service, complex resources such as ATM channels, probabilistic models, etc., and models must be tailored to represent the best tradeoff for a particular setting. This requires a family of models, organized within an object-oriented framework, because no one-size-fits-all approach is appropriate. This presents a software engineering challenge requiring integration of solutions at all levels: algorithms, models, protocols, and profiling and monitoring tools. The framework captures the abstract class interfaces of the collection of cooperating components, but allows the concretization of each component to be driven by the requirements of a specific approach and environment.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper describes a prototype implementation of a Distributed File System (DFS) based on the Adaptive Information Dispersal Algorithm (AIDA). Using AIDA, a file block is encoded and dispersed into smaller blocks stored on a number of DFS nodes distributed over a network. The implementation devises file creation, read, and write operations. In particular, when reading a file, the DFS accepts an optional timing constraint, which it uses to determine the level of redundancy needed for the read operation. The tighter the timing constraint, the more nodes in the DFS are queried for encoded blocks. Write operations update all blocks in all DFS nodes--with future implementations possibly including the use of read and write quorums. This work was conducted under the supervision of Professor Azer Bestavros (best@cs.bu.edu) in the Computer Science Department as part of Mohammad Makarechian's Master's project.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared-memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared-memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade-off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations the former exhibits at least a 5-fold improvement in convergence time over the latter. The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Coherent shared memory is a convenient, but inefficient, method of inter-process communication for parallel programs. By contrast, message passing can be less convenient, but more efficient. To get the benefits of both models, several non-coherent memory behaviors have recently been proposed in the literature. We present an implementation of Mermera, a shared memory system that supports both coherent and non-coherent behaviors in a manner that enables programmers to mix multiple behaviors in the same program[HS93]. A programmer can debug a Mermera program using coherent memory, and then improve its performance by selectively reducing the level of coherence in the parts that are critical to performance. Mermera permits a trade-off of coherence for performance. We analyze this trade-off through measurements of our implementation, and by an example that illustrates the style of programming needed to exploit non-coherence. We find that, even on a small network of workstations, the performance advantage of non-coherence is compelling. Raw non-coherent memory operations perform 20-40~times better than non-coherent memory operations. An example application program is shown to run 5-11~times faster when permitted to exploit non-coherence. We conclude by commenting on our use of the Isis Toolkit of multicast protocols in implementing Mermera.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Programmers of parallel processes that communicate through shared globally distributed data structures (DDS) face a difficult choice. Either they must explicitly program DDS management, by partitioning or replicating it over multiple distributed memory modules, or be content with a high latency coherent (sequentially consistent) memory abstraction that hides the DDS' distribution. We present Mermera, a new formalism and system that enable a smooth spectrum of noncoherent shared memory behaviors to coexist between the above two extremes. Our approach allows us to define known noncoherent memories in a new simple way, to identify new memory behaviors, and to characterize generic mixed-behavior computations. The latter are useful for programming using multiple behaviors that complement each others' advantages. On the practical side, we show that the large class of programs that use asynchronous iterative methods (AIM) can run correctly on slow memory, one of the weakest, and hence most efficient and fault-tolerant, noncoherence conditions. An example AIM program to solve linear equations, is developed to illustrate: (1) the need for concurrently mixing memory behaviors, and, (2) the performance gains attainable via noncoherence. Other program classes tolerate weak memory consistency by synchronizing in such a way as to yield executions indistinguishable from coherent ones. AIM computations on noncoherent memory yield noncoherent, yet correct, computations. We report performance data that exemplifies the potential benefits of noncoherence, in terms of raw memory performance, as well as application speed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper examines how and why web server performance changes as the workload at the server varies. We measure the performance of a PC acting as a standalone web server, running Apache on top of Linux. We use two important tools to understand what aspects of software architecture and implementation determine performance at the server. The first is a tool that we developed, called WebMonitor, which measures activity and resource consumption, both in the operating system and in the web server. The second is the kernel profiling facility distributed as part of Linux. We vary the workload at the server along two important dimensions: the number of clients concurrently accessing the server, and the size of the documents stored on the server. Our results quantify and show how more clients and larger files stress the web server and operating system in different and surprising ways. Our results also show the importance of fixed costs (i.e., opening and closing TCP connections, and updating the server log) in determining web server performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

snBench is a platform on which novice users compose and deploy distributed Sense and Respond programs for simultaneous execution on a shared, distributed infrastructure. It is a natural imperative that we have the ability to (1) verify the safety/correctness of newly submitted tasks and (2) derive the resource requirements for these tasks such that correct allocation may occur. To achieve these goals we have established a multi-dimensional sized type system for our functional-style Domain Specific Language (DSL) called Sensor Task Execution Plan (STEP). In such a type system data types are annotated with a vector of size attributes (e.g., upper and lower size bounds). Tracking multiple size aspects proves essential in a system in which Images are manipulated as a first class data type, as image manipulation functions may have specific minimum and/or maximum resolution restrictions on the input they can correctly process. Through static analysis of STEP instances we not only verify basic type safety and establish upper computational resource bounds (i.e., time and space), but we also derive and solve data and resource sizing constraints (e.g., Image resolution, camera capabilities) from the implicit constraints embedded in program instances. In fact, the static methods presented here have benefit beyond their application to Image data, and may be extended to other data types that require tracking multiple dimensions (e.g., image "quality", video frame-rate or aspect ratio, audio sampling rate). In this paper we present the syntax and semantics of our functional language, our type system that builds costs and resource/data constraints, and (through both formalism and specific details of our implementation) provide concrete examples of how the constraints and sizing information are used in practice.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A foundational issue underlying many overlay network applications ranging from routing to peer-to-peer file sharing is that of connectivity management, i.e., folding new arrivals into an existing overlay, and rewiring to cope with changing network conditions. Previous work has considered the problem from two perspectives: devising practical heuristics for specific applications designed to work well in real deployments, and providing abstractions for the underlying problem that are analytically tractable, especially via game-theoretic analysis. In this paper, we unify these two thrusts by using insights gleaned from novel, realistic theoretic models in the design of Egoist – a distributed overlay routing system that we implemented, deployed, and evaluated on PlanetLab. Using extensive measurements of paths between nodes, we demonstrate that Egoist’s neighbor selection primitives significantly outperform existing heuristics on a variety of performance metrics, including delay, available bandwidth, and node utilization. Moreover, we demonstrate that Egoist is competitive with an optimal, but unscalable full-mesh approach, remains highly effective under significant churn, is robust to cheating, and incurs minimal overhead. Finally, we use a multiplayer peer-to-peer game to demonstrate the value of Egoist to end-user applications. This technical report supersedes BUCS-TR-2007-013.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present an online distributed algorithm, the Causation Logging Algorithm (CLA), in which Autonomous Systems (ASes) in the Internet individually report route oscillations/flaps they experience to a central Internet Routing Registry (IRR). The IRR aggregates these reports and may observe what we call causation chains where each node on the chain caused a route flap at the next node along the chain. A chain may also have a causation cycle. The type of an observed causation chain/cycle allows the IRR to infer the underlying policy routing configuration (i.e., the system of economic relationships and constraints on route/path preferences). Our algorithm is based on a formal policy routing model that captures the propagation dynamics of route flaps under arbitrary changes in topology or path preferences. We derive invariant properties of causation chains/cycles for ASes which conform to economic relationships based on the popular Gao-Rexford model. The Gao-Rexford model is known to be safe in the sense that the system always converges to a stable set of paths under static conditions. Our CLA algorithm recovers the type/property of an observed causation chain of an underlying system and determines whether it conforms to the safe economic Gao-Rexford model. Causes for nonconformity can be diagnosed by comparing the properties of the causation chains with those predicted from different variants of the Gao-Rexford model.