Biblioteca Digital

25 resultados para Android, Componenti, Sensori, IPC, Shared memory

em Boston University Digital Common

Mermera: Non-Coherent Distributed Shared Memory for Parallel Computing

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared-memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared-memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade-off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations the former exhibits at least a 5-fold improvement in convergence time over the latter. The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.

An Implementation of Mermera: A Shared Memory System that Mixes Coherence with Non-coherence

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Coherent shared memory is a convenient, but inefficient, method of inter-process communication for parallel programs. By contrast, message passing can be less convenient, but more efficient. To get the benefits of both models, several non-coherent memory behaviors have recently been proposed in the literature. We present an implementation of Mermera, a shared memory system that supports both coherent and non-coherent behaviors in a manner that enables programmers to mix multiple behaviors in the same program[HS93]. A programmer can debug a Mermera program using coherent memory, and then improve its performance by selectively reducing the level of coherence in the parts that are critical to performance. Mermera permits a trade-off of coherence for performance. We analyze this trade-off through measurements of our implementation, and by an example that illustrates the style of programming needed to exploit non-coherence. We find that, even on a small network of workstations, the performance advantage of non-coherence is compelling. Raw non-coherent memory operations perform 20-40~times better than non-coherent memory operations. An example application program is shown to run 5-11~times faster when permitted to exploit non-coherence. We conclude by commenting on our use of the Isis Toolkit of multicast protocols in implementing Mermera.

An Efficient User-Level Shared Memory Mechanism for Application-Specific Extensions

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper focuses on an efficient user-level method for the deployment of application-specific extensions, using commodity operating systems and hardware. A sandboxing technique is described that supports multiple extensions within a shared virtual address space. Applications can register sandboxed code with the system, so that it may be executed in the context of any process. Such code may be used to implement generic routines and handlers for a class of applications, or system service extensions that complement the functionality of the core kernel. Using our approach, application-specific extensions can be written like conventional user-level code, utilizing libraries and system calls, with the advantage that they may be executed without the traditional costs of scheduling and context-switching between process-level protection domains. No special hardware support such as segmentation or tagged translation look-aside buffers (TLBs) is required. Instead, our ``user-level sandboxing'' mechanism requires only paged-based virtual memory support, given that sandboxed extensions are either written by a trusted source or are guaranteed to be memory-safe (e.g., using type-safe languages). Using a fast method of upcalls, we show how our mechanism provides significant performance improvements over traditional methods of invoking user-level services. As an application of our approach, we have implemented a user-level network subsystem that avoids data copying via the kernel and, in many cases, yields far greater network throughput than kernel-level approaches.

Distributed Parallel Computing in Mermera: Mixing Noncoherent Shared Memories

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Programmers of parallel processes that communicate through shared globally distributed data structures (DDS) face a difficult choice. Either they must explicitly program DDS management, by partitioning or replicating it over multiple distributed memory modules, or be content with a high latency coherent (sequentially consistent) memory abstraction that hides the DDS' distribution. We present Mermera, a new formalism and system that enable a smooth spectrum of noncoherent shared memory behaviors to coexist between the above two extremes. Our approach allows us to define known noncoherent memories in a new simple way, to identify new memory behaviors, and to characterize generic mixed-behavior computations. The latter are useful for programming using multiple behaviors that complement each others' advantages. On the practical side, we show that the large class of programs that use asynchronous iterative methods (AIM) can run correctly on slow memory, one of the weakest, and hence most efficient and fault-tolerant, noncoherence conditions. An example AIM program to solve linear equations, is developed to illustrate: (1) the need for concurrently mixing memory behaviors, and, (2) the performance gains attainable via noncoherence. Other program classes tolerate weak memory consistency by synchronizing in such a way as to yield executions indistinguishable from coherent ones. AIM computations on noncoherent memory yield noncoherent, yet correct, computations. We report performance data that exemplifies the potential benefits of noncoherence, in terms of raw memory performance, as well as application speed.

Communicable Memory and Lazy Barriers for Bulk Synchronous Parallelism in BSPk

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of communicable memory and lazy barriers to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a toolkit library for programming networks of workstations|and other distributed memory architectures in general|based on the Bulk Synchronous Parallel (BSP) model. BSPk emphasizes efficiency in communication by minimizing local memory-to-memory copying, and in barrier synchronization by not forcing a process to wait unless it needs remote data. Both the message passing (MP) and distributed shared memory (DSM) programming styles are supported in BSPk. MP helps processes efficiently exchange short-lived unnamed data values, when the identity of either the sender or receiver is known to the other party. By contrast, DSM supports communication between processes that may be mutually anonymous, so long as they can agree on variable names in which to store shared temporary or long-lived data.

Using Warp to Control Network Contention in Mermera

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Parallel computing on a network of workstations can saturate the communication network, leading to excessive message delays and consequently poor application performance. We examine empirically the consequences of integrating a flow control protocol, called Warp control [Par93], into Mermera, a software shared memory system that supports parallel computing on distributed systems [HS93]. For an asynchronous iterative program that solves a system of linear equations, our measurements show that Warp succeeds in stabilizing the network's behavior even under high levels of contention. As a result, the application achieves a higher effective communication throughput, and a reduced completion time. In some cases, however, Warp control does not achieve the performance attainable by fixed size buffering when using a statically optimal buffer size. Our use of Warp to regulate the allocation of network bandwidth emphasizes the possibility for integrating it with the allocation of other resources, such as CPU cycles and disk bandwidth, so as to optimize overall system throughput, and enable fully-shared execution of parallel programs.

Using Speculation to Reduce Server Load and Service Time on the WWW

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speculative service implies that a client's request for a document is serviced by sending, in addition to the document requested, a number of other documents (or pointers thereto) that the server speculates will be requested by the client in the near future. This speculation is based on statistical information that the server maintains for each document it serves. The notion of speculative service is analogous to prefetching, which is used to improve cache performance in distributed/parallel shared memory systems, with the exception that servers (not clients) control when and what to prefetch. Using trace simulations based on the logs of our departmental HTTP server http://cs-www.bu.edu, we show that both server load and service time could be reduced considerably, if speculative service is used. This is above and beyond what is currently achievable using client-side caching [3] and server-side dissemination [2]. We identify a number of parameters that could be used to fine-tune the level of speculation performed by the server.

A Chosen Witness: A Sermon Preached in St. Stephen's Church in Memory of the Rev. Webster

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Memorial Sermon preached in memory of the Rev. Walter Gardner Webster

Tribute to the Memory of President Fisk

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tribute to the Memory of President Fisk.

Client-based Logging: A New Paradigm of Distributed Transaction Management

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer-aided design (CAD), computer-aided software engineering (CASE), geographic-information systems (GIS), and office-information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant challenges in the design of modern database systems. In this thesis, we propose novel techniques whose aim is to improve the performance and scalability of these new database systems. These techniques exploit client resources through client-based transaction management. Client-based transaction management is realized by providing logging facilities locally even when data is shared in a global environment. This thesis presents several recovery algorithms which utilize client disks for storing recovery related information (i.e., log records). Our algorithms work with both coarse and fine-granularity locking and they do not require the merging of client logs at any time. Moreover, our algorithms support fine-granularity locking with multiple clients permitted to concurrently update different portions of the same database page. The database state is recovered correctly when there is a complex crash as well as when the updates performed by different clients on a page are not present on the disk version of the page, even though some of the updating transactions have committed. This thesis also presents the implementation of the proposed algorithms in a memory-mapped storage manager as well as a detailed performance study of these algorithms using the OO1 database benchmark. The performance results show that client-based logging is superior to traditional server-based logging. This is because client-based logging is an effective way to reduce dependencies on server CPU and disk resources and, thus, prevents the server from becoming a performance bottleneck as quickly when the number of clients accessing the database increases.

Robust Identification of Shared Losses Using End-to-End Unicast Probes

Relevância:

20.00% 20.00%

Publicador:

Resumo:

ERRATA: We present corrections to Fact 3 and (as a consequence) to Lemma 1 of BUCS Technical Report BUCS-TR-2000-013 (also published in IEEE INCP'2000)[1]. These corrections result in slight changes to the formulae used for the identifications of shared losses, which we quantify.

"Networking is IPC": A Guiding Principle to a Better Internet

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This position paper outlines a new network architecture, i.e., a style of construction that identifies the objects and how they relate. We do not specify particular protocol implementations or specific interfaces and policies. After all, it should be possible to change protocols in an architecture without changing the architecture. Rather we outline the repeating patterns and structures, and how the proposed model would cope with the challenges faced by today's Internet (and that of the future). Our new architecture is based on the following principle: Application processes communicate via a distributed inter-process communication (IPC) facility. The application processes that make up this facility provide a protocol that implements an IPC mechanism, and a protocol for managing distributed IPC (routing, security and other management tasks). Existing implementation strategies, algorithms, and protocols can be cast and used within our proposed new structure.

Trade & Cap: A Customer-Managed, Market-Based System for Trading Bandwidth Allowances at a Shared Link

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose Trade & Cap (T&C), an economics-inspired mechanism that incentivizes users to voluntarily coordinate their consumption of the bandwidth of a shared resource (e.g., a DSLAM link) so as to converge on what they perceive to be an equitable allocation, while ensuring efficient resource utilization. Under T&C, rather than acting as an arbiter, an Internet Service Provider (ISP) acts as an enforcer of what the community of rational users sharing the resource decides is a fair allocation of that resource. Our T&C mechanism proceeds in two phases. In the first, software agents acting on behalf of users engage in a strategic trading game in which each user agent selfishly chooses bandwidth slots to reserve in support of primary, interactive network usage activities. In the second phase, each user is allowed to acquire additional bandwidth slots in support of presumed open-ended need for fluid bandwidth, catering to secondary applications. The acquisition of this fluid bandwidth is subject to the remaining "buying power" of each user and by prevalent "market prices" – both of which are determined by the results of the trading phase and a desirable aggregate cap on link utilization. We present analytical results that establish the underpinnings of our T&C mechanism, including game-theoretic results pertaining to the trading phase, and pricing of fluid bandwidth allocation pertaining to the capping phase. Using real network traces, we present extensive experimental results that demonstrate the benefits of our scheme, which we also show to be practical by highlighting the salient features of an efficient implementation architecture.

Robust Identification of Shared Losses Using End-to-End Unicast Probes

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Current Internet transport protocols make end-to-end measurements and maintain per-connection state to regulate the use of shared network resources. When two or more such connections share a common endpoint, there is an opportunity to correlate the end-to-end measurements made by these protocols to better diagnose and control the use of shared resources. We develop packet probing techniques to determine whether a pair of connections experience shared congestion. Correct, efficient diagnoses could enable new techniques for aggregate congestion control, QoS admission control, connection scheduling and mirror site selection. Our extensive simulation results demonstrate that the conditional (Bayesian) probing approach we employ provides superior accuracy, converges faster, and tolerates a wider range of network conditions than recently proposed memoryless (Markovian) probing approaches for addressing this opportunity.

Cuckoo: a Language for Implementing Memory- and Thread-safe System Services

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is centered around the design of a thread- and memory-safe language, primarily for the compilation of application-specific services for extensible operating systems. We describe various issues that have influenced the design of our language, called Cuckoo, that guarantees safety of programs with potentially asynchronous flows of control. Comparisons are drawn between Cuckoo and related software safety techniques, including Cyclone and software-based fault isolation (SFI), and performance results suggest our prototype compiler is capable of generating safe code that executes with low runtime overheads, even without potential code optimizations. Compared to Cyclone, Cuckoo is able to safely guard accesses to memory when programs are multithreaded. Similarly, Cuckoo is capable of enforcing memory safety in situations that are potentially troublesome for techniques such as SFI.

«
1
2
»