Biblioteca Digital

The current industry trend is towards using Commercially available Off-The-Shelf (COTS) based multicores for developing real time embedded systems, as opposed to the usage of custom-made hardware. In typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase of the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst case response time for a set of tasks. Although the required parameters to obtain the request profile can be obtained by static analysis, we provide an alternative method to experimentally obtain them by using performance monitoring counters (PMCs). We also compare our work against an existing approach and show that our approach outperforms it by providing tighter upper-bound on the number of bus requests generated by a task.

Veja mais

Synergy : a comprehensive software distributed shared memory system

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Considerable research and development has been invested in software Distributed Shared Memory (DSM). The primary focus of this work has traditionally been on high performance and consistency protocols. Unfortunately, clusters present a number of challenges for any DSM systems not solvable through consistency protocols alone. These challenges relate to the ability of DSM systems to adjust to load fluctuations, computers being added/removed from the cluster, to deal with faults, and the ability to use DSM objects larger than the available physical memory. This paper introduces the Synergy DSM System and its integration with the virtual memory, group communication and process migration services of the Genesis Cluster Operating System.

Veja mais

Programmer friendly and efficient distributed shared memory integrated into a distributed operating system

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Distributed Shared Memory (DSM) provides programmers with a shared memory environment in systems where memory is not physically shared. Clusters of Workstations (COWs), an often untapped source of computing power, are characterised by a very low cost/performance ratio. The combination of Clusters of Workstations (COWs) with DSM provides an environment in which the programmer can use the well known approaches and methods of programming for physically shared memory systems and parallel processing can be carried out to make full use of the computing power and cost advantages of the COW. The aim of this research is to synthesise and develop a distributed shared memory system as an integral part of an operating system in order to provide application programmers with a convenient environment in which the development and execution of parallel applications can be done easily and efficiently, and which does this in a transparent manner. Furthermore, in order to satisfy our challenging design requirements we want to demonstrate that the operating system into which the DSM system is integrated should be a distributed operating system. In this thesis a study into the synthesis of a DSM system within a microkernel and client-server based distributed operating system which uses both strict and weak consistency models, with a write-invalidate and write-update based approach for consistency maintenance is reported. Furthermore a unique automatic initialisation system which allows the programmer to start the parallel execution of a group of processes with a single library call is reported. The number and location of these processes are determined by the operating system based on system load information. The DSM system proposed has a novel approach in that it provides programmers with a complete programming environment in which they are easily able to develop and run their code or indeed run existing shared memory code. A set of demanding DSM system design requirements are presented and the incentives for the placement of the DSM system with a distributed operating system and in particular in the memory management server have been reported. The new DSM system concentrated on an event-driven set of cooperating and distributed entities, and a detailed description of the events and reactions to these events that make up the operation of the DSM system is then presented. This is followed by a pseudocode form of the detailed design of the main modules and activities of the primitives used in the proposed DSM system. Quantitative results of performance tests and qualitative results showing the ease of programming and use of the RHODOS DSM system are reported. A study of five different application is given and the results of tests carried out on these applications together with a discussion of the results are given. A discussion of how RHODOS’ DSM allows programmers to write shared memory code in an easy to use and familiar environment and a comparative evaluation of RHODOS DSM with other DSM systems is presented. In particular, the ease of use and transparency of the DSM system have been demonstrated through the description of the ease with which a moderately inexperienced undergraduate programmer was able to convert, write and run applications for the testing of the DSM system. Furthermore, the description of the tests performed using physically shared memory shows that the latter is indistinguishable from distributed shared memory; this is further evidence that the DSM system is fully transparent. This study clearly demonstrates that the aim of the research has been achieved; it is possible to develop a programmer friendly and efficient DSM system fully integrated within a distributed operating system. It is clear from this research that client-server and microkernel based distributed operating system integrated DSM makes shared memory operations transparent and almost completely removes the involvement of the programmer beyond classical activities needed to deal with shared memory. The conclusion can be drawn that DSM, when implemented within a client-server and microkernel based distributed operating system, is one of the most encouraging approaches to parallel processing since it guarantees performance improvements with minimal programmer involvement.

Veja mais

Hierarchical directory-based shared memory architecture

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe the design of a directory-based shared memory architecture on a hierarchical network of hypercubes. The distributed directory scheme comprises two separate hierarchical networks for handling cache requests and transfers. Further, the scheme assumes a single address space and each processing element views the entire network as contiguous memory space. The size of individual directories stored at each node of the network remains constant throughout the network. Although the size of the directory increases with the network size, the architecture is scalable. The results of the analytical studies demonstrate superior performance characteristics of our scheme compared with those of other schemes.

Veja mais

A survey of shared memory systems

Relevância:

100.00% 100.00%

Publicador:

Veja mais

Use of shared memory in the context of embedded multi-core processors: exploration of the technology and its limits

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.

Veja mais

OpenMP task scheduling strategies to mitigate hardware variability in tightly-coupled shared memory clusters

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In questa tesi sono stati apportati due importanti contributi nel campo degli acceleratori embedded many-core. Abbiamo implementato un runtime OpenMP ottimizzato per la gestione del tasking model per sistemi a processori strettamente accoppiati in cluster e poi interconnessi attraverso una network on chip. Ci siamo focalizzati sulla loro scalabilità e sul supporto di task di granularità fine, come è tipico nelle applicazioni embedded. Il secondo contributo di questa tesi è stata proporre una estensione del runtime di OpenMP che cerca di prevedere la manifestazione di errori dati da fenomeni di variability tramite una schedulazione efficiente del carico di lavoro.

Veja mais

Memory performance of and-parallel prolog on shared-memory architectures

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology.

Veja mais

Jasmine: A shared-object multi-locking distributed shared memory system for heterogeneous computers

Relevância:

100.00% 100.00%

Publicador:

Veja mais

Evaluation and measurement of multiprocessor systems with shared memory

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study is concerned with several proposals concerning multiprocessor systems and with the various possible methods of evaluating such proposals. After a discussion of the advantages and disadvantages of several performance evaluation tools, the author decides that simulation is the only tool powerful enough to develop a model which would be of practical use, in the design, comparison and extension of systems. The main aims of the simulation package developed as part of this study are cost effectiveness, ease of use and generality. The methodology on which the simulation package is based is described in detail. The fundamental principles are that model design should reflect actual systems design, that measuring procedures should be carried out alongside design that models should be well documented and easily adaptable and that models should be dynamic. The simulation package itself is modular, and in this way reflects current design trends. This approach also aids documentation and ensures that the model is easily adaptable. It contains a skeleton structure and a library of segments which can be added to or directly swapped with segments of the skeleton structure, to form a model which fits a user's requirements. The study also contains the results of some experimental work carried out using the model, the first part of which tests• the model's capabilities by simulating a large operating system, the ICL George 3 system; the second part deals with general questions and some of the many proposals concerning multiprocessor systems.

Veja mais

WOMM: A Weak Operational Memory Model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Memory models of shared memory concurrent programs define the values a read of a shared memory location is allowed to see. Such memory models are typically weaker than the intuitive sequential consistency semantics to allow efficient execution. In this paper, we present WOMM (abbreviation for Weak Operational Memory Model) that formally unifies two sources of weak behavior in hardware memory models: reordering of instructions and weakly consistent memory. We show that a large number of optimizations are allowed by WOMM. We also show that WOMM is weaker than a number of hardware memory models. Consequently, if a program behaves correctly under WOMM, it will be correct with respect to those hardware memory models. Hence, WOMM can be used as a formally specified abstraction of the hardware memory models. Moreover; unlike most weak memory models, WOMM is described using operational semantics, making it easy to integrate into a model checker for concurrent programs. We further show that WOMM has an important property - it has sequential consistency semantics for datarace-free programs.

Veja mais

Variable Granularity Access Tracking Scheme for Improving the Performance of Software Transactional Memory

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Software transactional memory (STM) has been proposed as a promising programming paradigm for shared memory multi-threaded programs as an alternative to conventional lock based synchronization primitives. Typical STM implementations employ a conflict detection scheme, which works with uniform access granularity, tracking shared data accesses either at word/cache line or at object level. It is well known that a single fixed access tracking granularity cannot meet the conflicting goals of reducing false conflicts without impacting concurrency adversely. A fine grained granularity while improving concurrency can have an adverse impact on performance due to lock aliasing, lock validation overheads, and additional cache pressure. On the other hand, a coarse grained granularity can impact performance due to reduced concurrency. Thus, in general, a fixed or uniform granularity access tracking (UGAT) scheme is application-unaware and rarely matches the access patterns of individual application or parts of an application, leading to sub-optimal performance for different parts of the application(s). In order to mitigate the disadvantages associated with UGAT scheme, we propose a Variable Granularity Access Tracking (VGAT) scheme in this paper. We propose a compiler based approach wherein the compiler uses inter-procedural whole program static analysis to select the access tracking granularity for different shared data structures of the application based on the application's data access pattern. We describe our prototype VGAT scheme, using TL2 as our STM implementation. Our experimental results reveal that VGAT-STM scheme can improve the application performance of STAMP benchmarks from 1.87% to up to 21.2%.

Veja mais

978 resultados para Android, Componenti, Sensori, IPC, Shared memory

Filtro por publicador