937 resultados para Distributed non-coherent shared memory
Resumo:
Mutable state can be useful in certain algorithms, to structure programs, or for efficiency purposes. However, when shared mutable state is used in non-local or nonobvious ways, the interactions that can occur via aliases to that shared memory can be a source of program errors. Undisciplined uses of shared state may unsafely interfere with local reasoning as other aliases may interleave their changes to the shared state in unexpected ways. We propose a novel technique, rely-guarantee protocols, that structures the interactions between aliases and ensures that only safe interference is possible. We present a linear type system outfitted with our novel sharing mechanism that enables controlled interference over shared mutable resources. Each alias is assigned separate, local roles encoded in a protocol abstraction that constrains how an alias can legally use that shared state. By following the spirit of rely-guarantee reasoning, our rely-guarantee protocols ensure that only safe interference can occur but still allow many interesting uses of shared state, such as going beyond invariant and monotonic usages. This thesis describes the three core mechanisms that enable our type-based technique to work: 1) we show how a protocol models an alias’s perspective on how the shared state evolves and constrains that alias’s interactions with the shared state; 2) we show how protocols can be used while enforcing the agreed interference contract; and finally, 3) we show how to check that all local protocols to some shared state can be safely composed to ensure globally safe interference over that shared memory. The interference caused by shared state is rooted at how the uses of di↵erent aliases to that state may be interleaved (perhaps even in non-deterministic ways) at run-time. Therefore, our technique is mostly agnostic as to whether this interference was the result of alias interleaving caused by sequential or concurrent semantics. We show implementations of our technique in both settings, and highlight their di↵erences. Because sharing is “first-class” (and not tied to a module), we show a polymorphic procedure that enables abstract compositions of protocols. Thus, protocols can be specialized or extended without requiring specific knowledge of the interference produce by other protocols to that state. We show that protocol composition can ensure safety even when considering abstracted protocols. We show that this core composition mechanism is sound, decidable (without the need for manual intervention), and provide an algorithm implementation.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
The current industry trend is towards using Commercially available Off-The-Shelf (COTS) based multicores for developing real time embedded systems, as opposed to the usage of custom-made hardware. In typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase of the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst case response time for a set of tasks. Although the required parameters to obtain the request profile can be obtained by static analysis, we provide an alternative method to experimentally obtain them by using performance monitoring counters (PMCs). We also compare our work against an existing approach and show that our approach outperforms it by providing tighter upper-bound on the number of bus requests generated by a task.
Resumo:
Inner products of the type < f, g >(S) = < f, g >psi(0) + < f', g'>psi(1), where one of the measures psi(0) or psi(1) is the measure associated with the Gegenbauer polynomials, are usually referred to as Gegenbauer-Sobolev inner products. This paper deals with some asymptotic relations for the orthogonal polynomials with respect to a class of Gegenbauer-Sobolev inner products. The inner products are such that the associated pairs of symmetric measures (psi(0), psi(1)) are not within the concept of symmetrically coherent pairs of measures.
Asymptotics for Jacobi-Sobolev orthogonal polynomials associated with non-coherent pairs of measures
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
In questa tesi sono stati apportati due importanti contributi nel campo degli acceleratori embedded many-core. Abbiamo implementato un runtime OpenMP ottimizzato per la gestione del tasking model per sistemi a processori strettamente accoppiati in cluster e poi interconnessi attraverso una network on chip. Ci siamo focalizzati sulla loro scalabilità e sul supporto di task di granularità fine, come è tipico nelle applicazioni embedded. Il secondo contributo di questa tesi è stata proporre una estensione del runtime di OpenMP che cerca di prevedere la manifestazione di errori dati da fenomeni di variability tramite una schedulazione efficiente del carico di lavoro.
Resumo:
The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology.
Resumo:
We have experimentally investigated two non-inverting optical memory configurations: a TOAD based device & an integrated hybrid Mach-Zehnder interferometer. Experimental results for both devices and a comparison of the two techniques is presented.
Resumo:
This study is concerned with several proposals concerning multiprocessor systems and with the various possible methods of evaluating such proposals. After a discussion of the advantages and disadvantages of several performance evaluation tools, the author decides that simulation is the only tool powerful enough to develop a model which would be of practical use, in the design, comparison and extension of systems. The main aims of the simulation package developed as part of this study are cost effectiveness, ease of use and generality. The methodology on which the simulation package is based is described in detail. The fundamental principles are that model design should reflect actual systems design, that measuring procedures should be carried out alongside design that models should be well documented and easily adaptable and that models should be dynamic. The simulation package itself is modular, and in this way reflects current design trends. This approach also aids documentation and ensures that the model is easily adaptable. It contains a skeleton structure and a library of segments which can be added to or directly swapped with segments of the skeleton structure, to form a model which fits a user's requirements. The study also contains the results of some experimental work carried out using the model, the first part of which tests• the model's capabilities by simulating a large operating system, the ICL George 3 system; the second part deals with general questions and some of the many proposals concerning multiprocessor systems.
Resumo:
In this thesis, a tube-based Distributed Economic Predictive Control (DEPC) scheme is presented for a group of dynamically coupled linear subsystems. These subsystems are components of a large scale system and control inputs are computed based on optimizing a local economic objective. Each subsystem is interacting with its neighbors by sending its future reference trajectory, at each sampling time. It solves a local optimization problem in parallel, based on the received future reference trajectories of the other subsystems. To ensure recursive feasibility and a performance bound, each subsystem is constrained to not deviate too much from its communicated reference trajectory. This difference between the plan trajectory and the communicated one is interpreted as a disturbance on the local level. Then, to ensure the satisfaction of both state and input constraints, they are tightened by considering explicitly the effect of these local disturbances. The proposed approach averages over all possible disturbances, handles tightened state and input constraints, while satisfies the compatibility constraints to guarantee that the actual trajectory lies within a certain bound in the neighborhood of the reference one. Each subsystem is optimizing a local arbitrary economic objective function in parallel while considering a local terminal constraint to guarantee recursive feasibility. In this framework, economic performance guarantees for a tube-based distributed predictive control (DPC) scheme are developed rigorously. It is presented that the closed-loop nominal subsystem has a robust average performance bound locally which is no worse than that of a local robust steady state. Since a robust algorithm is applying on the states of the real (with disturbances) subsystems, this bound can be interpreted as an average performance result for the real closed-loop system. To this end, we present our outcomes on local and global performance, illustrated by a numerical example.
Resumo:
The recent trends of chip architectures with higher number of heterogeneous cores, and non-uniform memory/non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as a fundamental building block for developing parallel applications. Nevertheless, although STM promises to ease concurrent and parallel software development, it relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts on the responsiveness and timing guarantees required by embedded real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we assess the use of STM in the development of embedded real-time software, defending that the amount of contention can be reduced if read-only transactions access recent consistent data snapshots, progressing in a wait-free manner. We show how the required number of versions of a shared object can be calculated for a set of tasks. We also outline an algorithm to manage conflicts between update transactions that prevents starvation.
Resumo:
Recent embedded processor architectures containing multiple heterogeneous cores and non-coherent caches renewed attention to the use of Software Transactional Memory (STM) as a building block for developing parallel applications. STM promises to ease concurrent and parallel software development, but relies on the possibility of abort conflicting transactions to maintain data consistency, which in turns affects the execution time of tasks carrying transactions. Because of this fact the timing behaviour of the task set may not be predictable, thus it is crucial to limit the execution time overheads resulting from aborts. In this paper we formalise a FIFO-based algorithm to order the sequence of commits of concurrent transactions. Then, we propose and evaluate two non-preemptive and one SRP-based fully-preemptive scheduling strategies, in order to avoid transaction starvation.