988 resultados para Android, Componenti, Sensori, IPC, Shared memory
Resumo:
The current industry trend is towards using Commercially available Off-The-Shelf (COTS) based multicores for developing real time embedded systems, as opposed to the usage of custom-made hardware. In typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase of the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst case response time for a set of tasks. Although the required parameters to obtain the request profile can be obtained by static analysis, we provide an alternative method to experimentally obtain them by using performance monitoring counters (PMCs). We also compare our work against an existing approach and show that our approach outperforms it by providing tighter upper-bound on the number of bus requests generated by a task.
Resumo:
Mutable state can be useful in certain algorithms, to structure programs, or for efficiency purposes. However, when shared mutable state is used in non-local or nonobvious ways, the interactions that can occur via aliases to that shared memory can be a source of program errors. Undisciplined uses of shared state may unsafely interfere with local reasoning as other aliases may interleave their changes to the shared state in unexpected ways. We propose a novel technique, rely-guarantee protocols, that structures the interactions between aliases and ensures that only safe interference is possible. We present a linear type system outfitted with our novel sharing mechanism that enables controlled interference over shared mutable resources. Each alias is assigned separate, local roles encoded in a protocol abstraction that constrains how an alias can legally use that shared state. By following the spirit of rely-guarantee reasoning, our rely-guarantee protocols ensure that only safe interference can occur but still allow many interesting uses of shared state, such as going beyond invariant and monotonic usages. This thesis describes the three core mechanisms that enable our type-based technique to work: 1) we show how a protocol models an alias’s perspective on how the shared state evolves and constrains that alias’s interactions with the shared state; 2) we show how protocols can be used while enforcing the agreed interference contract; and finally, 3) we show how to check that all local protocols to some shared state can be safely composed to ensure globally safe interference over that shared memory. The interference caused by shared state is rooted at how the uses of di↵erent aliases to that state may be interleaved (perhaps even in non-deterministic ways) at run-time. Therefore, our technique is mostly agnostic as to whether this interference was the result of alias interleaving caused by sequential or concurrent semantics. We show implementations of our technique in both settings, and highlight their di↵erences. Because sharing is “first-class” (and not tied to a module), we show a polymorphic procedure that enables abstract compositions of protocols. Thus, protocols can be specialized or extended without requiring specific knowledge of the interference produce by other protocols to that state. We show that protocol composition can ensure safety even when considering abstracted protocols. We show that this core composition mechanism is sound, decidable (without the need for manual intervention), and provide an algorithm implementation.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
In questa tesi sono stati apportati due importanti contributi nel campo degli acceleratori embedded many-core. Abbiamo implementato un runtime OpenMP ottimizzato per la gestione del tasking model per sistemi a processori strettamente accoppiati in cluster e poi interconnessi attraverso una network on chip. Ci siamo focalizzati sulla loro scalabilità e sul supporto di task di granularità fine, come è tipico nelle applicazioni embedded. Il secondo contributo di questa tesi è stata proporre una estensione del runtime di OpenMP che cerca di prevedere la manifestazione di errori dati da fenomeni di variability tramite una schedulazione efficiente del carico di lavoro.
Resumo:
The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology.
Resumo:
This study is concerned with several proposals concerning multiprocessor systems and with the various possible methods of evaluating such proposals. After a discussion of the advantages and disadvantages of several performance evaluation tools, the author decides that simulation is the only tool powerful enough to develop a model which would be of practical use, in the design, comparison and extension of systems. The main aims of the simulation package developed as part of this study are cost effectiveness, ease of use and generality. The methodology on which the simulation package is based is described in detail. The fundamental principles are that model design should reflect actual systems design, that measuring procedures should be carried out alongside design that models should be well documented and easily adaptable and that models should be dynamic. The simulation package itself is modular, and in this way reflects current design trends. This approach also aids documentation and ensures that the model is easily adaptable. It contains a skeleton structure and a library of segments which can be added to or directly swapped with segments of the skeleton structure, to form a model which fits a user's requirements. The study also contains the results of some experimental work carried out using the model, the first part of which tests• the model's capabilities by simulating a large operating system, the ICL George 3 system; the second part deals with general questions and some of the many proposals concerning multiprocessor systems.
Resumo:
The last decade has witnessed a major shift towards the deployment of embedded applications on multi-core platforms. However, real-time applications have not been able to fully benefit from this transition, as the computational gains offered by multi-cores are often offset by performance degradation due to shared resources, such as main memory. To efficiently use multi-core platforms for real-time systems, it is hence essential to tightly bound the interference when accessing shared resources. Although there has been much recent work in this area, a remaining key problem is to address the diversity of memory arbiters in the analysis to make it applicable to a wide range of systems. This work handles diverse arbiters by proposing a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores, considering different bus arbiters. Our novel approach clearly demarcates the arbiter-dependent and independent stages in the analysis of these upper bounds. The arbiter-dependent phase takes the arbiter and the task memory-traffic pattern as inputs and produces a model of the availability of the bus to a given task. Then, based on the availability of the bus, the arbiter-independent phase determines the worst-case request-release scenario that maximizes the interference experienced by the tasks due to the contention for the bus. We show that the framework addresses the diversity problem by applying it to a memory bus shared by a fixed-priority arbiter, a time-division multiplexing (TDM) arbiter, and an unspecified work-conserving arbiter using applications from the MediaBench test suite. We also experimentally evaluate the quality of the analysis by comparison with a state-of-the-art TDM analysis approach and consistently showing a considerable reduction in maximum interference.
Resumo:
Das hier frei verfügbare Skript gehört zu einer gleichnamigen Vorlesung, die von Prof. Dr. Lutz Wegner bis zum Sommersemester 2007 gehalten wurde. Davor lief sie bis 1999 unter dem etwas irreführenden Titel „Ausgewählte Themen zu Rechnernetzen“. Behandelt wird die IPC in UNIX-basierten Rechnernetzen. Dazu gehören allgemeine Kenntnisse der Prozessumgebung, die fork- und exec-Systemaufrufe, Lock Files, Signale, Pipes, das Botschaftenkonzept (message queues), Semaphore, Shared Memory, Remote Procedure Calls, Sockets und Threads. Jedes Konzept wird mit kleinen Beispielen besprochen, die in C geschrieben sind. Der Quelltext liegt auf unseren Anlagen vor (für AIX, LINUX, Solaris). Grundlage der Vorlesung und des Skripts ist das ausgezeichnete Buch von John Shapley Gray „Interprocess Communications in UNIX“ aus dem Jahr 1998 bzw. die auf Linux angepasste Auflage desselben Buches „Interprocess Communications in LINUX“ aus dem Jahr 2003.
Resumo:
An approximate analytic model of a shared memory multiprocessor with a Cache Only Memory Architecture (COMA), the busbased Data Difussion Machine (DDM), is presented and validated. It describes the timing and interference in the system as a function of the hardware, the protocols, the topology and the workload. Model results have been compared to results from an independent simulator. The comparison shows good model accuracy specially for non-saturated systems, where the errors in response times and device utilizations are independent of the number of processors and remain below 10% in 90% of the simulations. Therefore, the model can be used as an average performance prediction tool that avoids expensive simulations in the design of systems with many processors.
Resumo:
Hardware vendors make an important effort creating low-power CPUs that keep battery duration and durability above acceptable levels. In order to achieve this goal and provide good performance-energy for a wide variety of applications, ARM designed the big.LITTLE architecture. This heterogeneous multi-core architecture features two different types of cores: big cores oriented to performance and little cores, slower and aimed to save energy consumption. As all the cores have access to the same memory, multi-threaded applications must resort to some mutual exclusion mechanism to coordinate the access to shared data by the concurrent threads. Transactional Memory (TM) represents an optimistic approach for shared-memory synchronization. To take full advantage of the features offered by software TM, but also benefit from the characteristics of the heterogeneous big.LITTLE architectures, our focus is to propose TM solutions that take into account the power/performance requirements of the application and what it is offered by the architecture. In order to understand the current state-of-the-art and obtain useful information for future power-aware software TM solutions, we have performed an analysis of a popular TM library running on top of an ARM big.LITTLE processor. Experiments show, in general, better scalability for the LITTLE cores for most of the applications except for one, which requires the computing performance that the big cores offer.
Resumo:
OctVCE is a cartesian cell CFD code produced especially for numerical simulations of shock and blast wave interactions with complex geometries, in particular, from explosions. Virtual Cell Embedding (VCE) was chosen as its cartesian cell kernel for its simplicity and sufficiency for practical engineering design problems. The code uses a finite-volume formulation of the unsteady Euler equations with a second order explicit Runge-Kutta Godonov (MUSCL) scheme. Gradients are calculated using a least-squares method with a minmod limiter. Flux solvers used are AUSM, AUSMDV and EFM. No fluid-structure coupling or chemical reactions are allowed, but gas models can be perfect gas and JWL or JWLB for the explosive products. This report also describes the code’s ‘octree’ mesh adaptive capability and point-inclusion query procedures for the VCE geometry engine. Finally, some space will also be devoted to describing code parallelization using the shared-memory OpenMP paradigm. The user manual to the code is to be found in the companion report 2007/13.
Resumo:
OctVCE is a cartesian cell CFD code produced especially for numerical simulations of shock and blast wave interactions with complex geometries. Virtual Cell Embedding (VCE) was chosen as its cartesian cell kernel as it is simple to code and sufficient for practical engineering design problems. This also makes the code much more ‘user-friendly’ than structured grid approaches as the gridding process is done automatically. The CFD methodology relies on a finite-volume formulation of the unsteady Euler equations and is solved using a standard explicit Godonov (MUSCL) scheme. Both octree-based adaptive mesh refinement and shared-memory parallel processing capability have also been incorporated. For further details on the theory behind the code, see the companion report 2007/12.