2 resultados para Cache Memories
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Cost, performance and availability considerations are forcing even the most conservative high-integrity embedded real-time systems industry to migrate from simple hardware processors to ones equipped with caches and other acceleration features. This migration disrupts the practices and solutions that industry had developed and consolidated over the years to perform timing analysis. Industry that are confident with the efficiency/effectiveness of their verification and validation processes for old-generation processors, do not have sufficient insight on the effects of the migration to cache-equipped processors. Caches are perceived as an additional source of complexity, which has potential for shattering the guarantees of cost- and schedule-constrained qualification of their systems. The current industrial approach to timing analysis is ill-equipped to cope with the variability incurred by caches. Conversely, the application of advanced WCET analysis techniques on real-world industrial software, developed without analysability in mind, is hardly feasible. We propose a development approach aimed at minimising the cache jitters, as well as at enabling the application of advanced WCET analysis techniques to industrial systems. Our approach builds on:(i) identification of those software constructs that may impede or complicate timing analysis in industrial-scale systems; (ii) elaboration of practical means, under the model-driven engineering (MDE) paradigm, to enforce the automated generation of software that is analyzable by construction; (iii) implementation of a layout optimisation method to remove cache jitters stemming from the software layout in memory, with the intent of facilitating incremental software development, which is of high strategic interest to industry. The integration of those constituents in a structured approach to timing analysis achieves two interesting properties: the resulting software is analysable from the earliest releases onwards - as opposed to becoming so only when the system is final - and more easily amenable to advanced timing analysis by construction, regardless of the system scale and complexity.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.