3 resultados para Multi-cores heterogêneos

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nowadays the rise of non-recurring engineering (NRE) costs associated with complexity is becoming a major factor in SoC design, limiting both scaling opportunities and the flexibility advantages offered by the integration of complex computational units. The introduction of embedded programmable elements can represent an appealing solution, able both to guarantee the desired flexibility and upgradabilty and to widen the SoC market. In particular embedded FPGA (eFPGA) cores can provide bit-level optimization for those applications which benefits from synthesis, paying on the other side in terms of performance penalties and area overhead with respect to standard cell ASIC implementations. In this scenario this thesis proposes a design methodology for a synthesizable programmable device designed to be embedded in a SoC. A soft-core embedded FPGA (eFPGA) is hence presented and analyzed in terms of the opportunities given by a fully synthesizable approach, following an implementation flow based on Standard-Cell methodology. A key point of the proposed eFPGA template is that it adopts a Multi-Stage Switching Network (MSSN) as the foundation of the programmable interconnects, since it can be efficiently synthesized and optimized through a standard cell based implementation flow, ensuring at the same time an intrinsic congestion-free network topology. The evaluation of the flexibility potentialities of the eFPGA has been performed using different technology libraries (STMicroelectronics CMOS 65nm and BCD9s 0.11μm) through a design space exploration in terms of area-speed-leakage tradeoffs, enabled by the full synthesizability of the template. Since the most relevant disadvantage of the adopted soft approach, compared to a hardcore, is represented by a performance overhead increase, the eFPGA analysis has been made targeting small area budgets. The generation of the configuration bitstream has been obtained thanks to the implementation of a custom CAD flow environment, and has allowed functional verification and performance evaluation through an application-aware analysis.