6 resultados para Many-core

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitation, combined with inefficient synchronization mechanisms, can severely overcome the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. Such tool has been used to perform architectural explorations, focusing on instruction caching architecture and hybrid HW/SW synchronization mechanism. Beside architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC, especially memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations in today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Combinatorial Optimization is becoming ever more crucial, in these days. From natural sciences to economics, passing through urban centers administration and personnel management, methodologies and algorithms with a strong theoretical background and a consolidated real-word effectiveness is more and more requested, in order to find, quickly, good solutions to complex strategical problems. Resource optimization is, nowadays, a fundamental ground for building the basements of successful projects. From the theoretical point of view, Combinatorial Optimization rests on stable and strong foundations, that allow researchers to face ever more challenging problems. However, from the application point of view, it seems that the rate of theoretical developments cannot cope with that enjoyed by modern hardware technologies, especially with reference to the one of processors industry. In this work we propose new parallel algorithms, designed for exploiting the new parallel architectures available on the market. We found that, exposing the inherent parallelism of some resolution techniques (like Dynamic Programming), the computational benefits are remarkable, lowering the execution times by more than an order of magnitude, and allowing to address instances with dimensions not possible before. We approached four Combinatorial Optimization’s notable problems: Packing Problem, Vehicle Routing Problem, Single Source Shortest Path Problem and a Network Design problem. For each of these problems we propose a collection of effective parallel solution algorithms, either for solving the full problem (Guillotine Cuts and SSSPP) or for enhancing a fundamental part of the solution method (VRP and ND). We endorse our claim by presenting computational results for all problems, either on standard benchmarks from the literature or, when possible, on data from real-world applications, where speed-ups of one order of magnitude are usually attained, not uncommonly scaling up to 40 X factors.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The application of Concurrency Theory to Systems Biology is in its earliest stage of progress. The metaphor of cells as computing systems by Regev and Shapiro opened the employment of concurrent languages for the modelling of biological systems. Their peculiar characteristics led to the design of many bio-inspired formalisms which achieve higher faithfulness and specificity. In this thesis we present pi@, an extremely simple and conservative extension of the pi-calculus representing a keystone in this respect, thanks to its expressiveness capabilities. The pi@ calculus is obtained by the addition of polyadic synchronisation and priority to the pi-calculus, in order to achieve compartment semantics and atomicity of complex operations respectively. In its direct application to biological modelling, the stochastic variant of the calculus, Spi@, is shown able to model consistently several phenomena such as formation of molecular complexes, hierarchical subdivision of the system into compartments, inter-compartment reactions, dynamic reorganisation of compartment structure consistent with volume variation. The pivotal role of pi@ is evidenced by its capability of encoding in a compositional way several bio-inspired formalisms, so that it represents the optimal core of a framework for the analysis and implementation of bio-inspired languages. In this respect, the encodings of BioAmbients, Brane Calculi and a variant of P Systems in pi@ are formalised. The conciseness of their translation in pi@ allows their indirect comparison by means of their encodings. Furthermore it provides a ready-to-run implementation of minimal effort whose correctness is granted by the correctness of the respective encoding functions. Further important results of general validity are stated on the expressive power of priority. Several impossibility results are described, which clearly state the superior expressiveness of prioritised languages and the problems arising in the attempt of providing their parallel implementation. To this aim, a new setting in distributed computing (the last man standing problem) is singled out and exploited to prove the impossibility of providing a purely parallel implementation of priority by means of point-to-point or broadcast communication.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Spinal cord injury (SCI) results not only in paralysis; but it is also associated with a range of autonomic dysregulation that can interfere with cardiovascular, bladder, bowel, temperature, and sexual function. The entity of the autonomic dysfunction is related to the level and severity of injury to descending autonomic (sympathetic) pathways. For many years there was limited awareness of these issues and the attention given to them by the scientific and medical community was scarce. Yet, even if a new system to document the impact of SCI on autonomic function has recently been proposed, the current standard of assessment of SCI (American Spinal Injury Association (ASIA) examination) evaluates motor and sensory pathways, but not severity of injury to autonomic pathways. Beside the severe impact on quality of life, autonomic dysfunction in persons with SCI is associated with increased risk of cardiovascular disease and mortality. Therefore, obtaining information regarding autonomic function in persons with SCI is pivotal and clinical examinations and laboratory evaluations to detect the presence of autonomic dysfunction and quantitate its severity are mandatory. Furthermore, previous studies demonstrated that there is an intimate relationship between the autonomic nervous system and sleep from anatomical, physiological, and neurochemical points of view. Although, even if previous epidemiological studies demonstrated that sleep problems are common in spinal cord injury (SCI), so far only limited polysomnographic (PSG) data are available. Finally, until now, circadian and state dependent autonomic regulation of blood pressure (BP), heart rate (HR) and body core temperature (BcT) were never assessed in SCI patients. Aim of the current study was to establish the association between the autonomic control of the cardiovascular function and thermoregulation, sleep parameters and increased cardiovascular risk in SCI patients.