3 resultados para multi-cervical unit
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
This thesis proposes an integrated holistic approach to the study of neuromuscular fatigue in order to encompass all the causes and all the consequences underlying the phenomenon. Starting from the metabolic processes occurring at the cellular level, the reader is guided toward the physiological changes at the motorneuron and motor unit level and from this to the more general biomechanical alterations. In Chapter 1 a list of the various definitions for fatigue spanning several contexts has been reported. In Chapter 2, the electrophysiological changes in terms of motor unit behavior and descending neural drive to the muscle have been studied extensively as well as the biomechanical adaptations induced. In Chapter 3 a study based on the observation of temporal features extracted from sEMG signals has been reported leading to the need of a more robust and reliable indicator during fatiguing tasks. Therefore, in Chapter 4, a novel bi-dimensional parameter is proposed. The study on sEMG-based indicators opened a scenario also on neurophysiological mechanisms underlying fatigue. For this purpose, in Chapter 5, a protocol designed for the analysis of motor unit-related parameters during prolonged fatiguing contractions is presented. In particular, two methodologies have been applied to multichannel sEMG recordings of isometric contractions of the Tibialis Anterior muscle: the state-of-the-art technique for sEMG decomposition and a coherence analysis on MU spike trains. The importance of a multi-scale approach has been finally highlighted in the context of the evaluation of cycling performance, where fatigue is one of the limiting factors. In particular, the last chapter of this thesis can be considered as a paradigm: physiological, metabolic, environmental, psychological and biomechanical factors influence the performance of a cyclist and only when all of these are kept together in a novel integrative way it is possible to derive a clear model and make correct assessments.
Resumo:
This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.