5 resultados para Binary accelerator systems

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

40.00% 40.00%

Publicador:

Resumo:

The new generation of multicore processors opens new perspectives for the design of embedded systems. Multiprocessing, however, poses new challenges to the scheduling of real-time applications, in which the ever-increasing computational demands are constantly flanked by the need of meeting critical time constraints. Many research works have contributed to this field introducing new advanced scheduling algorithms. However, despite many of these works have solidly demonstrated their effectiveness, the actual support for multiprocessor real-time scheduling offered by current operating systems is still very limited. This dissertation deals with implementative aspects of real-time schedulers in modern embedded multiprocessor systems. The first contribution is represented by an open-source scheduling framework, which is capable of realizing complex multiprocessor scheduling policies, such as G-EDF, on conventional operating systems exploiting only their native scheduler from user-space. A set of experimental evaluations compare the proposed solution to other research projects that pursue the same goals by means of kernel modifications, highlighting comparable scheduling performances. The principles that underpin the operation of the framework, originally designed for symmetric multiprocessors, have been further extended first to asymmetric ones, which are subjected to major restrictions such as the lack of support for task migrations, and later to re-programmable hardware architectures (FPGAs). In the latter case, this work introduces a scheduling accelerator, which offloads most of the scheduling operations to the hardware and exhibits extremely low scheduling jitter. The realization of a portable scheduling framework presented many interesting software challenges. One of these has been represented by timekeeping. In this regard, a further contribution is represented by a novel data structure, called addressable binary heap (ABH). Such ABH, which is conceptually a pointer-based implementation of a binary heap, shows very interesting average and worst-case performances when addressing the problem of tick-less timekeeping of high-resolution timers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Synchronization is a key issue in any communication system, but it becomes fundamental in the navigation systems, which are entirely based on the estimation of the time delay of the signals coming from the satellites. Thus, even if synchronization has been a well known topic for many years, the introduction of new modulations and new physical layer techniques in the modern standards makes the traditional synchronization strategies completely ineffective. For this reason, the design of advanced and innovative techniques for synchronization in modern communication systems, like DVB-SH, DVB-T2, DVB-RCS, WiMAX, LTE, and in the modern navigation system, like Galileo, has been the topic of the activity. Recent years have seen the consolidation of two different trends: the introduction of Orthogonal Frequency Division Multiplexing (OFDM) in the communication systems, and of the Binary Offset Carrier (BOC) modulation in the modern Global Navigation Satellite Systems (GNSS). Thus, a particular attention has been given to the investigation of the synchronization algorithms in these areas.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Millisecond Pulsars (MSPs) are fast rotating, highly magnetized neutron stars. According to the "canonical recycling scenario", MSPs form in binary systems containing a neutron star which is spun up through mass accretion from the evolving companion. Therefore, the final stage consists of a binary made of a MSP and the core of the deeply peeled companion. In the last years, however an increasing number of systems deviating from these expectations has been discovered, thus strongly indicating that our understanding of MSPs is far to be complete. The identification of the optical companions to binary MSPs is crucial to constrain the formation and evolution of these objects. In dense environments such as Globular Clusters (GCs), it also allows us to get insights on the cluster internal dynamics. By using deep photometric data, acquired both from space and ground-based telescopes, we identified 5 new companions to MSPs. Three of them being located in GCs and two in the Galactic Field. The three new identifications in GCs increased by 50% the number of such objects known before this Thesis. They all are non-degenerate stars, at odds with the expectations of the "canonical recycling scenario". These results therefore suggest either that transitory phases should also be taken into account, or that dynamical processes, as exchange interactions, play a crucial role in the evolution of MSPs. We also performed a spectroscopic follow-up of the companion to PSRJ1740-5340A in the GC NGC 6397, confirming that it is a deeply peeled star descending from a ~0.8Msun progenitor. This nicely confirms the theoretical expectations about the formation and evolution of MSPs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on micro-controller class of devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data down to byte and sub-byte formats, in the integer domain. However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions both at software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvements in performance and energy efficiency, compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude. To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedded systems are increasingly integral to daily life, improving and facilitating the efficiency of modern Cyber-Physical Systems which provide access to sensor data, and actuators. As modern architectures become increasingly complex and heterogeneous, their optimization becomes a challenging task. Additionally, ensuring platform security is important to avoid harm to individuals and assets. This study primarily addresses challenges in contemporary Embedded Systems, focusing on platform optimization and security enforcement. The initial section of this study delves into the application of machine learning methods to efficiently determine the optimal number of cores for a parallel RISC-V cluster to minimize energy consumption using static source code analysis. Results demonstrate that automated platform configuration is not only viable but also that there is a moderate performance trade-off when relying solely on static features. The second part focuses on addressing the problem of heterogeneous device mapping, which involves assigning tasks to the most suitable computational device in a heterogeneous platform for optimal runtime. The contribution of this section lies in the introduction of novel pre-processing techniques, along with a training framework called Siamese Networks, that enhances the classification performance of DeepLLVM, an advanced approach for task mapping. Importantly, these proposed approaches are independent from the specific deep-learning model used. Finally, this research work focuses on addressing issues concerning the binary exploitation of software running in modern Embedded Systems. It proposes an architecture to implement Control-Flow Integrity in embedded platforms with a Root-of-Trust, aiming to enhance security guarantees with limited hardware modifications. The approach involves enhancing the architecture of a modern RISC-V platform for autonomous vehicles by implementing a side-channel communication mechanism that relays control-flow changes executed by the process running on the host core to the Root-of-Trust. This approach has limited impact on performance and it is effective in enhancing the security of embedded platforms.