19 resultados para Simplex. CPLEXR. Parallel Efficiency. Parallel Scalability. Linear Programming
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitation, combined with inefficient synchronization mechanisms, can severely overcome the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. Such tool has been used to perform architectural explorations, focusing on instruction caching architecture and hybrid HW/SW synchronization mechanism. Beside architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC, especially memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations in today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations.
Resumo:
In the past two decades the work of a growing portion of researchers in robotics focused on a particular group of machines, belonging to the family of parallel manipulators: the cable robots. Although these robots share several theoretical elements with the better known parallel robots, they still present completely (or partly) unsolved issues. In particular, the study of their kinematic, already a difficult subject for conventional parallel manipulators, is further complicated by the non-linear nature of cables, which can exert only efforts of pure traction. The work presented in this thesis therefore focuses on the study of the kinematics of these robots and on the development of numerical techniques able to address some of the problems related to it. Most of the work is focused on the development of an interval-analysis based procedure for the solution of the direct geometric problem of a generic cable manipulator. This technique, as well as allowing for a rapid solution of the problem, also guarantees the results obtained against rounding and elimination errors and can take into account any uncertainties in the model of the problem. The developed code has been tested with the help of a small manipulator whose realization is described in this dissertation together with the auxiliary work done during its design and simulation phases.
Resumo:
Embedded systems are increasingly integral to daily life, improving and facilitating the efficiency of modern Cyber-Physical Systems which provide access to sensor data, and actuators. As modern architectures become increasingly complex and heterogeneous, their optimization becomes a challenging task. Additionally, ensuring platform security is important to avoid harm to individuals and assets. This study primarily addresses challenges in contemporary Embedded Systems, focusing on platform optimization and security enforcement. The initial section of this study delves into the application of machine learning methods to efficiently determine the optimal number of cores for a parallel RISC-V cluster to minimize energy consumption using static source code analysis. Results demonstrate that automated platform configuration is not only viable but also that there is a moderate performance trade-off when relying solely on static features. The second part focuses on addressing the problem of heterogeneous device mapping, which involves assigning tasks to the most suitable computational device in a heterogeneous platform for optimal runtime. The contribution of this section lies in the introduction of novel pre-processing techniques, along with a training framework called Siamese Networks, that enhances the classification performance of DeepLLVM, an advanced approach for task mapping. Importantly, these proposed approaches are independent from the specific deep-learning model used. Finally, this research work focuses on addressing issues concerning the binary exploitation of software running in modern Embedded Systems. It proposes an architecture to implement Control-Flow Integrity in embedded platforms with a Root-of-Trust, aiming to enhance security guarantees with limited hardware modifications. The approach involves enhancing the architecture of a modern RISC-V platform for autonomous vehicles by implementing a side-channel communication mechanism that relays control-flow changes executed by the process running on the host core to the Root-of-Trust. This approach has limited impact on performance and it is effective in enhancing the security of embedded platforms.