11 results for parallel systems
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever-increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitations, combined with inefficient synchronization mechanisms, can severely undermine the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. This tool has been used to perform architectural explorations, focusing on instruction caching architecture and hybrid HW/SW synchronization mechanisms. Besides architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing (NTC) is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting the performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC: in particular, memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations is today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture.
By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations.
Abstract:
The term "Brain Imaging" identifies a set of techniques to analyze the structure and/or functional behavior of the brain in normal and/or pathological situations. These techniques are largely used in the study of brain activity. In addition to clinical usage, analysis of brain activity is gaining popularity in other recent fields, such as Brain Computer Interfaces (BCI) and the study of cognitive processes. In this context, the use of classical solutions (e.g. fMRI, PET-CT) could be unfeasible, due to their low temporal resolution, high cost and limited portability. For these reasons alternative low-cost techniques are the subject of research, typically based on simple recording hardware and on an intensive data elaboration process. Typical examples are ElectroEncephaloGraphy (EEG) and Electrical Impedance Tomography (EIT), where the electric potential at the patient's scalp is recorded by high-impedance electrodes. In EEG the potentials are directly generated by neuronal activity, while in EIT they are generated by the injection of small currents at the scalp. To retrieve meaningful insights on brain activity from measurements, EIT and EEG rely on detailed knowledge of the underlying electrical properties of the body. This is obtained from numerical models of the electric field distribution therein. The inhomogeneous and anisotropic electric properties of human tissues make accurate modeling and simulation very challenging, leading to a tradeoff between physical accuracy and technical feasibility, which currently severely limits the capabilities of these techniques. Moreover, the elaboration of recorded data requires computationally intensive regularization techniques, which affects applications with tight timing constraints (such as BCI). This work focuses on the parallel implementation of a workflow for EEG and EIT data processing.
The resulting software is accelerated using multi-core GPUs, in order to provide solutions in reasonable time and address the requirements of real-time BCI systems, without over-simplifying the complexity and accuracy of the head models.
Abstract:
This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated deskside computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for electroencephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well.
Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.
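As a rough illustration of the two ideas named in the abstract above (compressed storage for triangular matrices and triangular matrix inversion), the following is a minimal sequential sketch in Python. It assumes the standard row-major packed layout and plain forward substitution; it is not the thesis's GPU-friendly storage scheme or its dual-GPU algorithm, neither of which is described in the abstract. The function names `packed_index`, `pack_lower` and `invert_lower_packed` are hypothetical.

```python
def packed_index(i, j):
    """Offset of element (i, j), with j <= i, in row-major packed storage."""
    return i * (i + 1) // 2 + j


def pack_lower(dense):
    """Keep only the lower triangle of a square matrix: n*(n+1)/2 values."""
    n = len(dense)
    return [dense[i][j] for i in range(n) for j in range(i + 1)]


def invert_lower_packed(packed, n):
    """Invert a lower-triangular matrix stored in packed form.

    Column j of the inverse solves L x = e_j by forward substitution.
    The columns do not depend on one another, which is the property a
    parallel implementation could exploit (assumption: this sketch stays
    sequential).
    """
    inv = [0.0] * len(packed)
    for j in range(n):
        inv[packed_index(j, j)] = 1.0 / packed[packed_index(j, j)]
        for i in range(j + 1, n):
            acc = 0.0
            for k in range(j, i):
                acc += packed[packed_index(i, k)] * inv[packed_index(k, j)]
            inv[packed_index(i, j)] = -acc / packed[packed_index(i, i)]
    return inv


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0],
         [1.0, 4.0, 0.0],
         [3.0, 5.0, 8.0]]
    print(invert_lower_packed(pack_lower(L), 3))
```

The packed form stores 6 values instead of 9 for a 3x3 matrix; for large n the saving approaches one half of the dense footprint.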
Abstract:
The main objective of this work was to investigate the impact of different hybridization concepts and levels of hybridization on the fuel economy of a standard road vehicle, where both conventional and non-conventional hybrid architectures are treated in exactly the same way from the point of view of overall energy flow optimization. Hybrid component models were developed and are presented in detail, as are the simulation results, mainly for the NEDC cycle. The analysis was performed on four different parallel hybrid powertrain concepts: Hybrid Electric Vehicle (HEV), High Speed Flywheel Hybrid Vehicle (HSF-HV), Hydraulic Hybrid Vehicle (HHV) and Pneumatic Hybrid Vehicle (PHV). In order to perform an equitable analysis of the different hybrid systems, the comparison was also performed on the basis of the same usable system energy storage capacity (i.e. 625 kJ for the HEV, HSF-HV and HHV), but in the case of pneumatic hybrid systems the maximal storage capacity was limited by the size of the systems in order to comply with the packaging requirements of the vehicle. The simulations were performed within the IAV GmbH VeLoDyn software simulator, based on the Matlab/Simulink software package. An advanced cycle-independent control strategy, the Equivalent Consumption Minimization Strategy (ECMS), was implemented in the hybrid supervisory control unit in order to solve the power management problem for all hybrid powertrain solutions. In order to maintain the State of Charge within desired boundaries during different cycles and to facilitate easy implementation and recalibration of the control strategy for very different hybrid systems, a Charge Sustaining Algorithm was added to the ECMS framework. Also, a Variable Shift Pattern (VSP-ECMS) algorithm was proposed as an extension of ECMS capabilities so as to include gear selection in the determination of the minimal (energy) cost function of the hybrid system. Further, a cycle-based energetic analysis was performed for all the simulated cases, and the results are reported in the corresponding chapters.
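The ECMS idea mentioned above can be sketched in a few lines: at each instant the controller picks the power split that minimizes fuel power plus an equivalence factor times the power drawn from the energy store. The Python sketch below is an illustration under loudly stated assumptions, not the thesis's VeLoDyn implementation: the engine efficiency curve, the storage efficiency, the proportional state-of-charge correction used for charge sustaining, and every name and number in it are hypothetical.

```python
def engine_fuel_power(p_engine_kw):
    """Hypothetical engine model: fuel power (kW) for a mechanical power (kW)."""
    if p_engine_kw <= 0.0:
        return 0.0
    # Made-up efficiency curve: 20 % at idle load, rising to 35 % at 60 kW.
    efficiency = 0.20 + 0.15 * min(p_engine_kw / 60.0, 1.0)
    return p_engine_kw / efficiency


def storage_power(p_motor_kw, eta=0.9):
    """Power taken from (positive) or returned to (negative) the energy store."""
    if p_motor_kw >= 0.0:
        return p_motor_kw / eta
    return p_motor_kw * eta


def equivalence_factor(soc, soc_target=0.6, s0=2.5, gain=5.0):
    """Assumed charge-sustaining rule: stored energy gets dearer as SOC drops."""
    return s0 + gain * (soc_target - soc)


def ecms_split(p_demand_kw, soc, p_motor_max_kw=20.0, steps=41):
    """Return (cost, motor_kw, engine_kw) minimizing the equivalent consumption."""
    s = equivalence_factor(soc)
    best = None
    for n in range(steps):
        p_motor = -p_motor_max_kw + 2.0 * p_motor_max_kw * n / (steps - 1)
        p_engine = p_demand_kw - p_motor
        if p_engine < 0.0:
            continue  # this sketch does not let the engine absorb power
        cost = engine_fuel_power(p_engine) + s * storage_power(p_motor)
        if best is None or cost < best[0]:
            best = (cost, p_motor, p_engine)
    return best


if __name__ == "__main__":
    for soc in (0.3, 0.9):
        print(soc, ecms_split(30.0, soc))
```

With these invented numbers, a depleted store (SOC 0.3) makes the sketch run the engine harder and recharge, while a full store (SOC 0.9) makes it assist with the motor. Because the cost is expressed only in terms of power flows and efficiencies, the same loop could in principle be reused for a flywheel, hydraulic or pneumatic store by swapping `storage_power`.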
Abstract:
In recent years, in parallel with the expansion of the organic sector, there has been growing interest in alternative models for guaranteeing the integrity and authenticity of organic products. Groups of small farmers around the world have begun to develop alternative approaches to address the problems associated with third-party certification. These practices are known as Participatory Guarantee Systems (PGS). These models: (i) are based on the IFOAM organic certification standards, (ii) cover all the producers of a rural community, (iii) involve a wide variety of actors and (iv) aim to minimize bureaucracy and costs by simplifying verification procedures and incorporating an element of environmental and social education for both producers and consumers. The objectives of this research are to: • describe how participatory guarantee systems work; • indicate the advantages of adopting them in developing countries and elsewhere; • illustrate the case of the Rede Ecovida de Agroecologia (Brazil); • offer food for thought concerning the consumer and the consumer's trust in the PGS model. The theoretical framework refers to Convention Theory. On the basis of this framework, a consumer questionnaire was constructed to test the appropriateness of the theoretical hypotheses. The final results concern the estimation of consumers' current level of knowledge of, trust in, and willingness to purchase PGS products in the areas considered. On the basis of this research it will be possible to adapt and export the empirical model to other countries with different economies, in order to understand the potential field of application of participatory guarantee systems.
Abstract:
The promising development of routine nanofabrication and the increasing knowledge of the working principles of new classes of highly sensitive, label-free and possibly cost-effective bio-nanosensors for the detection of molecules in liquid environments have rapidly increased the possibility of developing portable sensor devices that could have a great impact on many application fields, such as health care, environment and food production, thanks to the intrinsic ability of these biosensors to detect, monitor and study events at the nanoscale. Moreover, there is a growing demand for low-cost, compact readout structures able to perform accurate preliminary tests on biosensors and/or to perform routine tests with respect to experimental conditions, avoiding skilled personnel and bulky laboratory instruments. This thesis focuses on analysing, designing and testing novel implementations of bio-nanosensors in layered hybrid systems where microfluidic devices and microelectronic systems are fused in compact printed circuit board (PCB) technology. In particular, the manuscript presents hybrid systems in two validating cases using nanopore and nanowire technology, demonstrating new features not covered by state-of-the-art technologies and based on the use of two custom integrated circuits (ICs). As far as the nanopore interface system is concerned, an automatic setup has been developed for the concurrent formation of bilayer lipid membranes, combined with a custom parallel readout electronic system, creating a complete portable platform for nanopore or ion channel studies. As for the nanowire readout hybrid interface, two systems have been developed that perform parallel, real-time, complex impedance measurements based on the lock-in technique, as well as impedance spectroscopy measurements.
This feature makes it possible to experimentally investigate whether the information on the bio-nanosensors can be enriched by concurrently acquiring impedance magnitude and phase, thus investigating capacitive contributions of bioanalytical interactions on the biosensor surface.
Abstract:
Massively parallel robots (MPRs) driven by discrete actuators are force-regulated robots that undergo continuous motions despite being commanded through a finite number of states only. Designing real-time control of such systems requires fast and efficient methods for solving their inverse static analysis (ISA), which is a challenging problem and the subject of this thesis. In particular, five artificial intelligence methods are proposed to investigate the on-line computation and the generalization error of the ISA problem for a class of MPRs featuring three-state force actuators and one degree of revolute motion.
Abstract:
The research work reported in this Thesis was conducted along two main lines of research. The first and main line of research is about the synthesis of heteroaromatic compounds with increasing steric hindrance, with the aim of preparing stable atropisomers. The main tools used for the study of these dynamic systems, as described in the Introduction, are DNMR, coupled with line shape simulation, and DFT calculations, aimed at conformational analysis for the prediction of the geometries and of the energy barriers to the transition states. These techniques have been applied to the research projects about: • atropisomers of arylmaleimides; • atropisomers of 4-arylpyrazolo[3,4-b]pyridines; • study of the intramolecular NO2/CO interaction in solution; • study on 2-arylpyridines. Parallel to the main project, in collaboration with other groups, the research line about determination of the absolute configuration was followed. The products, deriving from organocatalytic reactions, in many cases could not be analyzed by means of X-ray diffraction, making it necessary to develop a protocol based on spectroscopic methodologies: NMR, circular dichroism and computational tools (DFT, TD-DFT) have been applied for this purpose. This Thesis reports the determination of the absolute configuration of: • substituted 1,2,3,4-tetrahydroquinolines; • compounds from enantioselective Friedel-Crafts alkylation-acetalization cascade of naphthols with α,β-unsaturated cyclic ketones; • substituted 3,4-annulated indoles.
Abstract:
The aim of the Ph.D. research project was to explore Dual Fuel combustion and hybridization. Natural gas-diesel Dual Fuel combustion was experimentally investigated on a 4-Stroke, 2.8 L, turbocharged, light-duty Diesel engine, considering four operating points ranging from low to medium-high loads at 3000 rpm. Then, a numerical analysis was carried out using a customized version of the KIVA-3V code, in order to optimize the diesel injection strategy at the highest investigated load. A second KIVA-3V model was used to analyse the interchangeability between natural gas and biogas at an intermediate operating point. Since natural gas-diesel Dual Fuel combustion suffers from poor combustion efficiency at low loads, the effects of hydrogen-enriched natural gas on Dual Fuel combustion were investigated using a validated Ansys Forte model, followed by an optimization of the diesel injection strategy and a sensitivity analysis to the swirl ratio, at the lowest investigated load. Since one of the main issues of Low Temperature Combustion engines is their low power density, 2-Stroke engines, thanks to their doubled cycle frequency compared to 4-Stroke engines, may be more suitable to operate in Dual Fuel mode. Therefore, the application of gasoline-diesel Dual Fuel combustion to a modern 2-Stroke Diesel engine was analysed, starting from the investigation of gasoline injection and mixture formation. As far as hybridization is concerned, a MATLAB-Simulink model was built to compare a conventional (combustion) and a parallel-hybrid powertrain applied to a Formula SAE race car.
Abstract:
Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on micro-controller-class devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data down to byte and sub-byte formats, in the integer domain, yielding Quantized Neural Networks (QNNs). However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions both at software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvements in performance and energy efficiency, compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain-specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude.
To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.
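To make the "sub-byte formats, in the integer domain" of the abstract above concrete, the following Python sketch packs signed 4-bit values two per byte and computes a dot product on the packed data. It only illustrates the data layout; it is not the PULP-NN API and does not model the XpulpNN instructions, and the names `pack_int4`, `unpack_int4` and `dot_int4` as well as the low-nibble-first ordering are assumptions made for the example.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad to an even count
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError("value does not fit in 4 bits: %d" % v)
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed, count):
    """Recover `count` signed 4-bit integers, sign-extending each nibble."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


def dot_int4(packed_a, packed_b, count):
    """Dot product of two packed 4-bit vectors, accumulated at full width."""
    a = unpack_int4(packed_a, count)
    b = unpack_int4(packed_b, count)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = [-8, 7, 3, -1, 5]
    b = [1, -2, 4, 7, -3]
    print(len(pack_int4(a)), dot_int4(pack_int4(a), pack_int4(b), len(a)))
```

Five 4-bit weights occupy three bytes here instead of five. The unpack and sign-extend steps that this sketch performs in software are the kind of overhead that dedicated sub-byte arithmetic support would be expected to remove.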
Abstract:
Embedded systems are increasingly integral to daily life, improving the efficiency of modern Cyber-Physical Systems, which provide access to sensor data and actuators. As modern architectures become increasingly complex and heterogeneous, their optimization becomes a challenging task. Additionally, ensuring platform security is important to avoid harm to individuals and assets. This study primarily addresses challenges in contemporary Embedded Systems, focusing on platform optimization and security enforcement. The first part of this study applies machine learning methods, using static source code analysis, to efficiently determine the number of cores of a parallel RISC-V cluster that minimizes energy consumption. Results demonstrate that automated platform configuration is viable and that relying solely on static features incurs a moderate performance trade-off. The second part addresses the problem of heterogeneous device mapping, which involves assigning tasks to the most suitable computational device in a heterogeneous platform for optimal runtime. The contribution of this part lies in the introduction of novel pre-processing techniques, along with a training framework called Siamese Networks, that enhance the classification performance of DeepLLVM, an advanced approach for task mapping. Importantly, these proposed approaches are independent of the specific deep-learning model used. Finally, this research work addresses issues concerning the binary exploitation of software running in modern Embedded Systems. It proposes an architecture to implement Control-Flow Integrity in embedded platforms with a Root-of-Trust, aiming to enhance security guarantees with limited hardware modifications.
The approach involves enhancing the architecture of a modern RISC-V platform for autonomous vehicles by implementing a side-channel communication mechanism that relays control-flow changes executed by the process running on the host core to the Root-of-Trust. This approach has limited impact on performance and it is effective in enhancing the security of embedded platforms.