20 results for Parallel computing, Virtual machine, Composition, Determinism, Abstraction

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance: 100.00%

Abstract:

The scaling down of transistor technology allows microelectronics manufacturers such as Intel and IBM to build ever more sophisticated systems on a single microchip. Classical interconnection solutions based on shared buses or direct connections between the modules of the chip are becoming obsolete, as they struggle to sustain the increasingly tight bandwidth and latency constraints that these systems demand. The most promising solution for future chip interconnects is the Network on Chip (NoC). NoCs are networks composed of routers and channels that interconnect the different components installed on a single microchip. Examples of advanced processors based on NoC interconnects are the IBM Cell processor, composed of eight CPUs and installed in the Sony PlayStation 3, and the Intel Teraflops project, composed of 80 independent (simple) microprocessors. On-chip integration is becoming popular not only in the Chip Multi Processor (CMP) research area but also in the wider and more heterogeneous world of Systems on Chip (SoC). SoCs encompass all the electronic devices that surround us, such as cell phones, smartphones, home embedded systems, automotive systems, set-top boxes, etc. SoC manufacturers such as ST Microelectronics, Samsung and Philips, as well as universities such as Bologna University, M.I.T., Berkeley and more, are all proposing proprietary frameworks based on NoC interconnects. These frameworks help engineers switch design methodology and speed up the development of new NoC-based systems on chip. In this Thesis we present an introduction to CMP and SoC interconnection networks. Then, focusing on SoC systems, we propose:
• a detailed simulation-based analysis of the Spidergon NoC, a ST Microelectronics solution for SoC interconnects. The Spidergon NoC differs from many classical solutions inherited from the parallel computing world. Here we propose a detailed analysis of this NoC topology and its routing algorithms. Furthermore, we propose aEqualized, a new routing algorithm designed to optimize the use of the resources of the network while also increasing its performance;
• a methodology flow based on modified publicly available tools that, combined, can be used to design, model and analyze any kind of System on Chip;
• a detailed analysis of a ST Microelectronics-proprietary transport-level protocol that the author of this Thesis helped develop;
• a comprehensive simulation-based comparison of different network interface designs proposed by the author and the researchers at AST lab, in order to integrate shared-memory and message-passing based components on a single System on Chip;
• a powerful and flexible solution to address the time closure exception issue in the design of synchronous Networks on Chip. Our solution is based on relay-station repeaters and makes it possible to reduce the power and area demands of NoC interconnects while also reducing their buffer needs;
• a solution that simplifies the design of NoCs while also increasing their performance and reducing their power and area consumption. We propose to replace complex and slow virtual channel-based routers with multiple, flexible and small Multi Plane ones. This solution allows us to reduce the area and power dissipation of any NoC while also increasing its performance, especially when resources are limited.
This Thesis has been written in collaboration with the Advanced System Technology laboratory in Grenoble, France, and the Computer Science Department at Columbia University in the city of New York.
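As a rough illustration of why the Spidergon topology differs from a plain ring, the sketch below routes a packet with the "across-first" scheme commonly described for Spidergon. It assumes an even number of nodes on a bidirectional ring, each with one extra "across" link to the diametrically opposite node. The function name `spidergon_route` is hypothetical, and this is neither the thesis's simulator nor its aEqualized algorithm.

```python
from typing import List


def spidergon_route(src: int, dst: int, n: int) -> List[int]:
    """Return the node sequence from src to dst on an n-node Spidergon.

    Assumed topology: nodes 0..n-1 on a bidirectional ring, plus one
    'across' link from node i to node (i + n // 2) % n.
    """
    if n < 4 or n % 2 != 0:
        raise ValueError("n must be an even number >= 4")
    path = [src]
    cur = src
    d = (dst - src) % n  # clockwise distance to the destination
    # Destination on the far side of the ring: take the across link first.
    if n / 4 < d < 3 * n / 4:
        cur = (cur + n // 2) % n
        path.append(cur)
        d = (dst - cur) % n
    # Finish along the ring in the shorter direction.
    step = 1 if d <= n // 2 else -1
    while cur != dst:
        cur = (cur + step) % n
        path.append(cur)
    return path


if __name__ == "__main__":
    print(spidergon_route(0, 3, 8))  # across link first, then one ring hop
```

With 8 nodes, reaching node 3 from node 0 takes two hops via the across link instead of three hops along the ring.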

Relevance: 100.00%

Abstract:

During the last few decades, unprecedented technological growth has been at the center of embedded systems design, with Moore’s Law being the leading factor of this trend. Today an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite their extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space that must be explored to find the best design has exploded, and hardware designers now face a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they must cope with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first is intended for the hardware developer, to easily allow complex cycle-accurate simulations of many-core SoCs. The second exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of increasing simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware; 2) to efficiently exploit the highly parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis focuses on virtualization techniques with the goal of mitigating, and overcoming when possible, some of the challenges introduced by the many-core design paradigm.

Relevance: 100.00%

Abstract:

With the CERN LHC program underway, data growth in the High Energy Physics (HEP) field has accelerated, and the use of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been successfully used in many areas of HEP. Nevertheless, developing an ML project and implementing it for production use is a highly time-consuming task that requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside the HEP community. The work presented in this thesis focuses on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly from ROOT files of arbitrary size, from local or distributed data sources. Such a solution provides HEP users who are not ML experts with a tool that allows them to apply ML techniques in their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then a service with APIs was developed, so that a user, after being authenticated and authorized, can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and meets the requirements to be added to the INFN Cloud portfolio of services.

Relevance: 100.00%

Abstract:

Generic programming is likely to become a new challenge for a critical mass of developers. Therefore, it is crucial to refine the support for generic programming in mainstream Object-Oriented languages, both at the design and at the implementation level, as well as to suggest novel ways to exploit the additional degree of expressiveness made available by genericity. This study is meant to contribute towards bringing Java genericity to a more mature stage with respect to mainstream programming practice, by increasing the effectiveness of its implementation and by revealing its full expressive power in real-world scenarios. With respect to the current research setting, the main contribution of the thesis is twofold. First, we propose a revised implementation for Java generics that greatly increases the expressiveness of the Java platform by adding reification support for generic types. Secondly, we show how Java genericity can be leveraged in a real-world case study in the context of multi-paradigm language integration. Several approaches have been proposed to overcome the lack of reification of generic types in the Java programming language. Existing approaches tackle the problem by defining new translation techniques which would allow for a runtime representation of generics and wildcards. Unfortunately, most approaches suffer from several problems: heterogeneous translations are known to be problematic when considering reification of generic methods and wildcards. On the other hand, more sophisticated techniques that require changes in the Java runtime support reified generics through a true language extension (where clauses), so that backward compatibility is compromised.
In this thesis we develop a sophisticated type-passing technique for addressing the problem of reification of generic types in the Java programming language; this approach, first pioneered by the so-called EGO translator, is here turned into a full-blown solution which reifies generic types inside the Java Virtual Machine (JVM) itself, thus overcoming both the performance penalties and the compatibility issues of the original EGO translator.
Java-Prolog integration. Integrating Object-Oriented and declarative programming has been the subject of much research and of several corresponding technologies. Such proposals come in two flavours: either attempting to join the two paradigms, or simply providing an interface library for accessing Prolog declarative features from a mainstream Object-Oriented language such as Java. Both solutions, however, have drawbacks: in the case of hybrid languages featuring both Object-Oriented and logic traits, the resulting language is typically too complex, thus making mainstream application development a harder task; in the case of library-based integration approaches there is no true language integration, and some “boilerplate code” has to be implemented to fix the paradigm mismatch. In this thesis we develop a framework called PatJ which promotes seamless exploitation of Prolog programming in Java. A sophisticated usage of generics and wildcards makes it possible to define a precise mapping between Object-Oriented and declarative features. PatJ defines a hierarchy of classes where the bidirectional semantics of Prolog terms is modelled directly at the level of the Java generic type system.

Relevance: 100.00%

Abstract:

Ultrasound imaging is widely used in medical diagnostics as it is the fastest, least invasive, and least expensive imaging modality. However, ultrasound images are intrinsically difficult to interpret. In this scenario, Computer Aided Detection (CAD) systems can be used to support physicians during diagnosis by providing them with a second opinion. This thesis discusses efficient ultrasound processing techniques for computer aided medical diagnostics, focusing on two major topics: (i) Ultrasound Tissue Characterization (UTC), aimed at characterizing and differentiating between healthy and diseased tissue; (ii) Ultrasound Image Segmentation (UIS), aimed at detecting the boundaries of anatomical structures to automatically measure organ dimensions and compute clinically relevant functional indices. Research on UTC produced a CAD tool for Prostate Cancer detection to improve the biopsy protocol. In particular, this thesis contributes: (i) the development of a robust classification system; (ii) the exploitation of parallel computing on GPU for real-time performance; (iii) the introduction of both an innovative Semi-Supervised Learning algorithm and a novel supervised/semi-supervised learning scheme for CAD system training that improve system performance while reducing the data collection effort and avoiding the waste of collected data. The tool provides physicians with a risk map highlighting suspect tissue areas, allowing them to perform a lesion-directed biopsy. Clinical validation demonstrated the validity of the system as a diagnostic support tool and its effectiveness at reducing the number of biopsy cores required for an accurate diagnosis. For UIS, the research developed a heart disease diagnostic tool based on Real-Time 3D Echocardiography. The thesis contributions to this application are: (i) the development of an automated GPU-based level-set segmentation framework for 3D images; (ii) the application of this framework to myocardium segmentation. 
Experimental results showed the high efficiency and flexibility of the proposed framework. Its effectiveness as a tool for quantitative analysis of 3D cardiac morphology and function was demonstrated through clinical validation.

Relevance: 100.00%

Abstract:

The Vrancea region, at the south-eastern bend of the Carpathian Mountains in Romania, is one of the most puzzling seismically active zones of Europe. Besides some shallow seismicity spread across the whole Romanian territory, Vrancea hosts intense seismicity, with a cluster of intermediate-depth foci located in a narrow, nearly vertical volume. Although large-scale mantle seismic tomographic studies have revealed the presence of a narrow, almost vertical, high-velocity body in the upper mantle, the nature and the geodynamics of this deep intra-continental seismicity are still debated. High-resolution seismic tomography could help to reveal more details of the subcrustal structure of Vrancea. Recent developments in computational seismology, as well as the availability of parallel computing, now make it possible to retrieve more information from seismic waveforms and to reach such high-resolution models. This study aimed to evaluate the application of full waveform inversion tomography at regional scale for the Vrancea lithosphere, using data from the 1999 six-month temporary local network CALIXTO. Starting from a detailed 3D Vp, Vs and density model, built on classical travel-time tomography together with gravity data, I evaluated the improvements obtained with the full waveform inversion approach. The latter proved to be highly problem-dependent and computationally very expensive. The model retrieved after the first two iterations does not show large variations with respect to the initial model but remains in agreement with previous tomographic models. It presents a well-defined, slab-shaped, downgoing high-velocity anomaly, composed of a N-S horizontal anomaly at depths between 40 and 70 km linked to a nearly vertical NE-SW anomaly from 70 to 180 km.

Relevance: 40.00%

Abstract:

Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever-increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitations, combined with inefficient synchronization mechanisms, can severely limit the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validate design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. This tool has been used to perform architectural explorations, focusing on the instruction caching architecture and on hybrid HW/SW synchronization mechanisms. Besides architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing (NTC) is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting the performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC; memory operation in particular becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations is today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. 
By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations.

Relevance: 40.00%

Abstract:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on microcontroller-class devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data in byte and sub-byte formats, in the integer domain, yielding Quantized Neural Networks (QNNs). However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions at both the software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvement in performance and energy efficiency compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain-specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude. 
To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.
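To make the idea of sub-byte integer formats concrete, here is a minimal sketch that stores two signed 4-bit values per byte and computes a dot product on the unpacked values. The function names and the nibble order (low nibble first) are assumptions chosen for illustration; this is not the PULP-NN API or the XpulpNN instruction semantics.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in 4 signed bits")
        # Keep the two's-complement low 4 bits of each value.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data):
    """Inverse of pack_int4: recover the signed 4-bit values."""
    values = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def dot_int4(packed_a, packed_b):
    """Dot product of two packed int4 vectors, accumulated in a Python int."""
    return sum(a * b for a, b in zip(unpack_int4(packed_a), unpack_int4(packed_b)))


if __name__ == "__main__":
    weights = pack_int4([1, -2, 7, -8])
    activations = pack_int4([2, 3, -1, 1])
    print(len(weights), dot_int4(weights, activations))
```

Four 4-bit weights occupy two bytes instead of four, which is the memory saving that motivates sub-byte quantization. The unpacking work on every access is the kind of overhead that dedicated sub-byte instructions aim to remove.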

Relevance: 30.00%

Abstract:

Singularities of robot manipulators have been intensely studied in the last decades by researchers in many fields. Serial singularities produce some local loss of dexterity of the manipulator; it might therefore be desirable to search for singularity-free trajectories in the joint space. On the other hand, parallel singularities are very dangerous for parallel manipulators, for they may provoke the local loss of platform control and jeopardize the structural integrity of links or actuators. It is therefore of the utmost importance to avoid parallel singularities while operating a parallel machine. Furthermore, there might be some configurations of a parallel manipulator that are allowed by the constraints but are nevertheless unreachable by any feasible path. The present work proposes a numerical procedure based upon Morse theory, an important branch of differential topology. This procedure counts and identifies the singularity-free regions that are cut by the singularity locus out of the configuration space, and the disjoint regions composing the configuration space of a parallel manipulator. Moreover, given any two configurations of a manipulator, a feasible or a singularity-free path connecting them can always be found, or it can be proved that none exists. Examples of applications to 3R and 6R serial manipulators, to 3UPS and 3UPU parallel wrists, to 3UPU parallel translational manipulators, and to 3RRR planar manipulators are reported in the work.

Relevance: 30.00%

Abstract:

Context-aware computing is currently considered the most promising approach to overcome information overload and to speed up access to relevant information and services. Context-awareness may be derived from many sources, including user profile and preferences, network information and sensor analysis; usually context-awareness relies on the ability of computing devices to interact with the physical world, i.e. with the natural and artificial objects hosted within the “environment”. Ideally, context-aware applications should not be intrusive and should be able to react according to the user’s context with minimum user effort. Context is an application-dependent multidimensional space, and location has been an important part of it since the very beginning. Location can be used to guide applications in providing the information or functions that are most appropriate for a specific position. Hence location systems play a crucial role. There are several technologies and systems for computing location to varying degrees of accuracy, tailored for specific space models, i.e. indoors or outdoors, structured or unstructured spaces. The research challenge faced by this thesis is related to pedestrian positioning in heterogeneous environments. In particular, the focus is on pedestrian identification, localization, orientation and activity recognition. This research was mainly carried out within the “mobile and ambient systems” workgroup of EPOCH, a 6FP NoE on the application of ICT to Cultural Heritage. Therefore, applications in Cultural Heritage sites were the main target of the context-aware services discussed. Cultural Heritage sites are considered significant test-beds in context-aware computing for many reasons. For example, building a smart environment in museums or in protected sites is a challenging task, because localization and tracking are usually based on technologies that are difficult to hide or harmonize within the environment. 
Therefore it is expected that the experience gained with this research may also be useful in domains other than Cultural Heritage. This work presents three different approaches to pedestrian identification, positioning and tracking: pedestrian navigation by means of a wearable inertial sensing platform assisted by the vision-based tracking system for initial settings and real-time calibration; pedestrian navigation by means of a wearable inertial sensing platform augmented with GPS measurements; and pedestrian identification and tracking, combining the vision-based tracking system with WiFi localization. The proposed localization systems have mainly been used to enhance Cultural Heritage applications by providing information and services depending on the user’s actual context, in particular on the user’s location.

Relevance: 30.00%

Abstract:

The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting the disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of the bigger and still unsolved problem of protein structure prediction. Improving the prediction of the disulfide-bonding states of cysteine residues will help to constrain the three-dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of the 3D structure of proteins. The results of this work will have direct implications for site-directed mutational studies of proteins, protein engineering and the problem of protein folding. We have used a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN), as the machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs, and considering Eukaryotes and Prokaryotes separately, we have reached a remarkable accuracy of 94% on a cysteine basis for both the Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on a protein basis for the Eukaryotic and Prokaryotic datasets respectively. These accuracies are the best reached so far by any existing prediction method; our method thus outperforms all previously developed approaches and is therefore more reliable. The most interesting part of this thesis work is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when ‘profile’ information was given as input to our prediction method. 
One of the reasons for this, we discovered, is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, whereas Prokaryotic bonded examples lack it.
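The two accuracy figures quoted above are measured at different granularities. The sketch below shows one common way to compute them, assuming that "protein basis" means a protein counts as correct only when every one of its cysteines is predicted correctly. The function names and the toy labels (1 = bonded, 0 = free) are invented for illustration and are not data from the thesis.

```python
def accuracy_per_cysteine(proteins):
    """Fraction of individual cysteines whose bonding state is predicted correctly.

    proteins: list of (true_states, predicted_states) pairs, one per protein.
    """
    total = sum(len(true) for true, _ in proteins)
    correct = sum(t == p for true, pred in proteins for t, p in zip(true, pred))
    return correct / total


def accuracy_per_protein(proteins):
    """Fraction of proteins in which all cysteines are predicted correctly."""
    correct = sum(1 for true, pred in proteins if list(true) == list(pred))
    return correct / len(proteins)


if __name__ == "__main__":
    # Toy example: three proteins, one of them with a single wrong cysteine.
    toy = [
        ([1, 1, 0, 0], [1, 1, 0, 0]),
        ([1, 1], [1, 0]),
        ([0, 0, 0], [0, 0, 0]),
    ]
    print(accuracy_per_cysteine(toy), accuracy_per_protein(toy))
```

Under this definition, a single mispredicted residue lowers the per-protein score by a whole protein, which is why the per-protein figure is typically the lower of the two.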

Relevance: 30.00%

Abstract:

Modern embedded systems are equipped with hardware resources that allow the execution of very complex applications such as audio and video decoding. The design of such systems must satisfy two opposing requirements. On the one hand it is necessary to provide high computational potential; on the other, stringent constraints on energy consumption must be respected. One of the most widespread trends for meeting these opposing requirements is to integrate on a single chip a large number of processors characterized by a simplified design and low power consumption. However, to effectively exploit the computational potential offered by an array of processors, application development methodologies must be heavily revisited. With the advent of multi-processor systems-on-chip (MPSoC), parallel programming has become widespread in the embedded domain as well. However, progress in the field of parallel programming has not kept pace with the ability to integrate parallel hardware on a single chip. Besides the introduction of multiple processors, the need to reduce MPSoC power consumption entails other architectural solutions that have the direct effect of complicating application development. The design of the memory subsystem, in particular, is a critical problem. Integrating memory banks on chip yields very short access times and very low power consumption. Unfortunately, the amount of on-chip memory that can be integrated in an MPSoC is very limited. For this reason it is necessary to add off-chip memory banks, which have a much larger capacity but also higher power consumption and longer access times. Most MPSoCs currently on the market devote part of the area budget to the implementation of cache and/or scratchpad memories. 
Scratchpads (SPM) are often preferred to caches in embedded MPSoC systems, for reasons of greater predictability, smaller area occupation and, above all, lower power consumption. On the other hand, while the use of caches is completely transparent to the programmer, SPMs must be explicitly managed by the application. Exposing the organization of the memory hierarchy to the application makes it possible to exploit its advantages efficiently (reduced access times and power consumption). In return, to obtain these benefits applications must be written so that data are partitioned and allocated to the various memories appropriately. The burden of this complex task obviously falls on the programmer. This scenario illustrates well the need for programming models and support tools that simplify the development of parallel applications. This thesis presents a framework for software development for embedded MPSoCs based on OpenMP. OpenMP is a de facto standard for programming shared-memory multiprocessors, characterized by a simple approach to parallelization through annotations (compiler directives). Its programming interface makes it possible to express loop-level parallelism, which is very common in embedded signal processing and multimedia applications, in a natural and very efficient way. OpenMP is an excellent starting point for defining a programming model for MPSoCs, above all because of its ease of use. On the other hand, to exploit the computational potential of an MPSoC efficiently, the implementation of OpenMP support must be thoroughly revisited both in the compiler and in the runtime support environment. 
All the constructs for managing parallelism, work sharing and inter-processor synchronization carry an overhead cost that must be minimized so as not to compromise the benefits of parallelization. This can be achieved only through a careful analysis of the hardware characteristics and the identification of potential bottlenecks in the architecture. An implementation of task management, barrier synchronization and data sharing that exploits hardware resources efficiently makes it possible to obtain high performance and scalability. Data sharing, in the OpenMP model, deserves particular attention. In a shared-memory model the data structures (arrays, matrices) accessed by the program are physically allocated on a single memory resource reachable by all processors. As the number of processors in a system grows, concurrent access to a single memory resource becomes an evident bottleneck. To relieve the pressure on the memories and on the interconnect, we study and propose data structure partitioning techniques. These techniques require that a single array entity be treated in the program as a set of many sub-arrays, each of which can be physically allocated on a different memory resource. From the program’s point of view, addressing a partitioned array requires that instructions be executed at every access to recompute the physical destination address. Writing these by hand is clearly a long, complex and error-prone task. For this reason, our partitioning techniques have been integrated into the OpenMP programming interface, which has been significantly extended. 
Specifically, new directives and clauses allow the programmer to annotate the array data to be partitioned and allocated in a distributed fashion across the memory hierarchy. Support tools have also been developed that collect profiling information on the array access pattern. This information is exploited by our compiler to allocate the partitions on the various memory resources while respecting an affinity relation between tasks and data. More precisely, the allocation passes in our compiler assign a given partition to the scratchpad memory local to the processor hosting the task that performs the largest number of accesses to it.
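The two mechanisms described above, recomputing the address of an element of a partitioned array and placing each partition next to its heaviest user, can be sketched as follows. The sketch assumes equal-sized contiguous partitions and a profile given as plain access counts. The function names and the processor labels are hypothetical, and this is not the extended OpenMP interface or the compiler pass from the thesis.

```python
def locate(index, partition_size):
    """Map a logical array index to (partition id, offset inside that partition).

    Assumes the array is split into contiguous blocks of partition_size elements.
    """
    if index < 0 or partition_size <= 0:
        raise ValueError("index must be >= 0 and partition_size > 0")
    return divmod(index, partition_size)


def allocate_by_affinity(access_counts):
    """Assign each partition to the processor that accesses it most often.

    access_counts: {partition_id: {processor_id: profiled access count}}
    Returns {partition_id: processor_id}; the partition would then live in
    the scratchpad local to that processor.
    """
    return {
        part: max(counts, key=counts.get)
        for part, counts in access_counts.items()
    }


if __name__ == "__main__":
    # Element 10 of an array split into blocks of 4 lives in partition 2, offset 2.
    print(locate(10, 4))
    # Hypothetical profile: P0 dominates partition 0, P1 dominates partition 1.
    profile = {0: {"P0": 90, "P1": 10}, 1: {"P0": 5, "P1": 70}}
    print(allocate_by_affinity(profile))
```

The `locate` step is the per-access address recomputation that the text calls error-prone when written by hand, which is the motivation for generating it from annotations instead.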

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and power consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application whose execution time is dominated by a high number of floating point instructions. It then addresses the central problem of efficiently managing power peaks in heterogeneous computing systems. Finally, it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been made. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed. The implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Secondly, a power-aware scheduling algorithm for heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload across the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. This work makes two main contributions: first, the approach reduces the supply cost due to high peak power while having negligible impact on the parallelism of the computational nodes; second, the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit. 
Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Flow features inside centrifugal compressor stages are very difficult to simulate with numerical tools because of the highly complex geometry and the varying gas conditions across the machine. For this reason, a major effort is currently being made to increase the fidelity of the numerical models during the design and validation phases. Computational Fluid Dynamics (CFD) plays an increasing role in the performance prediction of centrifugal compressor stages. Historically, CFD was considered reliable for performance prediction on a qualitative level, whereas tests were necessary to predict compressor performance on a quantitative basis. In fact, "standard" CFD, with only the flow path and blades included in the computational domain, is known to be weak in capturing efficiency level and operating range accurately, owing to the underestimation of losses and the lack of secondary flow modeling. This research project aims to close the accuracy gap between "standard" CFD and test data by including a high-fidelity reproduction of the gas domain and by using advanced numerical models and tools introduced in the author's OEM in-house CFD code. In other words, this thesis describes a methodology by which virtual tests can be conducted on single-stage and multistage centrifugal compressors in a similar fashion to a typical rig test, allowing end users to operate machines with a confidence level not achievable before. Furthermore, the new "high fidelity" approach made it possible to understand flow phenomena not fully captured before, increasing aerodynamicists' capability and confidence in designing highly efficient and highly reliable centrifugal compressor stages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The research activity focused on the study, design and evaluation of innovative human-machine interfaces based on virtual three-dimensional environments. The interfaces are based on the user's brain electrical activity, recorded in real time from the electrical impulses associated with the user's brain waves. The goal achieved is to identify and sort in real time the different brain states and to adapt the interface and/or stimuli to the corresponding emotional state of the user. An experimental facility was set up, based on an innovative experimental methodology for "man in the loop" simulation. It allowed both the pilot and the flight examiner to be involved during pilot training in virtually simulated flights, in order to compare the subjective evaluations of the latter with the objective measurements of the pilot's brain activity. This was done by recording all the relevant information against a timeline. The different combinations of emotional intensities obtained led to an evaluation of the user's current situational awareness. These results have significant implications for current pilot training methodology, and the approach could be extended as a tool to improve the evaluation of pilot/crew performance in interacting with the aircraft when performing tasks and procedures, especially in critical situations. This research also resulted in the design of an interface that adapts the control of the machine to the situational awareness of the user. The new concept aims to improve the efficiency of the interaction between the user and the interface, and to gain capacity by reducing the user's workload, hence improving overall system safety. This innovative research, combining emotions measured through electroencephalography, resulted in a human-machine interface with three aeronautics-related applications: • An evaluation tool during pilot training; • An input for the cockpit environment; • An adaptation tool for cockpit automation.