6 resultados para parallel execution
em AMS Tesi di Laurea - Alm@DL - Università di Bologna
Resumo:
Modern High-Performance Computing HPC systems are gradually increasing in size and complexity due to the correspondent demand of larger simulations requiring more complicated tasks and higher accuracy. However, as side effects of the Dennard’s scaling approaching its ultimate power limit, the efficiency of software plays also an important role in increasing the overall performance of a computation. Tools to measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. The monitoring of the power consumption in order to save energy is possible through processors interfaces like Intel Running Average Power Limit RAPL. Given the low level of these interfaces, they are often paired with an application-level tool like Performance Application Programming Interface PAPI. Since several problems in many heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver algorithm can decrease significantly the time spent to compute its resolution. One of the most widely used algorithms deployed for the resolution of large simulation is the Gaussian Elimination, which has its most popular implementation for HPC systems in the Scalable Linear Algebra PACKage ScaLAPACK library. However, another relevant algorithm, which is increasing in popularity in the academic field, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and Gaussian Elimination from ScaLAPACK to profile their execution during the resolution of linear systems above the HPC architecture offered by CINECA. Moreover, it also collates the energy and power values for different ranks, nodes, and sockets configurations. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, that will be integrated with the parallel execution of the algorithms managed with the Message Passing Interface MPI.
Resumo:
Complex networks analysis is a very popular topic in computer science. Unfortunately this networks, extracted from different contexts, are usually very large and the analysis may be very complicated: computation of metrics on these structures could be very complex. Among all metrics we analyse the extraction of subnetworks called communities: they are groups of nodes that probably play the same role within the whole structure. Communities extraction is an interesting operation in many different fields (biology, economics,...). In this work we present a parallel community detection algorithm that can operate on networks with huge number of nodes and edges. After an introduction to graph theory and high performance computing, we will explain our design strategies and our implementation. Then, we will show some performance evaluation made on a distributed memory architectures i.e. the supercomputer IBM-BlueGene/Q "Fermi" at the CINECA supercomputing center, Italy, and we will comment our results.
Resumo:
The aim of my thesis is to parallelize the Weighting Histogram Analysis Method (WHAM), which is a popular algorithm used to calculate the Free Energy of a molucular system in Molecular Dynamics simulations. WHAM works in post processing in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling has the purpose to add a biasing in the potential energy of the system in order to force the system to sample a specific region in the configurational space. Several N independent simulations are performed in order to sample all the region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed through CUDA, a language that allows to work in GPUs of NVIDIA graphic cards, which have a parallel achitecture. The parallel implementation may sensibly speed up the WHAM execution compared to previous serial CPU imlementations. However, the WHAM CPU code presents some temporal criticalities to very high numbers of interactions. The algorithm has been written in C++ and executed in UNIX systems provided with NVIDIA graphic cards. The results were satisfying obtaining an increase of performances when the model was executed on graphics cards with compute capability greater. Nonetheless, the GPUs used to test the algorithm is quite old and not designated for scientific calculations. It is likely that a further performance increase will be obtained if the algorithm would be executed in clusters of GPU at high level of computational efficiency. The thesis is organized in the following way: I will first describe the mathematical formulation of Umbrella Sampling and WHAM algorithm with their apllications in the study of ionic channels and in Molecular Docking (Chapter 1); then, I will present the CUDA architectures used to implement the model (Chapter 2); and finally, the results obtained on model systems will be presented (Chapter 3).
Resumo:
L’obiettivo del progetto di tesi svolto è quello di realizzare un servizio di livello middleware dedicato ai dispositivi mobili che sia in grado di fornire il supporto per l’offloading di codice verso una infrastruttura cloud. In particolare il progetto si concentra sulla migrazione di codice verso macchine virtuali dedicate al singolo utente. Il sistema operativo delle VMs è lo stesso utilizzato dal device mobile. Come i precedenti lavori sul computation offloading, il progetto di tesi deve garantire migliori performance in termini di tempo di esecuzione e utilizzo della batteria del dispositivo. In particolare l’obiettivo più ampio è quello di adattare il principio di computation offloading a un contesto di sistemi distribuiti mobili, migliorando non solo le performance del singolo device, ma l’esecuzione stessa dell’applicazione distribuita. Questo viene fatto tramite una gestione dinamica delle decisioni di offloading basata, non solo, sullo stato del device, ma anche sulla volontà e/o sullo stato degli altri utenti appartenenti allo stesso gruppo. Per esempio, un primo utente potrebbe influenzare le decisioni degli altri membri del gruppo specificando una determinata richiesta, come alta qualità delle informazioni, risposta rapida o basata su altre informazioni di alto livello. Il sistema fornisce ai programmatori un semplice strumento di definizione per poter creare nuove policy personalizzate e, quindi, specificare nuove regole di offloading. Per rendere il progetto accessibile ad un più ampio numero di sviluppatori gli strumenti forniti sono semplici e non richiedono specifiche conoscenze sulla tecnologia. Il sistema è stato poi testato per verificare le sue performance in termini di mecchanismi di offloading semplici. Successivamente, esso è stato anche sottoposto a dei test per verificare che la selezione di differenti policy, definite dal programmatore, portasse realmente a una ottimizzazione del parametro designato.