Digital Library

3 results for Computing Classification Systems

in AMS Tesi di Laurea - Alm@DL - Università di Bologna

Energy consumption of parallel algorithms for solving linear systems on HPC architecture

Relevance:

90.00%

Abstract:

Modern High-Performance Computing (HPC) systems are steadily growing in size and complexity, driven by the demand for larger simulations that involve more complicated tasks and higher accuracy. However, as a side effect of Dennard scaling approaching its ultimate power limit, software efficiency also plays an important role in the overall performance of a computation. Tools that measure application performance in these increasingly complex environments provide insight into the intricate ways in which software and hardware interact. Power consumption can be monitored, with the aim of saving energy, through processor interfaces such as Intel Running Average Power Limit (RAPL). Because these interfaces are low level, they are often paired with an application-level tool such as the Performance Application Programming Interface (PAPI). Since many problems in heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver can significantly reduce the time needed to compute the solution. One of the most widely used algorithms for solving large simulations is Gaussian Elimination, whose most popular implementation for HPC systems is in the Scalable Linear Algebra PACKage (ScaLAPACK) library. Another relevant algorithm, which is gaining popularity in academia, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and of Gaussian Elimination from ScaLAPACK, profiling their execution while solving linear systems on the HPC architecture offered by CINECA. It also compares energy and power values across different configurations of ranks, nodes, and sockets. The monitoring tools used to track the energy consumption of these algorithms are PAPI and RAPL, which are integrated with the parallel execution of the algorithms managed through the Message Passing Interface (MPI).
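As a point of reference for the Gaussian Elimination mentioned in this abstract, the following is a minimal serial sketch with partial pivoting. It is not the ScaLAPACK implementation and not the thesis code; the function name `solve_linear_system` and the singularity tolerance are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    `a` is a list of n rows of n numbers, `b` a list of n numbers.
    Illustrative serial sketch only (hypothetical helper).
    """
    n = len(a)
    # Build an augmented copy so the caller's data is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x
```

For example, `solve_linear_system([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])` returns values close to `[2.0, 3.0, -1.0]`. The elimination loop performs on the order of n^3 operations, which is why distributed implementations matter for large systems.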


Connectivity management and dynamic context grouping in mobile heterogenous systems

Relevance:

30.00%

Abstract:

Over the last 10 years the number of mobile devices has grown rapidly. Each person usually carries at least two personal devices, and researchers say that in the near future this number could rise to ten devices per person. Moreover, devices are becoming more integrated into our lives than in the past, so the amount of data exchanged increases in step with improvements in people's lifestyles. This is what researchers call the Internet of Things. In the future there will be more than 60 billion nodes, and the current infrastructure is not ready to keep track of all the data exchanged between them. Infrastructure improvements such as MobileIP and HIP have therefore been proposed in recent years to facilitate the exchange of packets in mobility; however, none of them has been optimized for this purpose. In recent years, researchers from Mid Sweden University created the MediaSense Framework. Initially, this framework was based on the Chord protocol to route packets in a large network, but the most important change has been the introduction of P-Grid to create the overlay and provide persistence. Thanks to this technology, a lookup in the trie takes up to 0.5*log(N), where N is the total number of nodes in the network. This result could be improved by further optimizing node management, for example by dynamically creating groups of nodes. Moreover, since the nodes move, underlying support for connectivity management is needed. SCTP has been selected as one of the most promising upcoming standards for managing multiple simultaneous connections.
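To give a feel for the 0.5*log(N) lookup bound quoted in this abstract, the short sketch below evaluates it for a given network size. The abstract does not state the base of the logarithm or the unit of the cost; base 2 (natural for a binary trie) and the function name `lookup_cost_bound` are assumptions made here.

```python
import math


def lookup_cost_bound(num_nodes):
    """Evaluate 0.5 * log2(N) for a network of N nodes.

    Base 2 is an assumption; the abstract only says 0.5*log(N).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return 0.5 * math.log2(num_nodes)
```

Under that assumption, a network of 1024 nodes gives a bound of 5, and the 60 billion nodes mentioned in the abstract give a bound of roughly 18, which illustrates how slowly the cost grows with network size.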


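Implementazione dell'algoritmo filtered back-projection (FBP) per architetture low-power di tipo systems-on-chip [Implementation of the filtered back-projection (FBP) algorithm for low-power system-on-chip architectures]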

Relevance:

30.00%

Abstract:

This thesis was carried out in the laboratories of the X-ray Imaging Group of the Department of Physics and Astronomy of the University of Bologna, within COSA (Computing on SoC Architectures), a project of the INFN National Scientific Committee V. Its goal is to port and analyze a tomographic reconstruction code on GPU architectures installed on low-power Systems-on-Chip, in order to develop a portable, inexpensive, and relatively fast method. Based on the computational analysis, three versions of the CUDA C port were developed: the first simply moves the most expensive part of the computation to the graphics card; the second exploits the fast matrix computation of the coprocessor (mapping each pixel to a single parallel computing unit); the third is a further optimized refinement of the second. The third version was chosen as the final one because it performs best both in reconstruction time for a single slice and in energy savings. The port was compared with two other parallelizations, in OpenMP and MPI. The efficiency of each paradigm was then studied as a function of computing speed and energy used, on both an HPC cluster and a low-power SoC cluster (using in particular the quad-core Tegra K1 board). The solution we propose combines the OpenMP port with the CUDA C port: three CPU cores are reserved for running the OpenMP code, and the fourth manages the GPU using the CUDA C port. This dual parallelization achieves the highest efficiency in terms of power and energy, while the HPC cluster achieves the highest efficiency in computing speed. The proposed method would therefore make it possible to exploit almost the full potential of the CPU and GPU at a very low cost.
A possible future optimization could reconstruct two slices simultaneously on the GPU, roughly doubling the overall speed and making better use of the hardware. This study gave very satisfactory results: with only three TK1 boards it is possible to match, and perhaps later surpass, the computing power of a traditional server, with the added advantage of a portable, low-power, low-cost system. This research positions itself in the computing field as one of the first practical studies of low-power SoC architectures and their use in science, with very promising results.
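The per-pixel mapping described in this abstract can be illustrated with a minimal serial sketch of the back-projection step. This is not the thesis's CUDA C code: it is written in Python for readability, it assumes the sinogram has already been filtered, it uses nearest-neighbor sampling and a parallel-beam geometry, and the function name `back_project` and its parameters are hypothetical.

```python
import math


def back_project(sinogram, angles, size):
    """Back-project an (assumed pre-filtered) sinogram onto a size x size image.

    sinogram[i][d] is the value at projection angle angles[i] (radians)
    and detector bin d. Each output pixel is computed independently, which
    is the property that lets a GPU assign one thread per pixel.
    """
    n_det = len(sinogram[0])
    center = (size - 1) / 2.0        # image center in pixel units
    det_center = (n_det - 1) / 2.0   # detector center in bin units
    scale = math.pi / len(angles)    # weight of each projection angle

    image = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # Pixel coordinates relative to the image center.
            x = col - center
            y = center - row
            total = 0.0
            for proj, theta in zip(sinogram, angles):
                # Detector position hit by the ray through (x, y).
                t = x * math.cos(theta) + y * math.sin(theta) + det_center
                d = math.floor(t + 0.5)  # nearest detector bin
                if 0 <= d < n_det:
                    total += proj[d]
            image[row][col] = total * scale
    return image
```

In a GPU version, the two outer loops would typically disappear and each thread would run the inner loop over angles for its own pixel. The sketch leaves out the ramp filtering that gives filtered back-projection its name.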

