989 resultados para hpc,risc-v,cluster,graph500,npb,hpcbenctt


Relevância:

100.00%

Publicador:

Resumo:

Research efforts in High Performance Computing have, over the years, produced important results in increasing performance, both in terms of the number of operations executed per unit of time and by introducing or improving parallel algorithms from the literature. These achievements have brought changes to the internal structure of the machines: the architectures of the processors employed have evolved, and GPUs have been adopted as additional computing resources. The downside of ever-increasing performance is a large energy expenditure, since the machines used in HPC are designed to carry out intense computation over very long periods of time; the energy needed to power each node and to dissipate the generated heat entails high costs. Among the solutions proposed to limit energy consumption, the one that has attracted the most interest, both in research and on the market, is the integration of RISC (Reduced Instruction Set Computer) CPUs, which can achieve satisfactory performance with a lower energy budget than CISC (Complex Instruction Set Computer) CPUs. This thesis presents a performance analysis of Monte Cimone, a cluster of 8 compute nodes based on the RISC-V architecture and distributed over 4 dual-board blades. Benchmarks are run to evaluate: the performance of short- and long-distance data exchange; the performance on problems with low spatial locality; and the performance on graph problems, specifically breadth-first search and single-source shortest paths.
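The two graph kernels mentioned last are the ones standardized by Graph500. As an illustration of what such a benchmark exercises, here is a minimal level-synchronous BFS that builds a parent tree, the validated output of the Graph500 search kernel (a simplified single-node sketch; the adjacency-dict representation and function name are mine, not from the thesis):

```python
from collections import deque

def bfs_parent_tree(adj, source):
    """Level-synchronous BFS returning a parent map, the kernel output
    validated by Graph500 (single-node toy sketch; real runs distribute
    the frontier across MPI ranks)."""
    parent = {source: source}      # the source is its own parent
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):   # visit each unvisited neighbour once
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return parent
```

On a real cluster the cost of this kernel is dominated by the irregular, low-locality accesses to `adj`, which is exactly why it stresses the memory subsystem and interconnect rather than the floating-point units.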

Relevância:

100.00%

Publicador:

Resumo:

Until roughly 15 years ago it was possible to increase the number of transistors on a single chip, and at the same time its clock frequency, while keeping the power density constant. Since 2004, however, physical limitations have made it impossible to keep the power dissipated per unit area unchanged. To raise processor performance without lowering clock frequencies, modern processors integrate on-die Power Controller Subsystems (PCS), dedicated hardware resources that implement complex temperature and power management strategies. This thesis project designs the architecture of the communication interface of ControlPULP, a PCS based on the RISC-V ISA, for connection to an HPC processor. The interface integrates hardware support for message exchange according to the SCMI specification. The developed interface is then validated through simulation and through emulation on FPGA hardware, which is also used to characterize the resource utilization of the designed architecture. Alongside the hardware interface, a firmware for decoding SCMI messages is developed and characterized, compliant with the execution requirements of a real-time system.
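SCMI agents and platforms exchange commands whose first word is a packed 32-bit message header. A minimal decoder for that header, with the field layout as given in the Arm SCMI specification (the function name and dict representation are illustrative, not the thesis firmware), might look like:

```python
def decode_scmi_header(hdr):
    """Split a 32-bit SCMI message header into its fields.
    Bit layout per the Arm SCMI spec: [7:0] message id, [9:8] message
    type, [17:10] protocol id, [27:18] token (sketch for illustration)."""
    return {
        "message_id": hdr & 0xFF,
        "message_type": (hdr >> 8) & 0x3,
        "protocol_id": (hdr >> 10) & 0xFF,
        "token": (hdr >> 18) & 0x3FF,
    }

# Round-trip example: build a header for protocol 0x10, message 0x6, token 5
hdr = 0x6 | (0 << 8) | (0x10 << 10) | (5 << 18)
fields = decode_scmi_header(hdr)
```

A hardware mailbox interface like the one designed here offloads exactly this kind of field extraction and dispatch, so the real-time firmware only handles the already-decoded payload.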

Relevância:

100.00%

Publicador:

Resumo:

Embedded systems are increasingly integral to daily life, improving the efficiency of modern Cyber-Physical Systems, which provide access to sensor data and actuators. As modern architectures become increasingly complex and heterogeneous, their optimization becomes a challenging task; in addition, ensuring platform security is important to avoid harm to individuals and assets. This study addresses challenges in contemporary embedded systems, focusing on platform optimization and security enforcement. The first part applies machine learning methods to efficiently determine, from static source-code analysis, the number of cores of a parallel RISC-V cluster that minimizes energy consumption. Results demonstrate that automated platform configuration is viable, with a moderate performance trade-off when relying solely on static features. The second part addresses heterogeneous device mapping: assigning tasks to the most suitable computational device of a heterogeneous platform for optimal runtime. Its contribution is a set of novel pre-processing techniques, together with a Siamese-network training framework, that improve the classification performance of DeepLLVM, a state-of-the-art approach for task mapping; notably, the proposed approaches are independent of the specific deep-learning model used. Finally, the work addresses binary exploitation of software running on modern embedded systems. It proposes an architecture that implements Control-Flow Integrity on embedded platforms with a Root-of-Trust, enhancing security guarantees with limited hardware modifications. The approach extends the architecture of a modern RISC-V platform for autonomous vehicles with a side-channel communication mechanism that relays the control-flow changes executed by the process running on the host core to the Root-of-Trust. It has limited impact on performance and is effective in enhancing the security of embedded platforms.
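The first contribution maps static code features to an energy-optimal core count. As a toy illustration of that idea only (the thesis trains ML models; this nearest-neighbour lookup, its feature tuples, and all names are invented for the sketch):

```python
def predict_core_count(features, training_set):
    """Nearest-neighbour sketch of mapping static source-code features
    (e.g. arithmetic intensity, loop depth, memory-access ratio) to an
    energy-optimal core count. Hypothetical stand-in for the trained
    models used in the study."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # return the core count of the closest known kernel profile
    best = min(training_set, key=lambda example: sq_dist(example[0], features))
    return best[1]

# Toy profiles: (feature vector, best core count found by measurement)
profiles = [((0.1, 2.0, 4.0), 2), ((0.9, 8.0, 1.0), 8)]
```

The point of the static approach is that a prediction like this is available at compile time, before the kernel ever runs on the cluster.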

Relevância:

100.00%

Publicador:

Resumo:

This thesis work, carried out in the laboratories of the X-ray Imaging Group of the Department of Physics and Astronomy of the University of Bologna and within COSA (Computing on SoC Architectures), a project of INFN's V National Scientific Committee, aims at porting and analyzing a tomographic reconstruction code on GPU architectures mounted on low-power System-on-Chip boards, in order to develop a portable, inexpensive and reasonably fast method. Starting from a computational analysis, three versions of the CUDA C port were developed: the first merely offloads the most expensive part of the computation to the graphics card; the second exploits the coprocessor's native speed at matrix computation by mapping each pixel to a single parallel compute unit; the third further optimizes the second. The third version was chosen as the final one because it performs best both in the reconstruction time of a single slice and in energy savings. The port was compared with two other parallelizations, in OpenMP and in MPI. The efficiency of each paradigm, as a function of computing speed and energy used, was then studied both on an HPC cluster and on a low-power SoC cluster (using in particular the quad-core Tegra K1 board). Our proposed solution combines the OpenMP and CUDA C ports: three CPU cores are reserved for the OpenMP code, while the fourth drives the GPU through the CUDA C port. This double parallelization has the best efficiency in terms of power and energy, while the HPC cluster has the best efficiency in computing speed. The proposed method would therefore allow the potential of the CPU and GPU to be exploited almost completely at a very low cost.
A possible future optimization would reconstruct two slices at a time on the GPU, roughly doubling the overall speed and making better use of the hardware. This study gave very satisfactory results: with only three TK1 boards it is possible to match, and perhaps later exceed, the computing power of a traditional server, with the added advantage of a portable, low-power, low-cost system. This research stands among the first effective studies of low-power SoC architectures and of their use in scientific applications, with very promising results.
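The pixel-per-compute-unit decomposition of the second version works because, in backprojection, every output pixel accumulates independently over the projection angles. A pure-Python sketch of that data decomposition (nearest-neighbour detector sampling and all names are my simplifications, not the thesis code; in the CUDA C port the two inner loops become one thread per pixel):

```python
import math

def backproject(sinogram, thetas, size):
    """Naive backprojection of a sinogram (one row per angle) onto a
    size x size image. Each pixel is independent of every other pixel,
    which is what maps one pixel to one parallel compute unit."""
    nbins = len(sinogram[0])
    img = [[0.0] * size for _ in range(size)]
    for proj, th in zip(sinogram, thetas):
        c, s = math.cos(th), math.sin(th)
        for y in range(size):
            for x in range(size):
                # detector coordinate of pixel (x, y) at this angle
                t = (x - size / 2.0) * c + (y - size / 2.0) * s + nbins / 2.0
                i = min(max(int(t), 0), nbins - 1)
                img[y][x] += proj[i]
    n = float(len(thetas))
    return [[v / n for v in row] for row in img]
```

Because no pixel writes to another pixel's accumulator, the kernel needs no synchronization between threads, which is why the matrix-style mapping pays off on the GPU.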

Relevância:

100.00%

Publicador:

Resumo:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on microcontroller-class devices. To fit the limited memory of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed into Quantized Neural Networks (QNNs) by representing their data down to byte and sub-byte integer formats. However, the current generation of microcontroller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions at both the software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvement in performance and energy efficiency compared to current state-of-the-art (SoA) STM32 microcontroller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain-specific instruction set architecture (ISA) extensions for sub-byte integer arithmetic. The solution, comprising the ISA extensions and the micro-architecture supporting them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude.
To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.
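Sub-byte formats store two 4-bit operands per byte, and extensions like XpulpNN unpack and multiply-accumulate them in hardware. A software emulation of that storage and of a signed int4 dot product (the packing convention and all names here are illustrative, not the actual XpulpNN encoding):

```python
def pack_int4(vals):
    """Pack signed 4-bit values two per byte (low nibble first), as
    sub-byte QNN weights are stored in memory (illustrative layout)."""
    out = []
    for i in range(0, len(vals), 2):
        lo = vals[i] & 0xF
        hi = (vals[i + 1] & 0xF) if i + 1 < len(vals) else 0
        out.append(lo | (hi << 4))
    return bytes(out)

def dot_int4(packed_a, packed_b, n):
    """Software emulation of a signed int4 dot product; hardware SIMD
    extensions do the unpack + multiply-accumulate in a single step."""
    def unpack(p):
        for byte in p:
            for nib in (byte & 0xF, byte >> 4):
                yield nib - 16 if nib >= 8 else nib  # sign-extend 4 bits
    a = list(unpack(packed_a))[:n]
    b = list(unpack(packed_b))[:n]
    return sum(x * y for x, y in zip(a, b))
```

Done in software, every element costs several instructions for masking and sign extension; folding those into the ISA is where the order-of-magnitude efficiency gains of the thesis come from.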

Relevância:

100.00%

Publicador:

Resumo:

Photoplethysmography (PPG) sensors allow for noninvasive and comfortable heart-rate (HR) monitoring, suitable for compact wearable devices. However, PPG signals collected from such devices often suffer from corruption caused by motion artifacts. This is typically addressed by combining the PPG signal with acceleration measurements from an inertial sensor. Recently, different energy-efficient deep learning approaches for heart-rate estimation have been proposed. To test these new solutions, in this work we developed a highly wearable platform (42 mm × 48 mm × 1.2 mm) for PPG signal acquisition and processing, based on GAP9, a parallel ultra-low-power system-on-chip featuring a nine-core RISC-V compute cluster with a neural network accelerator and a single-core RISC-V controller. The hardware platform also integrates a complete commercial optical biosensing module and an ARM Cortex-M4 microcontroller unit (MCU) with Bluetooth Low Energy connectivity. To demonstrate the capabilities of the system, a deep learning-based approach for PPG-based HR estimation has been deployed. Thanks to the reduced power consumption of the digital computational platform, the total power budget is just 2.67 mW, providing up to 5 days of operation on a 105 mAh battery.
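The 5-day figure follows from simple energy arithmetic. A back-of-the-envelope check (the usable cell voltage and the 100% conversion efficiency are my assumptions, not stated in the abstract, so this only approximates the quoted lifetime):

```python
def battery_lifetime_days(capacity_mah, cell_voltage_v, avg_power_mw):
    """Estimate lifetime as cell energy divided by average drawn power.
    Assumes a constant usable cell voltage and lossless conversion,
    so it is an upper-bound sketch, not a measured figure."""
    energy_mwh = capacity_mah * cell_voltage_v   # stored energy in mWh
    hours = energy_mwh / avg_power_mw            # runtime in hours
    return hours / 24.0

# With a 3.0 V usable cell voltage: 105 mAh * 3.0 V / 2.67 mW ≈ 118 h,
# i.e. roughly 5 days, consistent with the abstract's claim.
```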

Relevância:

40.00%

Publicador:

Resumo:

This note describes ParallelKnoppix, a bootable CD that allows econometricians with average knowledge of computers to create and begin using a high performance computing cluster for parallel computing in very little time. The computers used may be heterogeneous machines, and clusters of up to 200 nodes are supported. When the cluster is shut down, all machines are in their original state, so their temporary use in the cluster does not interfere with their normal uses. An example shows how a Monte Carlo study of a bootstrap test procedure may be done in parallel. Using a cluster of 20 nodes, the example runs approximately 20 times faster than it does on a single computer.
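The Monte Carlo bootstrap in the example parallelizes trivially because every replication is independent, which is why a 20-node cluster gives a near-20x speedup. A sketch of that decomposition (a thread pool stands in here for the MPI cluster that ParallelKnoppix builds; the sample mean is a toy statistic, and all names are mine):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def bootstrap_chunk(data, n_boot, seed):
    """One worker's share of bootstrap replications: resample the data
    with replacement and recompute the statistic (here, the mean)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        reps.append(sum(sample) / len(sample))
    return reps

def parallel_bootstrap(data, n_boot, n_workers=4):
    """Split n_boot replications across workers; on a real cluster each
    chunk would run on a different node."""
    per_worker = n_boot // n_workers
    with ThreadPoolExecutor(n_workers) as ex:
        chunks = ex.map(lambda s: bootstrap_chunk(data, per_worker, s),
                        range(n_workers))
    return [r for chunk in chunks for r in chunk]
```

Because the chunks share nothing but the (read-only) data and a distinct seed, the speedup scales almost linearly with the number of workers, exactly the behavior the note reports.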

Relevância:

30.00%

Publicador:

Resumo:

The new social panorama resulting from the aging of the Brazilian population is leading to significant transformations within healthcare. Using a cluster analysis strategy, we sought to describe the specific care demands of the elderly population based on frailty components. This was a cross-sectional study based on reviewing medical records, conducted in the geriatric outpatient clinic of Hospital de Clínicas, Universidade Estadual de Campinas (Unicamp). Ninety-eight elderly users of this clinic were evaluated using cluster analysis and instruments for assessing their overall geriatric status and frailty characteristics. The variables that most strongly influenced the formation of clusters were age, functional capacity, cognitive capacity, presence of comorbidities and number of medications used. Three main groups of elderly people could be identified: one with good cognitive and functional performance but with a high prevalence of comorbidities (mean age 77.9 years, cognitive impairment in 28.6% and a mean of 7.4 comorbidities); a second with more advanced age, greater cognitive impairment and greater dependence (mean age 88.5 years, cognitive impairment in 84.6% and a mean of 7.1 comorbidities); and a third, younger group with poor cognitive performance and a greater number of comorbidities but functionally independent (mean age 78.5 years, cognitive impairment in 89.6% and a mean of 7.4 comorbidities). These data characterize the profile of this population and can be used as the basis for developing efficient strategies aimed at diminishing functional dependence, poor self-rated health and impaired quality of life.
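Cluster analysis of this kind groups patients by distance in a feature space built from the influential variables (age, functional and cognitive scores, comorbidity counts). A generic k-means sketch of the idea (the study does not state that it used this exact algorithm; the toy data and names are illustrative only):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of its group, and repeat. Points would be
    standardized patient-feature vectors in the study's setting."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[j].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups
```

Note that features on different scales (years of age vs. medication counts) must be standardized first, or the largest-scaled variable dominates the distance.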

Relevância:

30.00%

Publicador:

Resumo:

The [Ru3O(Ac)6(py)2(CH3OH)]+ cluster provides an effective electrocatalytic species for the oxidation of methanol under mild conditions. This complex exhibits characteristic electrochemical waves at -1.02, 0.15 and 1.18 V, associated with the successive Ru3(III,II,II)/Ru3(III,III,II)/Ru3(III,III,III)/Ru3(IV,III,III) redox couples, respectively. Above 1.7 V, formation of two Ru(IV) centers enhances the 2-electron oxidation of the methanol ligand yielding formaldehyde, in agreement with the theoretical evolution of the HOMO levels as a function of the oxidation states. This work illustrates an important strategy to improve the efficiency of oxidation catalysis: using a multicentered redox catalyst and accessing its multiple higher oxidation states.

Relevância:

30.00%

Publicador:

Resumo:

Background: Identifying clusters of acute paracoccidioidomycosis cases could potentially help in identifying the environmental factors that influence the incidence of this mycosis. However, unlike other endemic mycoses, there are no published reports of clusters of paracoccidioidomycosis. Methodology/Principal Findings: A retrospective cluster detection test was applied to verify whether an excess of acute-form (AF) paracoccidioidomycosis cases occurred in time and/or space in Botucatu, an endemic area in São Paulo State. The scan test SaTScan v7.0.3 was set to find clusters for a maximum temporal period of 1 year. The temporal test indicated a significant cluster in 1985 (P<0.005). This cluster comprised 10 cases, whereas 2.19 were expected for that year in this area. Age and clinical presentation of these cases were typical of AF paracoccidioidomycosis. The space-time test confirmed the temporal cluster in 1985 and showed the localities where the risk was higher in that year. The cluster suggests that some particular conditions arose in the preceding years in those localities. Analysis of climate variables showed that soil water storage was atypically high in 1982/83 (~2.11 and ~2.5 SD above the mean), and the absolute air humidity in 1984, the year preceding the cluster, was much higher than normal (~1.6 SD above the mean), conditions that may have favored, respectively, antecedent fungal growth in the soil and conidia liberation in 1984, the probable year of exposure. These climatic anomalies in this area were due to the 1982/83 El Niño event, the strongest in the last 50 years. Conclusions/Significance: We describe the first cluster of AF paracoccidioidomycosis, which was potentially linked to a climatic anomaly caused by the 1982/83 El Niño Southern Oscillation.
This finding is important because it may help to clarify the conditions that favor Paracoccidioides brasiliensis survival and growth in the environment and that enhance human exposure, thus allowing the development of preventive measures.
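The reported excess (10 observed cases against 2.19 expected, P<0.005) can be sanity-checked with the Poisson tail probability that underlies temporal scan statistics such as SaTScan's (this one-window sketch ignores SaTScan's Monte Carlo adjustment for scanning many candidate windows, so it is only an unadjusted bound, not the published P-value):

```python
import math

def poisson_tail(k, mu):
    """P[X >= k] for X ~ Poisson(mu): the probability of seeing at
    least k cases when mu are expected under the null hypothesis."""
    below = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

# For the 1985 window: poisson_tail(10, 2.19) is on the order of 1e-4,
# well below the reported significance threshold of 0.005.
```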

Relevância:

30.00%

Publicador:

Resumo:

Context. Abundance variations in moderately metal-rich globular clusters can give clues about the formation and chemical enrichment of globular clusters. Aims. CN, CH, Na, Mg and Al indices are measured in spectra of 89 stars of the template metal-rich globular cluster M71, and the implications for internal mixing are discussed. Methods. Stars from the turn-off up to the Red Giant Branch (0.87 < log g < 4.65), observed with the GMOS multi-object spectrograph at the Gemini-North telescope, are analyzed. Radial velocities, colours, effective temperatures, gravities and spectral indices are determined for the sample. Results. Previous findings of CN bimodality and a CN-CH anticorrelation in stars of M71 are confirmed. We also find CN-Na and Al-Na correlations, as well as an Mg2-Al anticorrelation. Conclusions. A combination of convective mixing and primordial pollution by AGB or massive stars in the early stages of globular cluster formation is required to explain the observations.

Relevância:

30.00%

Publicador:

Resumo:

Context. It is not known how many globular clusters may remain undetected towards the Galactic bulge. Aims. One of the aims of the VISTA Variables in the Via Lactea (VVV) Survey is to accurately measure the physical parameters of the known globular clusters in the inner regions of the Milky Way and search for new ones, hidden in regions of large extinction. Methods. From deep near-infrared images, we derive deep JHKs-band photometry of a region surrounding the known globular cluster UKS 1 and reveal a new low-mass globular cluster candidate that we name VVV CL001. Results. We use the horizontal-branch red clump to measure E(B-V) ≈ 2.2 mag, (m-M)0 = 16.01 mag, and D = 15.9 kpc for the globular cluster UKS 1. On the basis of near-infrared colour-magnitude diagrams, we also find that VVV CL001 has E(B-V) ≈ 2.0, and that it is at least as metal-poor as UKS 1, although its distance remains uncertain. Conclusions. Our finding confirms the previous prediction that the central region of the Milky Way harbours more globular clusters. VVV CL001 and UKS 1 are good candidates for a physical cluster binary, but follow-up observations are needed to decide whether they are located at the same distance and have similar radial velocities.
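The quoted distance for UKS 1 follows directly from its dereddened distance modulus via d[pc] = 10^((m-M)0/5 + 1). A one-line check (the function name is mine):

```python
def distance_kpc(mu0):
    """Distance in kpc from the dereddened distance modulus (m-M)0,
    using d[pc] = 10**(mu0/5 + 1)."""
    return 10 ** (mu0 / 5.0 + 1.0) / 1000.0

# (m-M)0 = 16.01 mag gives ~15.9 kpc, matching the value quoted above.
```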

Relevância:

30.00%

Publicador:

Resumo:

The A1763 superstructure at z = 0.23 contains the first galaxy filament to be directly detected using mid-infrared observations. Our previous work has shown that the frequency of starbursting galaxies, as characterized by 24 μm emission, is much higher within the filament than at either the center of the rich galaxy cluster or in the field surrounding the system. New Very Large Array and XMM-Newton data are presented here. We use the radio and X-ray data to examine the fraction and location of active galaxies, both active galactic nuclei (AGNs) and starbursts (SBs). The radio/far-infrared correlation, X-ray point-source locations, IRAC colors, and quasar positions are all used to assess the presence of dominant AGNs. We find very few MIPS-selected galaxies that are clearly dominated by AGN activity. Most radio-selected members within the filament are SBs. Within the supercluster, three of eight spectroscopic members detected both in the radio and in the mid-infrared are radio-bright AGNs; they are found at or near the core of A1763. The five SBs are located further along the filament. We calculate the physical properties of the known wide-angle tail (WAT) source, which is the brightest cluster galaxy of A1763. A second double-lobe source is found along the filament, well outside the virial radius of either cluster. The velocity offset of the WAT from the X-ray centroid and the bend of the WAT in the intracluster medium are both consistent with ram-pressure stripping, indicative of streaming motions along the direction of the filament. We consider this further evidence of the cluster-feeding nature of the galaxy filament.

Relevância:

30.00%

Publicador:

Resumo:

The mass function of cluster-size halos and their redshift distribution are computed for 12 distinct accelerating cosmological scenarios and confronted with the predictions of the conventional flat ΛCDM model. The comparison with ΛCDM is performed in two steps. First, we determine the free parameters of all models through a joint analysis involving the latest cosmological data: type Ia supernovae, the cosmic microwave background shift parameter, and baryon acoustic oscillations. Apart from a braneworld-inspired cosmology, it is found that the derived Hubble relation of the remaining models reproduces the ΛCDM results with approximately the same degree of statistical confidence. Second, in order to distinguish the different dark energy models from the expectations of ΛCDM, we analyze the predicted cluster-size halo redshift distribution on the basis of two future cluster surveys: (i) an X-ray survey based on the eROSITA satellite, and (ii) a Sunyaev-Zel'dovich survey based on the South Pole Telescope. As a result, we find that the predictions of 8 of the 12 dark energy models can be clearly distinguished from the ΛCDM cosmology, while the predictions of 4 models are statistically equivalent to those of ΛCDM, as far as the expected cluster mass function and redshift distribution are concerned. The present analysis suggests that such a technique is very competitive with independent tests probing the late-time evolution of the Universe and the associated dark energy effects.

Relevância:

30.00%

Publicador:

Resumo:

We discuss the properties of homogeneous and isotropic flat cosmologies in which the present accelerating stage is powered only by the gravitationally induced creation of cold dark matter (CCDM) particles (Ω_m = 1). For some matter creation rates proposed in the literature, we show that the main cosmological functions, such as the scale factor of the universe, the Hubble expansion rate, the growth factor, and the cluster formation rate, are analytically defined. The best CCDM scenario has only one free parameter, and our joint analysis involving baryon acoustic oscillations + cosmic microwave background (CMB) + SNe Ia data yields Ω̃_m = 0.28 ± 0.01 (1σ), where Ω̃_m is the observed matter density parameter. In particular, this implies that the model has no dark energy, but the part of the matter that is effectively clustering is in good agreement with the latest determinations from the large-scale structure. The growth of perturbations and the formation of galaxy clusters in such scenarios are also investigated. Despite the fact that both scenarios may share the same Hubble expansion, we find that matter creation cosmologies predict stronger small-scale dynamics, which implies a faster growth rate of perturbations with respect to the usual ΛCDM cosmology. Such results point to the possibility of a crucial observational test confronting CCDM with ΛCDM scenarios through a more detailed analysis involving CMB, weak lensing, and the large-scale structure.