43 results for Supercomputers


Relevance:

10.00% 10.00%

Publisher:

Abstract:

We describe a new ab initio method for solving the time-dependent Schrödinger equation for multi-electron atomic systems exposed to intense short-pulse laser light. We call the method the R-matrix with time-dependence (RMT) method. Our starting point is a finite-difference numerical integrator (HELIUM), which has proved successful at describing few-electron atoms and atomic ions in strong laser fields with high accuracy. By exploiting the R-matrix division-of-space concept, we bring together a numerical method most appropriate to the multi-electron finite inner region (R-matrix basis set) and a different numerical method most appropriate to the one-electron outer region (finite difference). In order to exploit massively parallel supercomputers efficiently, we time-propagate the wavefunction in both regions by employing Arnoldi methods, originally developed for HELIUM.
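The Arnoldi propagation mentioned in this abstract can be illustrated with a generic Krylov-subspace time step. The sketch below is not RMT or HELIUM code: the names `arnoldi_step` and `expm_taylor`, the dense list-of-lists Hamiltonian, and the plain Taylor series used for the small matrix exponential are all assumptions made for brevity (units with ħ = 1, small time step).

```python
def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def vdot(x, y):
    """Inner product <x|y>, conjugating the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def norm(x):
    return abs(vdot(x, x)) ** 0.5


def expm_taylor(a, terms=40):
    """Matrix exponential by a plain Taylor series (adequate for small norms)."""
    n = len(a)
    result = [[1.0 + 0j if r == c else 0j for c in range(n)] for r in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[sum(term[r][s] * a[s][c] for s in range(n)) / k
                 for c in range(n)] for r in range(n)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result


def arnoldi_step(h, psi, dt, m):
    """Approximate exp(-i*h*dt) @ psi in an m-dimensional Krylov subspace."""
    beta = norm(psi)
    basis = [[c / beta for c in psi]]
    hess = [[0j] * m for _ in range(m)]
    size = m
    for j in range(m):
        w = matvec(h, basis[j])
        for i in range(j + 1):  # Gram-Schmidt against earlier basis vectors
            hess[i][j] = vdot(basis[i], w)
            w = [wk - hess[i][j] * bk for wk, bk in zip(w, basis[i])]
        if j + 1 == m:
            break
        h_next = norm(w)
        if h_next < 1e-12:  # Krylov space exhausted: the projection is exact
            size = j + 1
            break
        hess[j + 1][j] = h_next
        basis.append([wk / h_next for wk in w])
    small = [[-1j * dt * hess[r][c] for c in range(size)] for r in range(size)]
    coeffs = [row[0] for row in expm_taylor(small)]  # first column = exp(.) e_1
    return [beta * sum(coeffs[k] * basis[k][i] for k in range(size))
            for i in range(len(psi))]


# Toy two-level system: h = sigma_x, so the exact result is (cos dt, -i sin dt).
psi_new = arnoldi_step([[0, 1], [1, 0]], [1 + 0j, 0j], 0.5, 2)
```

Each step needs only matrix-vector products with the Hamiltonian plus a few inner products, which is the kind of operation that distributes well across many processors.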

Abstract:

Over the last decade an Auburn-Rollins-Strathclyde consortium has developed several suites of parallel R-matrix codes [1, 2, 3] that can meet the fundamental data needs for the interpretation of astrophysical observations and/or plasma experiments. Traditionally our collisional work on light fusion-related atoms has focused on spectroscopy and impurity transport for magnetically confined fusion devices. Our approach has been to provide a comprehensive data set for the excitation/ionization of every ion stage of a particular element. As we progress towards a burning fusion plasma, there is a demand for data on the collisional processes involving tungsten, which has required a revitalization of the relativistic R-matrix approach. The implementation of these codes on massively parallel supercomputers has facilitated the progression to models involving thousands of levels in the close-coupling expansion required by the open d and f sub-shell systems of mid-Z tungsten. This work also complements the electron-impact excitation of Fe-peak elements required by astrophysics, in particular the near-neutral species, which offer similar atomic structure challenges. Although electron-impact excitation work is our primary focus in terms of fusion application, the single-photon photoionisation codes are also being developed in tandem, and benefit greatly from this ongoing work.

Abstract:

Exascale computation is the next target of high performance computing. In the push to create exascale computing platforms, simply increasing the number of hardware devices is not an acceptable option given the limitations of power consumption, heat dissipation, and programming models that are designed for current hardware platforms. Instead, new hardware technologies, coupled with improved programming abstractions and more autonomous runtime systems, are required to achieve this goal. This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy-efficient, high performance computing. By combining a number of different technologies, this framework will both simplify the programming of current and future HPC applications and automate the scheduling of data and computation across this new hardware platform. In particular, this work explores the use of FPGAs to achieve both the power and performance goals of exascale, and uses the runtime to automatically effect dynamic configuration and reconfiguration of these platforms.

Abstract:

It is now well accepted that the Sun's 11-year magnetic cycle is the work of an internal dynamo located in the convection zone. Although the computing power of today's computers makes it possible, using true magnetohydrodynamic simulations, to resolve the magnetic field and the velocity in all spatial directions, it remains advantageous to work with simpler models when studying the temporal and spatial evolution of the large-scale solar dynamo. We therefore used a simplified model of the solar dynamo, called a mean-field model, to better understand the mechanisms that originate and sustain the solar dynamo. Inserting a full alpha-tensor, extracted from a global MHD model of solar convection [Ghizaru et al., 2010], into a mean-field dynamo model allowed us to examine in greater depth the role that the electromotive force can play in the magnetic cycles produced by this global model. In this way, we were able to reproduce certain characteristics observed in the magnetic cycles of the Ghizaru et al. (2010) simulation. First, the magnetic field produced by the mean-field model exhibits two distinct dynamo modes. These modes, of similar periods, are present and located at roughly the same radii and latitudes as those produced by the global model. Our ability to reproduce these two dynamo modes is due to the spatial complexity of the alpha-tensor. However, the ratio between the periods of the two modes in the mean-field model differs significantly from that found in the global model. Moreover, the accumulation of a strong magnetic field beneath the convection zone is lost in a model where differential rotation is no longer present.
This suggests that differential rotation plays a non-negligible role in the accumulation of magnetic field at that location. Furthermore, the magnetic field produced by a mean-field model including an alpha-tensor without global turbulent pumping is very different from that produced by the original tensor. Turbulent pumping therefore plays a fundamental role in the spatial distribution of the magnetic field. It is important to stress that models lacking differential rotation, whether they use the original alpha-tensor or omit turbulent pumping, both manage to produce an oscillatory dynamo. Producing such a dynamo with a model of this type is not obvious a priori. Finally, the strength and the type of meridional circulation profile used are factors that significantly affect the spatial distribution of the dynamo produced.
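For orientation, the textbook form of a mean-field dynamo model with a full alpha-tensor is sketched below. This is the standard formulation, not necessarily the exact equations of the thesis: the electromotive force is truncated at first order (no gradient terms), η denotes a total magnetic diffusivity, and the sign convention for the turbulent pumping velocity γ is one common choice among several.

```latex
\frac{\partial \langle\mathbf{B}\rangle}{\partial t}
  = \nabla \times \left( \langle\mathbf{U}\rangle \times \langle\mathbf{B}\rangle
  + \boldsymbol{\mathcal{E}} - \eta\, \nabla \times \langle\mathbf{B}\rangle \right),
\qquad
\mathcal{E}_i = \alpha_{ij}\, \langle B \rangle_j ,
\qquad
\gamma_i = -\tfrac{1}{2}\, \epsilon_{ijk}\, \alpha_{jk}
```

In this notation the large-scale flow ⟨U⟩ carries the differential rotation and the meridional circulation, while the antisymmetric part of the tensor acts as a pumping velocity, since it contributes γ × ⟨B⟩ to the electromotive force. Removing that antisymmetric part corresponds to the "without turbulent pumping" experiment described in the abstract.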

Abstract:

The prediction of climate variability and change requires the use of a range of simulation models. Multiple climate model simulations are needed to sample the inherent uncertainties in seasonal to centennial prediction. Because climate models are computationally expensive, there is a trade-off between complexity, spatial resolution, simulation length, and ensemble size. The methods used to assess climate impacts are examined in the context of this trade-off. An emphasis on complexity allows simulation of coupled mechanisms, such as the carbon cycle and feedbacks between agricultural land management and climate. In addition to improving skill, greater spatial resolution increases relevance to regional planning. Greater ensemble size improves the sampling of probabilities. Research from major international projects is used to show the importance of synergistic research efforts. The primary climate impact examined is crop yield, although many of the issues discussed are relevant to hydrology and health modeling. Methods used to bridge the scale gap between climate and crop models are reviewed. Recent advances include large-area crop modeling, quantification of uncertainty in crop yield, and fully integrated crop–climate modeling. The implications of trends in computer power, including supercomputers, are also discussed.

Abstract:

The idea that supercomputers are an important part of making forecasts of the weather and climate is well known amongst the general population. However, the details of their use are somewhat mysterious. A concept used to illustrate many undergraduate numerical weather prediction courses is the idea of a giant 'forecast factory,' conceived by Lewis Fry Richardson in 1922. In this article, a way of using the same idea to communicate key ideas in numerical weather prediction to the general public is outlined and tested amongst children from local schools.

Abstract:

Tropical cyclones (TCs) are normally not studied at the individual level with Global Climate Models (GCMs), because the coarse grid spacing is often deemed insufficient for a realistic representation of the basic underlying processes. GCMs are indeed routinely deployed at low resolution, in order to enable sufficiently long integrations, which means that only large-scale TC proxies are diagnosed. A new class of GCMs is emerging, however, which is capable of simulating TC-type vortices by retaining a horizontal resolution similar to that of operational NWP GCMs; their integration on the latest supercomputers enables the completion of long-term integrations. The UK-Japan Climate Collaboration (UJCC) and the UK-HiGEM projects have developed climate GCMs which can be run routinely for decades (with grid spacing of 60 km) or centuries (with grid spacing of 90 km); when coupled to the ocean GCM, a mesh of 1/3 degree provides eddy-permitting resolution. The 90 km resolution model has been developed entirely by the UK-HiGEM consortium (together with its 1/3 degree ocean component); the 60 km atmospheric GCM has been developed by UJCC, in collaboration with the Met Office Hadley Centre.

Abstract:

The evolution of commodity computing led to the possibility of efficiently using interconnected machines to solve computationally intensive tasks, which were previously solvable only by using expensive supercomputers. This, however, required new methods for process scheduling and distribution, considering the network latency, communication cost, heterogeneous environments and distributed computing constraints. An efficient distribution of processes over such environments requires an adequate scheduling strategy, as the cost of inefficient process allocation is unacceptably high. Therefore, knowledge and prediction of application behavior are essential to perform effective scheduling. In this paper, we review the evolution of scheduling approaches, focusing on distributed environments. We also evaluate the current approaches for process behavior extraction and prediction, aiming at selecting an adequate technique for online prediction of application execution. Based on this evaluation, we propose a novel model for application behavior prediction, considering chaotic properties of such behavior and the automatic detection of critical execution points. The proposed model is applied and evaluated for process scheduling in cluster and grid computing environments. The obtained results demonstrate that prediction of the process behavior is essential for efficient scheduling in large-scale and heterogeneous distributed environments, outperforming conventional scheduling policies by a factor of 10, and even more in some cases. Furthermore, the proposed approach proves to be efficient for online predictions due to its low computational cost and good precision. (C) 2009 Elsevier B.V. All rights reserved.

Abstract:

Large-scale simulations of parts of the brain using detailed neuronal models to improve our understanding of brain functions are becoming a reality with the use of supercomputers and large clusters. However, the high acquisition and maintenance cost of these computers, including the physical space, air conditioning, and electrical power, limits the number of simulations of this kind that scientists can perform. Modern commodity graphics cards, based on the CUDA platform, contain graphics processing units (GPUs) composed of hundreds of processors that can simultaneously execute thousands of threads and thus constitute a low-cost solution for many high-performance computing applications. In this work, we present a CUDA algorithm that enables the execution, on multiple GPUs, of simulations of large-scale networks composed of biologically realistic Hodgkin-Huxley neurons. The algorithm represents each neuron as a CUDA thread, which solves the set of coupled differential equations that model each neuron. Communication among neurons located in different GPUs is coordinated by the CPU. We obtained speedups of 40 for the simulation of 200k neurons that received random external input and speedups of 9 for a network with 200k neurons and 20M neuronal connections, on a single computer with two graphics boards with two GPUs each, when compared with a modern quad-core CPU. Copyright (C) 2010 John Wiley & Sons, Ltd.
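To make the "one thread per neuron" idea concrete, the sketch below integrates the coupled Hodgkin-Huxley equations for a single neuron, which is roughly the work each thread would repeat per time step. It is written in plain Python rather than CUDA, uses the classic squid-axon parameter set and a forward-Euler step, and ignores synaptic input; all of these, along with the function names, are assumptions and not details taken from the paper.

```python
import math

# Classic Hodgkin-Huxley parameters (assumed; the abstract does not list values).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # peak conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def vtrap(x, y):
    """x / (1 - exp(-x / y)), with the removable singularity at x = 0 filled in."""
    if abs(x / y) < 1e-6:
        return y * (1.0 + x / (2.0 * y))
    return x / (1.0 - math.exp(-x / y))


def rates(v):
    """Opening (a) and closing (b) rates in 1/ms for the gates m, h and n."""
    a_m = 0.1 * vtrap(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * vtrap(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def step(state, i_ext, dt):
    """Advance one neuron by dt (ms) with forward Euler; i_ext in uA/cm^2."""
    v, m, h, n = state
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    i_ion = (G_NA * m ** 3 * h * (v - E_NA)
             + G_K * n ** 4 * (v - E_K)
             + G_L * (v - E_L))
    return (v + dt * (i_ext - i_ion) / C_M,
            m + dt * (a_m * (1.0 - m) - b_m * m),
            h + dt * (a_h * (1.0 - h) - b_h * h),
            n + dt * (a_n * (1.0 - n) - b_n * n))


def simulate(i_ext, t_end=50.0, dt=0.01, v0=-65.0):
    """Return the membrane-potential trace (mV) for a constant input current."""
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v0)
    state = (v0, a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n))
    trace = [v0]
    for _ in range(int(round(t_end / dt))):
        state = step(state, i_ext, dt)
        trace.append(state[0])
    return trace


driven = simulate(10.0)   # constant suprathreshold drive
resting = simulate(0.0)   # no input
```

Because `step` touches only one neuron's own state, many neurons can be advanced independently within a time step; exchanging spikes between neurons is the part that needs coordination, which the abstract assigns to the CPU.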

Abstract:

Biological sequence assembly is an essential step in sequencing the genomes of organisms. Sequence assembly is very computationally intensive, especially at large scale. Parallel computing is an effective way to reduce the computing time and to support the assembly of large numbers of biological fragments. The Euler sequence assembly algorithm is an innovative, recently proposed algorithm. Its advantages are that its computational complexity is polynomial and that it provides a better solution to the notorious “repeat” problem. This paper introduces a parallelization of the Euler sequence assembly algorithm. All the genome fragments generated by whole genome shotgun (WGS) sequencing are assembled as a whole rather than being divided into groups, which may incur errors due to inaccurate group partitioning. The implemented system can be run on supercomputers, networks of workstations, or even networks of PCs. The experimental results demonstrate the performance of our system.
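The Eulerian approach to assembly can be sketched on a toy input: reads are cut into k-mers, each k-mer becomes an edge of a de Bruijn graph, and an Eulerian path through the graph spells the sequence. The serial code below only illustrates that idea, not the parallel system described in the abstract; the function names are hypothetical, reads are assumed error-free, and duplicate k-mers are collapsed, so genuine repeats are not handled.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a graph whose edges are the distinct k-mers found in the reads.

    Each k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: visit every edge exactly once, in linear time."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


contig = assemble(["ATGCG", "GCGTA", "CGTAC"], 3)
```

Finding an Eulerian path takes time linear in the number of edges, which is where the polynomial complexity mentioned in the abstract comes from; overlap-based formulations instead look for a Hamiltonian path, which is NP-hard in general.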

Abstract:

The seismic method is of extreme importance in geophysics. Mainly associated with oil exploration, this line of research attracts most of the investment in the area. The acquisition, processing, and interpretation of seismic data are the stages that make up a seismic study. Seismic processing in particular focuses on imaging the geological structures in the subsurface. It has evolved significantly in recent decades owing to the demands of the oil industry and to hardware advances that brought greater storage and digital information processing capabilities, which enabled the development of more sophisticated processing algorithms, such as those that use parallel architectures. One of the most important steps in seismic processing is imaging. Migration of seismic data is one of the techniques used for imaging, with the goal of obtaining a seismic section image that represents the geological structures as accurately and faithfully as possible. The result of migration is a 2D or 3D image in which it is possible to identify faults and salt domes, among other structures of interest such as potential hydrocarbon reservoirs. However, a migration carried out with quality and accuracy may be very time-consuming, owing to the heuristics of the mathematical algorithm and the extensive data input and output involved; it may take days, weeks, or even months of uninterrupted execution on supercomputers, representing large computational and financial costs that could make these methods impractical. Aiming at performance improvement, and given the large computational effort this migration technique requires, this work parallelized the core of a Reverse Time Migration (RTM) algorithm using the Open Multi-Processing (OpenMP) parallel programming model.
Furthermore, analyses of speedup and efficiency were performed and, ultimately, the degree of algorithmic scalability was identified with respect to the technological advances expected in future processors.
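The speedup and efficiency metrics named in this abstract follow their usual definitions, sketched below. The timings are made-up placeholders, not results from the work, and the Amdahl's-law bound is an addition included only to show how a serial fraction limits scalability.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency E = S / p, where p is the number of threads."""
    return speedup(t_serial, t_parallel) / workers


def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the run time parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# Hypothetical timings: 120 s on one thread, 20 s on eight threads.
s = speedup(120.0, 20.0)        # 6.0
e = efficiency(120.0, 20.0, 8)  # 0.75
bound = amdahl_speedup(0.9, 8)  # about 4.7 if 90% of the work is parallel
```

An efficiency that stays close to 1 as the thread count grows indicates good scalability; a falling efficiency suggests that serial sections, memory bandwidth, or synchronization are starting to dominate.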

Abstract:

The constant increase in the complexity of computer applications demands the development of more powerful hardware to support them. With processor operating frequencies reaching their limit, the most viable solution is the use of parallelism. The concept of MPSoCs (Multi-Processor System-on-Chip) builds on parallelism techniques and on the progressive growth in the number of transistors that can be integrated on a single chip. MPSoCs will eventually become a cheaper and faster alternative to supercomputers and clusters, and applications developed for these high-performance systems will migrate to computers equipped with MPSoCs containing dozens to hundreds of computation cores. In particular, applications in oil and natural gas exploration are also characterized by the high processing capacity they require and would benefit greatly from these high-performance systems. This work evaluates a traditional and complex application of the oil and gas industry known as reservoir simulation, developing a solution with computational systems integrated in a single chip, with hundreds of functional units. To this end, because the STORM (MPSoC Directory-Based Platform) platform already had a shared memory model, a new distributed memory model was developed. A message-passing library following the MPI standard was also developed.

Abstract:

Being basic ingredients of numerous daily-life products with significant industrial importance as well as basic building blocks for biomaterials, charged hydrogels continue to pose a series of unanswered challenges for scientists even after decades of practical applications and intensive research efforts. Despite their rather simple internal structure, it is mainly the unique combination of short- and long-range forces that makes scientific investigation of their characteristic properties quite difficult. Hence, early on, computer simulations were used to link analytical theory and empirical experiments, bridging the gap between the simplifying assumptions of the models and the complexity of real-world measurements. Due to the immense numerical effort, even for high-performance supercomputers, system sizes and time scales were rather restricted until recently, and only now has it become possible to also simulate a network of charged macromolecules. This is the topic of the presented thesis, which investigates one of the fundamental and at the same time highly fascinating phenomena of polymer research: the swelling behaviour of polyelectrolyte networks. For this, an extensible simulation package for research on soft matter systems, ESPResSo for short, was created, which puts particular emphasis on mesoscopic bead-spring models of complex systems. Highly efficient algorithms and a consistent parallelization reduced the computation time needed to solve the equations of motion even in the case of long-range electrostatics and large numbers of particles, making it possible to tackle even expensive calculations and applications. Nevertheless, the program has a modular and simple structure, enabling a continuous process of adding new potentials, interactions, degrees of freedom, ensembles, and integrators, while staying easily accessible for newcomers thanks to a Tcl-script steering level controlling the C-implemented simulation core.
Numerous analysis routines provide means to investigate system properties and observables on the fly. Even though analytical theories have agreed on the modeling of networks in past years, our numerical MD simulations show that even for simple model systems, fundamental theoretical assumptions no longer apply except in a small parameter regime, prohibiting correct predictions of observables. By applying a "microscopic" analysis of the isolated contributions of individual system components, one of the particular strengths of computer simulations, it was then possible to describe the behaviour of charged polymer networks at swelling equilibrium in good solvent and close to the Theta-point by introducing appropriate model modifications. This became possible by enhancing known simple scaling arguments with components deemed crucial in our detailed study, through which a generalized model could be constructed. With it, agreement between the final system volume of swollen polyelectrolyte gels and the results of computer simulations could be shown over the entire investigated range of parameters, for different network sizes, charge fractions, and interaction strengths. In addition, the "cell under tension" was presented as a self-regulating approach for predicting the amount of swelling based on the system parameters alone. Without the need for measured observables as input, minimizing the free energy alone already allows the equilibrium behaviour to be determined. In poor solvent the shape of the network chains changes considerably, as their hydrophobicity now counteracts the repulsion of like-charged monomers and drives the polyelectrolytes to collapse. Depending on the chosen parameters a fragile balance emerges, giving rise to fascinating geometrical structures such as the so-called pearl-necklaces.
This behaviour, known from single-chain polyelectrolytes under similar environmental conditions and also theoretically predicted, could be detected for the first time for networks as well. An analysis of the total structure factors confirmed initial evidence for the existence of such structures found in experimental results.

Abstract:

Single-processor (CPU) microprocessors saw rapid growth in performance and falling costs for about twenty years. These microprocessors brought computing power on the order of GFLOPS (giga floating-point operations per second) to desktop PCs and hundreds of GFLOPS to server clusters. This rise brought new functionality to programs, better user interfaces, and many other benefits. However, this growth slowed abruptly in 2003 because of ever-higher energy consumption and heat dissipation problems, which prevented further increases in clock frequency. The physical limits of silicon were drawing ever closer. To work around the problem, CPU (Central Processing Unit) manufacturers began designing multicore microprocessors, a choice that had a considerable impact on the developer community, which was accustomed to thinking of software as a series of sequential commands. As a result, programs that had always benefited from performance improvements with each new CPU generation saw no performance gains: running on a single core, they did not benefit from the full power of the CPU. To fully exploit the power of the new CPUs, concurrent programming, previously used only on expensive systems or supercomputers, has become an increasingly common practice among developers. At the same time, the video game industry has captured a considerable market share: in 2013 alone, almost 100 billion dollars will be spent on gaming hardware and software. To make their titles more appealing, the software houses that develop video games rely on increasingly powerful and often poorly optimized graphics engines, making them extremely demanding in terms of performance.
For this reason, GPU (Graphics Processing Unit) manufacturers, especially over the last decade, have engaged in a genuine performance race that has led them to products with staggering computing capacity. But unlike CPUs, which in the early 2000s took the multicore route in order to keep favoring sequential programs, GPUs have become manycore, that is, with hundreds and hundreds of small cores that perform calculations in parallel. Can this immense computing capacity be used in other application fields? The answer is yes, and the aim of this thesis is precisely to establish, given the current state of the art, how and how efficiently generic software can make use of the GPU instead of the CPU.

Abstract:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application whose execution time is dominated by a high number of floating-point instructions. It then addresses the central problem of efficiently managing power peaks in heterogeneous computing systems. Finally, it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been made. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed. The implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Second, a power-aware scheduling algorithm for heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload across the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. This work makes two main contributions: the approach reduces the supply cost due to high peak power while having negligible impact on the parallelism of the computational nodes, and, from another point of view, the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit.
Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses.