41 results for Supercomputers
Abstract:
Over the last decade an Auburn-Rollins-Strathclyde consortium has developed several suites of parallel R-matrix codes [1, 2, 3] that can meet the fundamental data needs required for the interpretation of astrophysical observations and/or plasma experiments. Traditionally, our collisional work on light fusion-related atoms has focused on spectroscopy and impurity transport for magnetically confined fusion devices. Our approach has been to provide a comprehensive excitation/ionization data set for every ion stage of a particular element. As we progress towards a burning fusion plasma, there is a demand for collisional data involving tungsten, which has required a revitalization of the relativistic R-matrix approach. The implementation of these codes on massively parallel supercomputers has facilitated the progression to models involving thousands of levels in the close-coupling expansion required by the open d and f sub-shell systems of mid-Z tungsten. This work also complements the electron-impact excitation of Fe-peak elements required by astrophysics, in particular the near-neutral species, which pose similar atomic-structure challenges. Although electron-impact excitation is our primary focus in terms of fusion applications, the single-photon photoionisation codes are being developed in tandem and benefit greatly from this ongoing work.
Abstract:
Exascale computation is the next target of high-performance computing. In the push to create exascale computing platforms, simply increasing the number of hardware devices is not an acceptable option, given the limitations of power consumption, heat dissipation, and programming models designed for current hardware platforms. Instead, new hardware technologies, coupled with improved programming abstractions and more autonomous runtime systems, are required to achieve this goal. This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy-efficient, high-performance computing. By combining a number of different technologies, this framework will both simplify the programming of current and future HPC applications and automate the scheduling of data and computation across this new hardware platform. In particular, this work explores the use of FPGAs to achieve both the power and performance goals of exascale, as well as the use of the runtime to automatically effect dynamic configuration and reconfiguration of these platforms.
Abstract:
It is now well accepted that the Sun's 11-year magnetic cycle is the work of an internal dynamo operating in the convection zone. Although the computing power of current machines makes it possible, with full magnetohydrodynamic simulations, to resolve the magnetic field and the velocity in all spatial directions, it nevertheless remains advantageous, when studying the temporal and spatial evolution of the large-scale solar dynamo, to work with simpler models. We have therefore used a simplified model of the solar dynamo, known as a mean-field model, to better understand the mechanisms responsible for the origin and maintenance of the solar dynamo. Inserting a full alpha-tensor, extracted from a global MHD model of solar convection [Ghizaru et al., 2010], into a mean-field dynamo model allowed us to probe the role that the electromotive force can play in the magnetic cycles produced by that global model. In this way, we were able to reproduce certain characteristics observed in the magnetic cycles of the Ghizaru et al., 2010 simulation. First, the magnetic field produced by the mean-field model exhibits two distinct dynamo modes. These modes, of similar periods, are present and localized at roughly the same radii and latitudes as those produced by the global model. The fact that both dynamo modes can be reproduced is due to the spatial complexity of the alpha-tensor. However, the ratio between the periods of the two modes present in the mean-field model differs significantly from that found in the global model. Moreover, the accumulation of a strong magnetic field beneath the convection zone is lost in a model in which differential rotation is no longer present, suggesting that differential rotation plays a non-negligible role in the accumulation of magnetic field there. Furthermore, the magnetic field produced in a mean-field model that includes an alpha-tensor without global turbulent pumping is very different from that produced by the original tensor; turbulent pumping therefore plays a fundamental role in the spatial distribution of the magnetic field. It is important to emphasize that the models without differential rotation, whether using the original alpha-tensor or omitting turbulent pumping, both succeed in producing an oscillatory dynamo, which is not, a priori, an obvious outcome for models of this type. Finally, the intensity and the type of meridional circulation profile used are factors that significantly affect the spatial distribution of the resulting dynamo.
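For reference, the mean-field framework referred to above can be stated compactly in standard notation (not reproduced from the thesis itself): the large-scale magnetic field obeys the mean induction equation, with the turbulent electromotive force closed through the alpha tensor,

\[ \frac{\partial \langle\mathbf{B}\rangle}{\partial t} = \nabla \times \Big( \langle\mathbf{U}\rangle \times \langle\mathbf{B}\rangle + \boldsymbol{\mathcal{E}} - \eta\, \nabla \times \langle\mathbf{B}\rangle \Big), \qquad \boldsymbol{\mathcal{E}} \simeq \boldsymbol{\alpha} \cdot \langle\mathbf{B}\rangle , \]

where the symmetric part of \(\boldsymbol{\alpha}\) provides the alpha effect, its antisymmetric part acts as turbulent pumping, and \(\langle\mathbf{U}\rangle\) carries the differential rotation and meridional circulation discussed above.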
Abstract:
The prediction of climate variability and change requires the use of a range of simulation models. Multiple climate model simulations are needed to sample the inherent uncertainties in seasonal to centennial prediction. Because climate models are computationally expensive, there is a tradeoff between complexity, spatial resolution, simulation length, and ensemble size. The methods used to assess climate impacts are examined in the context of this trade-off. An emphasis on complexity allows simulation of coupled mechanisms, such as the carbon cycle and feedbacks between agricultural land management and climate. In addition to improving skill, greater spatial resolution increases relevance to regional planning. Greater ensemble size improves the sampling of probabilities. Research from major international projects is used to show the importance of synergistic research efforts. The primary climate impact examined is crop yield, although many of the issues discussed are relevant to hydrology and health modeling. Methods used to bridge the scale gap between climate and crop models are reviewed. Recent advances include large-area crop modeling, quantification of uncertainty in crop yield, and fully integrated crop–climate modeling. The implications of trends in computer power, including supercomputers, are also discussed.
Abstract:
The idea that supercomputers are an important part of making forecasts of the weather and climate is well known amongst the general population. However, the details of their use are somewhat mysterious. A concept used to illustrate many undergraduate numerical weather prediction courses is the idea of a giant 'forecast factory,' conceived by Lewis Fry Richardson in 1922. In this article, a way of using the same idea to communicate key ideas in numerical weather prediction to the general public is outlined and tested amongst children from local schools.
Abstract:
Tropical cyclones (TCs) are normally not studied at the individual level with Global Climate Models (GCMs), because the coarse grid spacing is often deemed insufficient for a realistic representation of the basic underlying processes. GCMs are indeed routinely deployed at low resolution in order to enable sufficiently long integrations, which means that only large-scale TC proxies are diagnosed. A new class of GCMs is emerging, however, which is capable of simulating TC-type vortices while retaining a horizontal resolution similar to that of operational NWP GCMs; their integration on the latest supercomputers enables the completion of long-term integrations. The UK-Japan Climate Collaboration and the UK-HiGEM projects have developed climate GCMs which can be run routinely for decades (with a grid spacing of 60 km) or centuries (with a grid spacing of 90 km); when coupled to the ocean GCM, a mesh of 1/3 degree provides eddy-permitting resolution. The 90 km resolution model has been developed entirely by the UK-HiGEM consortium (together with its 1/3 degree ocean component); the 60 km atmospheric GCM has been developed by UJCC, in collaboration with the Met Office Hadley Centre.
Abstract:
The evolution of commodity computing led to the possibility of efficiently using interconnected machines to solve computationally intensive tasks that were previously solvable only with expensive supercomputers. This, however, required new methods for process scheduling and distribution that consider network latency, communication cost, heterogeneous environments, and distributed-computing constraints. An efficient distribution of processes over such environments requires an adequate scheduling strategy, as the cost of inefficient process allocation is unacceptably high. Therefore, knowledge and prediction of application behavior are essential to perform effective scheduling. In this paper, we review the evolution of scheduling approaches, focusing on distributed environments. We also evaluate current approaches for process-behavior extraction and prediction, aiming at selecting an adequate technique for online prediction of application execution. Based on this evaluation, we propose a novel model for application behavior prediction that considers the chaotic properties of such behavior and the automatic detection of critical execution points. The proposed model is applied and evaluated for process scheduling in cluster and grid computing environments. The results obtained demonstrate that prediction of process behavior is essential for efficient scheduling in large-scale and heterogeneous distributed environments, outperforming conventional scheduling policies by a factor of 10, and even more in some cases. Furthermore, the proposed approach proves to be efficient for online prediction due to its low computational cost and good precision. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Large-scale simulations of parts of the brain using detailed neuronal models, aimed at improving our understanding of brain function, are becoming a reality through the use of supercomputers and large clusters. However, the high acquisition and maintenance cost of these computers, including physical space, air conditioning, and electrical power, limits the number of simulations of this kind that scientists can perform. Modern commodity graphics cards based on the CUDA platform contain graphics processing units (GPUs) composed of hundreds of processors that can simultaneously execute thousands of threads and thus constitute a low-cost solution for many high-performance computing applications. In this work, we present a CUDA algorithm that enables the execution, on multiple GPUs, of simulations of large-scale networks composed of biologically realistic Hodgkin-Huxley neurons. The algorithm represents each neuron as a CUDA thread, which solves the set of coupled differential equations that model the neuron. Communication among neurons located in different GPUs is coordinated by the CPU. We obtained speedups of 40 for the simulation of 200k neurons receiving random external input, and speedups of 9 for a network with 200k neurons and 20M neuronal connections, on a single computer with two graphics boards holding two GPUs each, compared with a modern quad-core CPU. Copyright (C) 2010 John Wiley & Sons, Ltd.
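For illustration only, the following minimal C++ sketch (not the authors' code) shows the per-neuron Hodgkin-Huxley update; in the scheme described above, each CUDA thread would apply an update of this kind to its own neuron. The classic HH parameter values and a forward-Euler step are used purely for simplicity.

#include <cmath>
#include <cstdio>

// State of one Hodgkin-Huxley neuron: membrane potential (mV) and gating variables.
struct HHNeuron { double V = -65.0, m = 0.05, h = 0.6, n = 0.32; };

// One forward-Euler step of the classic HH equations (dt in ms, I_ext in uA/cm^2).
// In the multi-GPU scheme summarized above, each CUDA thread would run this
// update for exactly one neuron; here it is shown as plain host code.
void hh_step(HHNeuron& s, double I_ext, double dt) {
    const double gNa = 120.0, gK = 36.0, gL = 0.3;      // conductances (mS/cm^2)
    const double ENa = 50.0, EK = -77.0, EL = -54.387;   // reversal potentials (mV)
    double V = s.V;
    double am = 0.1 * (V + 40.0) / (1.0 - std::exp(-(V + 40.0) / 10.0));
    double bm = 4.0 * std::exp(-(V + 65.0) / 18.0);
    double ah = 0.07 * std::exp(-(V + 65.0) / 20.0);
    double bh = 1.0 / (1.0 + std::exp(-(V + 35.0) / 10.0));
    double an = 0.01 * (V + 55.0) / (1.0 - std::exp(-(V + 55.0) / 10.0));
    double bn = 0.125 * std::exp(-(V + 65.0) / 80.0);
    double INa = gNa * s.m * s.m * s.m * s.h * (V - ENa);
    double IK  = gK * s.n * s.n * s.n * s.n * (V - EK);
    double IL  = gL * (V - EL);
    s.V += dt * (I_ext - INa - IK - IL);                 // membrane capacitance 1 uF/cm^2
    s.m += dt * (am * (1.0 - s.m) - bm * s.m);
    s.h += dt * (ah * (1.0 - s.h) - bh * s.h);
    s.n += dt * (an * (1.0 - s.n) - bn * s.n);
}

int main() {
    HHNeuron neuron;
    for (int step = 0; step < 1000; ++step) hh_step(neuron, 10.0, 0.01);  // 10 ms
    std::printf("V after 10 ms: %f mV\n", neuron.V);
    return 0;
}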
Abstract:
The seismic method is of great importance in geophysics. Mainly associated with oil exploration, this line of research attracts most of the investment in the area. Acquisition, processing, and interpretation of seismic data are the stages that make up a seismic study. Seismic processing in particular is focused on producing an image that represents the geological structures in the subsurface. Seismic processing has evolved significantly in recent decades due to the demands of the oil industry, and also due to advances in hardware that provided greater storage and digital processing capabilities, which enabled the development of more sophisticated processing algorithms, such as those that use parallel architectures. One of the most important steps in seismic processing is imaging. Migration of seismic data is one of the techniques used for imaging, with the goal of obtaining a seismic section image that represents the geological structures as accurately and faithfully as possible. The result of migration is a 2D or 3D image in which it is possible to identify faults and salt domes, among other structures of interest, such as potential hydrocarbon reservoirs. However, performing a migration with quality and accuracy can be very time consuming, due to the heuristics of the mathematical algorithms and the extensive amount of input and output data involved; it may take days, weeks, or even months of uninterrupted execution on supercomputers, representing large computational and financial costs that could make these methods impractical. Aiming at performance improvement, this work carried out the parallelization of the core of a Reverse Time Migration (RTM) algorithm, using the Open Multi-Processing (OpenMP) parallel programming model, because of the large computational effort required by this migration technique. Furthermore, speedup and efficiency analyses were performed and, ultimately, the degree of algorithmic scalability was identified with respect to the technological advances expected from future processors.
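To make the parallelized core concrete, here is a minimal, illustrative C++/OpenMP sketch (a generic stand-in, not the thesis code) of one time step of second-order 2D acoustic finite-difference propagation, the kind of stencil computation that dominates RTM, with the spatial loops split across threads:

#include <vector>

// One time step of 2nd-order 2D acoustic finite-difference propagation:
//   p_next = 2*p - p_prev + (v*dt/h)^2 * Laplacian(p)
// RTM applies stencils like this thousands of times to propagate the source
// field forward and the receiver field backward in time before imaging.
void fd_step(const std::vector<float>& p, const std::vector<float>& p_prev,
             std::vector<float>& p_next, const std::vector<float>& vel,
             int nx, int nz, float dt, float h) {
    // Grid points are updated independently, so the loops can be shared among threads.
    #pragma omp parallel for collapse(2) schedule(static)
    for (int iz = 1; iz < nz - 1; ++iz) {
        for (int ix = 1; ix < nx - 1; ++ix) {
            int i = iz * nx + ix;
            float lap = p[i - 1] + p[i + 1] + p[i - nx] + p[i + nx] - 4.0f * p[i];
            float c = vel[i] * dt / h;
            p_next[i] = 2.0f * p[i] - p_prev[i] + c * c * lap;
        }
    }
}

Compiled with OpenMP enabled (e.g. -fopenmp), the pragma distributes the grid among the available cores; without it, the same code simply runs serially.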
Abstract:
The constant increase in the complexity of computer applications demands the development of more powerful hardware to support them. With processor operating frequencies reaching their limit, the most viable solution is the use of parallelism. The concept of MPSoCs (Multi-Processor Systems-on-Chip) builds on parallelism techniques and the progressive growth in the number of transistors that can be integrated on a single chip. MPSoCs will eventually become a cheaper and faster alternative to supercomputers and clusters, and applications developed for these high-performance systems will migrate to computers equipped with MPSoCs containing dozens to hundreds of computing cores. In particular, applications in the area of oil and natural gas exploration are also characterized by the high processing capacity required and would benefit greatly from these high-performance systems. This work intends to evaluate a traditional and complex application of the oil and gas industry, known as reservoir simulation, by developing a solution for systems integrated on a single chip with hundreds of functional units. For this, since the STORM (MPSoC Directory-Based Platform) platform already had a shared-memory model, a new distributed-memory model was developed, together with a message-passing library following the MPI standard.
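As a concrete illustration of the message-passing style such a library targets, the following is a generic MPI example in C++ (standard MPI calls, not the library developed in this work) in which two processes exchange a block of simulation data:

#include <mpi.h>
#include <cstdio>
#include <vector>

// Minimal point-to-point exchange: rank 0 sends a block of cell values to rank 1,
// the kind of primitive a distributed-memory reservoir simulator is built upon.
// Run with at least two processes, e.g. mpirun -np 2 ./a.out
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1024;
    std::vector<double> cells(n, 0.0);

    if (rank == 0) {
        for (int i = 0; i < n; ++i) cells[i] = 1.0e7 + i;   // dummy cell pressures
        MPI_Send(cells.data(), n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(cells.data(), n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d cells, first = %.1f\n", n, cells[0]);
    }

    MPI_Finalize();
    return 0;
}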
Abstract:
Being basic ingredients of numerous everyday products of significant industrial importance, as well as basic building blocks for biomaterials, charged hydrogels continue to pose a series of unanswered challenges for scientists even after decades of practical applications and intensive research efforts. Despite a rather simple internal structure, it is mainly the unique combination of short- and long-range forces which renders scientific investigation of their characteristic properties quite difficult. Hence, early on, computer simulations were used to link analytical theory and empirical experiments, bridging the gap between the simplifying assumptions of the models and the complexity of real-world measurements. Due to the immense numerical effort, even for high-performance supercomputers, system sizes and time scales were rather restricted until recently; only now has it become possible to also simulate a network of charged macromolecules. This is the topic of the present thesis, which investigates one of the fundamental and at the same time highly fascinating phenomena of polymer research: the swelling behaviour of polyelectrolyte networks. For this, an extensible simulation package for research on soft-matter systems, ESPResSo for short, was created, which puts a particular emphasis on mesoscopic bead-spring models of complex systems. Highly efficient algorithms and a consistent parallelization reduced the computation time necessary for solving the equations of motion, even in the case of long-ranged electrostatics and large numbers of particles, making it possible to tackle even expensive calculations and applications. Nevertheless, the program has a modular and simple structure, enabling a continuous process of adding new potentials, interactions, degrees of freedom, ensembles, and integrators, while staying easily accessible for newcomers thanks to a Tcl-script steering level controlling the C-implemented simulation core. Numerous analysis routines provide means to investigate system properties and observables on the fly. Even though analytical theories have agreed on the modeling of networks in past years, our numerical MD simulations show that, even for simple model systems, fundamental theoretical assumptions no longer apply except in a small parameter regime, prohibiting correct predictions of observables. Applying a "microscopic" analysis of the isolated contributions of individual system components, one of the particular strengths of computer simulations, it was then possible to describe the behaviour of charged polymer networks at swelling equilibrium in good solvent and close to the Theta point by introducing appropriate model modifications. This became possible by enhancing known simple scaling arguments with components deemed crucial in our detailed study, through which a generalized model could be constructed. Herewith, agreement of the final system volume of swollen polyelectrolyte gels with the results of computer simulations could be shown successfully over the entire investigated range of parameters, for different network sizes, charge fractions, and interaction strengths. In addition, the "cell under tension" was presented as a self-regulating approach for predicting the amount of swelling based on the system parameters alone. Without the need for measured observables as input, minimizing the free energy already allows the equilibrium behaviour to be determined.
In poor solvent the shape of the network chains changes considerably, as their hydrophobicity now counteracts the repulsion of like-charged monomers and drives the collapse of the polyelectrolytes. Depending on the chosen parameters, a fragile balance emerges, giving rise to fascinating geometrical structures such as the so-called pearl-necklaces. This behaviour, known from single-chain polyelectrolytes under similar environmental conditions and also predicted theoretically, could be detected for the first time for networks as well. An analysis of the total structure factors confirmed first evidence for the existence of such structures found in experimental results.
Abstract:
Single-processor microprocessors (CPUs) saw rapid performance growth and falling costs for roughly twenty years. These microprocessors brought computing power on the order of a GFLOPS (Giga Floating-point Operations Per Second) to desktop PCs and hundreds of GFLOPS to server clusters. This rise brought new functionality to programs, better user interfaces, and many other benefits. However, this growth slowed abruptly in 2003 because of ever higher energy consumption and heat-dissipation problems, which prevented further increases in clock frequency: the physical limits of silicon were drawing ever closer. To work around the problem, CPU (Central Processing Unit) manufacturers began designing multicore microprocessors, a choice that had a considerable impact on the developer community, which was used to thinking of software as a series of sequential commands. Programs that had always enjoyed performance improvements with each new CPU generation therefore stopped getting faster: running on a single core, they could not benefit from the full power of the CPU. To fully exploit the power of the new CPUs, concurrent programming, previously used only on expensive systems or supercomputers, became an increasingly common practice among developers. At the same time, the video-game industry has captured a sizeable share of the market: in 2013 alone, nearly 100 billion dollars will be spent on hardware and software dedicated to gaming. To make their titles more attractive, game development studios rely on ever more powerful and often poorly optimized graphics engines, which makes them extremely demanding in terms of performance. For this reason GPU (Graphics Processing Unit) manufacturers, especially in the last decade, have engaged in a genuine performance race that has led to products with staggering computational capabilities. But unlike CPUs, which at the beginning of the 2000s took the multicore route in order to keep supporting sequential programs, GPUs have become manycore, with many hundreds of small cores executing computations in parallel. Can this immense computational capacity be used in other application domains? The answer is yes, and the objective of this thesis is precisely to assess, at the current state of the art, how and with what efficiency generic software can make use of the GPU instead of the CPU.
Abstract:
This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities; hence their price-performance ratio is unparalleled in the world of high-performance computing (HPC). In particular, different aspects related to the performance and power consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application in which the execution time is dominated by a high number of floating-point instructions. It then addresses the central problem of efficient management of power peaks in heterogeneous computing systems. Finally, it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been carried out. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed; the implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Secondly, a power-aware scheduling algorithm for heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload across the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing peak power. This work has two main benefits: the approach reduces the supply cost due to high peak power while having negligible impact on the parallelism of the computational nodes, and the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented, with the purpose of accelerating graph exploration by reducing the number of random memory accesses.
Abstract:
Immersed boundary simulations have been under development for physiological flows, allowing for elegant handling of fluid-structure interaction modelling with large deformations, since each domain retains its own meshing. We couple a structural system in Lagrangian representation, formulated in weak form, with a Navier-Stokes system discretized through a finite-difference scheme. We build upon a proven, highly scalable incompressible flow solver that we extend to handle FSI. We aim at applying our method to investigating the hemodynamics of aortic valves. The code is going to be extended to conform to the new hybrid-node supercomputers.
Abstract:
The present thesis constitutes a step forward in advancing the frontiers of knowledge of fluid flow instability from a physical point of view, as a consequence of having been successful in developing groundbreaking methodologies for the efficient and accurate computation of the leading part of the spectrum pertinent to multi-dimensional eigenvalue problems (EVP) governing the instability of flows with two or three inhomogeneous spatial directions. In the context of the numerical work presented in this thesis, the discretization of the spatial operator resulting from linearization of the Navier-Stokes equations around flows with two or three inhomogeneous spatial directions by variable-high-order stable finite-difference methods has permitted a speedup of four orders of magnitude in the solution of the corresponding two- and three-dimensional EVPs. This improvement of numerical performance has been achieved thanks to the high sparsity level offered by the high-order finite-difference schemes employed for the discretization of the operators. This permitted the use of efficient sparse linear algebra techniques without sacrificing accuracy and, consequently, solutions being obtained on typical workstations, as opposed to the previously employed supercomputers. Besides the solution of the two- and three-dimensional EVPs of global linear instability, this development paved the way for the extension of the (linear and nonlinear) Parabolized Stability Equations (PSE) to analyze the instability of flows which depend in a strongly coupled, inhomogeneous manner on two spatial directions and weakly on the third (PSE-3D). Precisely the extensibility of the novel PSE-3D algorithm, developed in the framework of the present thesis, to the study of nonlinear flow instability permits transition prediction in complex flows of industrial interest, thus extending the classic PSE concept, which has been successfully employed in the same context for boundary-layer-type flows over the last three decades. Typical examples of incompressible flows, the instability of which was analyzed in the present thesis without the need to resort to the restrictive assumptions used in the past, range from isolated vortices, and systems thereof, in which axial homogeneity is relaxed to consider viscous diffusion, to turbulent swirling jets, the instability of which is exploited in order to improve the flame-holding properties of combustors. The instability of compressible subsonic and supersonic leading-edge flows has been solved, and the wake of an isolated roughness element in supersonic and hypersonic boundary layers has also been analyzed with respect to its instability: excellent agreement with direct numerical simulation results has been obtained in all cases. Finally, instability analysis of the Mach 7 flow around an elliptic cone modeling the HIFiRE-5 flight test vehicle has unraveled flow instabilities near the minor-axis centerline, with results comparing favorably with flight test predictions, further underlining the potential of the stability-analysis methodologies developed in this thesis.
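As background for the eigenvalue problems mentioned above, the BiGlobal formulation can be sketched in standard notation (not reproduced from the thesis): for a flow with two inhomogeneous directions \((x, y)\) and one homogeneous direction \(z\), small-amplitude perturbations are written as

\[ \mathbf{q}(x,y,z,t) = \bar{\mathbf{q}}(x,y) + \varepsilon\, \hat{\mathbf{q}}(x,y)\, e^{\, i (\beta z - \omega t)}, \qquad \varepsilon \ll 1 , \]

and substitution into the linearized Navier-Stokes equations yields the large, sparse generalized eigenvalue problem

\[ \mathbf{A}(\bar{\mathbf{q}}, \beta, Re)\, \hat{\mathbf{q}} = \omega\, \mathbf{B}\, \hat{\mathbf{q}} , \]

whose leading eigenvalues \(\omega\) give the growth rates and frequencies of the global modes; it is the sparsity of \(\mathbf{A}\) and \(\mathbf{B}\) under high-order finite differences that enables the reported four-orders-of-magnitude speedup and the move from supercomputers to desktop workstations.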