41 resultados para Supercomputers
Resumo:
This paper focuses on the parallelization of an ocean model applying current multicore processor-based cluster architectures to an irregular computational mesh. The aim is to maximize the efficiency of the computational resources used. To make the best use of the resources offered by these architectures, this parallelization has been addressed at all the hardware levels of modern supercomputers: firstly, exploiting the internal parallelism of the CPU through vectorization; secondly, taking advantage of the multiple cores of each node using OpenMP; and finally, using the cluster nodes to distribute the computational mesh, using MPI for communication within the nodes. The speedup obtained with each parallelization technique as well as the combined overall speedup have been measured for the western Mediterranean Sea for different cluster configurations, achieving a speedup factor of 73.3 using 256 processors. The results also show the efficiency achieved in the different cluster nodes and the advantages obtained by combining OpenMP and MPI versus using only OpenMP or MPI. Finally, the scalability of the model has been analysed by examining computation and communication times as well as the communication and synchronization overhead due to parallelization.
Resumo:
Protein folding occurs on a time scale ranging from milliseconds to minutes for a majority of proteins. Computer simulation of protein folding, from a random configuration to the native structure, is nontrivial owing to the large disparity between the simulation and folding time scales. As an effort to overcome this limitation, simple models with idealized protein subdomains, e.g., the diffusion–collision model of Karplus and Weaver, have gained some popularity. We present here new results for the folding of a four-helix bundle within the framework of the diffusion–collision model. Even with such simplifying assumptions, a direct application of standard Brownian dynamics methods would consume 10,000 processor-years on current supercomputers. We circumvent this difficulty by invoking a special Brownian dynamics simulation. The method features the calculation of the mean passage time of an event from the flux overpopulation method and the sampling of events that lead to productive collisions even if their probability is extremely small (because of large free-energy barriers that separate them from the higher probability events). Using these developments, we demonstrate that a coarse-grained model of the four-helix bundle can be simulated in several days on current supercomputers. Furthermore, such simulations yield folding times that are in the range of time scales observed in experiments.
Resumo:
In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using either OpenMP, MPI or an hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.
Resumo:
Shipping list no.: 2000-0347-P.
Resumo:
Mode of access: Internet.
Resumo:
Climate change has a great impact on the build and the work of natural ecosystems. Disappearance of some population or growth of the number in some species can be already caused by little change in temperature. A Theoretical Ecosystem Growth Model was investigated in order to examine the effects of various climate patterns on the ecological equilibrium. The answers of the ecosystems which are given to the climate change could be described by means of global climate modelling and dynamic vegetation models. The examination of the operation of the ecosystems is only possible in huge centres on supercomputers because of the number and the complexity of the calculation. The number of the calculation could be decreased to the level of a PC by considering the temperature and the reproduction during the modelling of a theoretical ecosystem and several important theoretical questions could be answered.
Resumo:
Climate change has serious effects on the setting up and the operation of natural ecosystems. Small increase in temperature could cause rise in the amount of some species or potential disappearance of others. During our researches, the dispersion of the species and biomass production of a theoretical ecosystem were examined on the effect of the temperature–climate change. The answers of the ecosystems which are given to the climate change could be described by means of global climate modelling and dynamic vegetation models. The examination of the operation of the ecosystems is only possible in huge centres on supercomputers because of the number and the complexity of the calculation. The number of the calculation could be decreased to the level of a PC by considering the temperature and the reproduction during modelling a theoretical ecosystem, and several important theoretical questions could be answered.
Resumo:
Energy-efficient computing remains a critical challenge across the wide range of future data-processing engines — from ultra-low-power embedded systems to servers, mainframes, and supercomputers. In addition, the advent of cloud and mobile computing as well as the explosion of IoT technologies have created new research challenges in the already complex, multidimensional space of modern and future computer systems. These new research challenges led to the establishment of the IEEE Rebooting Computing Initiative, which specifically addresses novel low-power solutions and technologies as one of the main areas of concern.With this in mind, we thought it timely to survey the state of the art of energy-efficient computing.
Resumo:
The availability of CFD software that can easily be used and produce high efficiency on a wide range of parallel computers is extremely limited. The investment and expertise required to parallelise a code can be enormous. In addition, the cost of supercomputers forces high utilisation to justify their purchase, requiring a wide range of software. To break this impasse, tools are urgently required to assist in the parallelisation process that dramatically reduce the parallelisation time but do not degrade the performance of the resulting parallel software. In this paper we discuss enhancements to the Computer Aided Parallelisation Tools (CAPTools) to assist in the parallelisation of complex unstructured mesh-based computational mechanics codes.
Resumo:
Due to the growth of design size and complexity, design verification is an important aspect of the Logic Circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call thus system ZSIM. The ZSIM software architecture simulator targets low cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and 2 other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to: • Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions ( section 5.3.1). • Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism ( section 5.3.2 ). • Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation ( section 5.3.3 ). • Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive super-computers ( section 5.3.5 ). To evaluate the ZSIM, two types of test circuits were used: 1. Circuits from the IWLS benchmark suit [1] which allow direct comparison with other published studies of parallel simulators.2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files. The experimental results show that with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on Intel Xeon Phi and 11 on Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelistion gain of 10 on Intel Xeon Phi and 4 on Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that ZSIM simulator running on a Xeon Phi machine gives comparable simulation performance to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting Xeon Phi architecture, the automatic cache management of the Xeon Phi, handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. However, targeting GPUs, explicit cache management in program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that efficiently performs on Xeon Phi, was ported into a 64 core NUMA AMD Opteron. To conclude, the two main achievements are restated as following: The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.
Resumo:
After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have con- quered the top500 and green500 lists, providing us unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, like Xeon Phi Knights Landing by Intel and Pascal architecture by Nvidia. This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) used by commercial products according to roadmaps already announced.