872 resultados para High Performance Computing
Resumo:
Numerical weather prediction and climate simulation have been among the computationally most demanding applications of high performance computing eversince they were started in the 1950's. Since the 1980's, the most powerful computers have featured an ever larger number of processors. By the early 2000's, this number is often several thousand. An operational weather model must use all these processors in a highly coordinated fashion. The critical resource in running such models is not computation, but the amount of necessary communication between the processors. The communication capacity of parallel computers often fallsfar short of their computational power. The articles in this thesis cover fourteen years of research into how to harness thousands of processors on a single weather forecast or climate simulation, so that the application can benefit as much as possible from the power of parallel high performance computers. The resultsattained in these articles have already been widely applied, so that currently most of the organizations that carry out global weather forecasting or climate simulation anywhere in the world use methods introduced in them. Some further studies extend parallelization opportunities into other parts of the weather forecasting environment, in particular to data assimilation of satellite observations.
Resumo:
The purpose of this work is to demonstrate the usefulness of low cost high performance computers. It is presented technics and software packages used by computational chemists. Access to high-performance computing power remains crucial for many computational quantum chemistry. So, this work introduces the concept of PC cluster, an economical computing plataform.
Resumo:
Beowulf cluster is perhaps the cheapest way to construct a high performance computers. The strategy of reaching a high power computing isn't so difficult, but buying and configurating should be done carefully. Technical aspects of hardware components and message-passing libraries are considered, with some results given as examples.
Resumo:
Cloud computing enables on-demand network access to shared resources (e.g., computation, networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort. Cloud computing refers to both the applications delivered as services over the Internet and the hardware and system software in the data centers. Software as a service (SaaS) is part of cloud computing. It is one of the cloud service models. SaaS is software deployed as a hosted service and accessed over the Internet. In SaaS, the consumer uses the provider‘s applications running in the cloud. SaaS separates the possession and ownership of software from its use. The applications can be accessed from any device through a thin client interface. A typical SaaS application is used with a web browser based on monthly pricing. In this thesis, the characteristics of cloud computing and SaaS are presented. Also, a few implementation platforms for SaaS are discussed. Then, four different SaaS implementation cases and one transformation case are deliberated. The pros and cons of SaaS are studied. This is done based on literature references and analysis of the SaaS implementations and the transformation case. The analysis is done both from the customer‘s and service provider‘s point of view. In addition, the pros and cons of on-premises software are listed. The purpose of this thesis is to find when SaaS should be utilized and when it is better to choose a traditional on-premises software. The qualities of SaaS bring many benefits both for the customer as well as the provider. A customer should utilize SaaS when it provides cost savings, ease, and scalability over on-premises software. SaaS is reasonable when the customer does not need tailoring, but he only needs a simple, general-purpose service, and the application supports customer‘s core business. A provider should utilize SaaS when it offers cost savings, scalability, faster development, and wider customer base over on-premises software. It is wise to choose SaaS when the application is cheap, aimed at mass market, needs frequent updating, needs high performance computing, needs storing large amounts of data, or there is some other direct value from the cloud infrastructure.
Resumo:
This work investigates mathematical details and computational aspects of Metropolis-Hastings reptation quantum Monte Carlo and its variants, in addition to the Bounce method and its variants. The issues that concern us include the sensitivity of these algorithms' target densities to the position of the trial electron density along the reptile, time-reversal symmetry of the propagators, and the length of the reptile. We calculate the ground-state energy and one-electron properties of LiH at its equilibrium geometry for all these algorithms. The importance sampling is performed with a single-determinant large Slater-type orbitals (STO) basis set. The computer codes were written to exploit the efficiencies engineered into modern, high-performance computing software. Using the Bounce method in the calculation of non-energy-related properties, those represented by operators that do not commute with the Hamiltonian, is a novel work. We found that the unmodified Bounce gives good ground state energy and very good one-electron properties. We attribute this to its favourable time-reversal symmetry in its target density's Green's functions. Breaking this symmetry gives poorer results. Use of a short reptile in the Bounce method does not alter the quality of the results. This suggests that in future applications one can use a shorter reptile to cut down the computational time dramatically.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.
Resumo:
FAMOUS is an ocean-atmosphere general circulation model of low resolution, capable of simulating approximately 120 years of model climate per wallclock day using current high performance computing facilities. It uses most of the same code as HadCM3, a widely used climate model of higher resolution and computational cost, and has been tuned to reproduce the same climate reasonably well. FAMOUS is useful for climate simulations where the computational cost makes the application of HadCM3 unfeasible, either because of the length of simulation or the size of the ensemble desired. We document a number of scientific and technical improvements to the original version of FAMOUS. These improvements include changes to the parameterisations of ozone and sea-ice which alleviate a significant cold bias from high northern latitudes and the upper troposphere, and the elimination of volume-averaged drifts in ocean tracers. A simple model of the marine carbon cycle has also been included. A particular goal of FAMOUS is to conduct millennial-scale paleoclimate simulations of Quaternary ice ages; to this end, a number of useful changes to the model infrastructure have been made.
Resumo:
Europe's widely distributed climate modelling expertise, now organized in the European Network for Earth System Modelling (ENES), is both a strength and a challenge. Recognizing this, the European Union's Program for Integrated Earth System Modelling (PRISM) infrastructure project aims at designing a flexible and friendly user environment to assemble, run and post-process Earth System models. PRISM was started in December 2001 with a duration of three years. This paper presents the major stages of PRISM, including: (1) the definition and promotion of scientific and technical standards to increase component modularity; (2) the development of an end-to-end software environment (graphical user interface, coupling and I/O system, diagnostics, visualization) to launch, monitor and analyse complex Earth system models built around state-of-art community component models (atmosphere, ocean, atmospheric chemistry, ocean bio-chemistry, sea-ice, land-surface); and (3) testing and quality standards to ensure high-performance computing performance on a variety of platforms. PRISM is emerging as a core strategic software infrastructure for building the European research area in Earth system sciences. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
Three naming strategies are discussed that allow the processes of a distributed application to continue being addressed by their original logical name, along all the migrations they may be forced to undertake because of performance-improvement goals. A simple centralised solution is firstly discussed which showed a software bottleneck with the increase of the number of processes; other two solutions are considered that entail different communication schemes and different communication overheads for the naming protocol. All these strategies are based on the facility that each process is allowed to survive after migration, even in its original site, only to provide a forwarding service to those communications that used its obsolete address.
Resumo:
A full assessment of para-virtualization is important, because without knowledge about the various overheads, users can not understand whether using virtualization is a good idea or not. In this paper we are very interested in assessing the overheads of running various benchmarks on bare-‐metal, as well as on para-‐virtualization. The idea is to see what the overheads of para-‐ virtualization are, as well as looking at the overheads of turning on monitoring and logging. The knowledge from assessing various benchmarks on these different systems will help a range of users understand the use of virtualization systems. In this paper we assess the overheads of using Xen, VMware, KVM and Citrix, see Table 1. These different virtualization systems are used extensively by cloud-‐users. We are using various Netlib1 benchmarks, which have been developed by the University of Tennessee at Knoxville (UTK), and Oak Ridge National Laboratory (ORNL). In order to assess these virtualization systems, we run the benchmarks on bare-‐metal, then on the para-‐virtualization, and finally we turn on monitoring and logging. The later is important as users are interested in Service Level Agreements (SLAs) used by the Cloud providers, and the use of logging is a means of assessing the services bought and used from commercial providers. In this paper we assess the virtualization systems on three different systems. We use the Thamesblue supercomputer, the Hactar cluster and IBM JS20 blade server (see Table 2), which are all servers available at the University of Reading. A functional virtualization system is multi-‐layered and is driven by the privileged components. Virtualization systems can host multiple guest operating systems, which run on its own domain, and the system schedules virtual CPUs and memory within each Virtual Machines (VM) to make the best use of the available resources. The guest-‐operating system schedules each application accordingly. You can deploy virtualization as full virtualization or para-‐virtualization. Full virtualization provides a total abstraction of the underlying physical system and creates a new virtual system, where the guest operating systems can run. No modifications are needed in the guest OS or application, e.g. the guest OS or application is not aware of the virtualized environment and runs normally. Para-‐virualization requires user modification of the guest operating systems, which runs on the virtual machines, e.g. these guest operating systems are aware that they are running on a virtual machine, and provide near-‐native performance. You can deploy both para-‐virtualization and full virtualization across various virtualized systems. Para-‐virtualization is an OS-‐assisted virtualization; where some modifications are made in the guest operating system to enable better performance. In this kind of virtualization, the guest operating system is aware of the fact that it is running on the virtualized hardware and not on the bare hardware. In para-‐virtualization, the device drivers in the guest operating system coordinate the device drivers of host operating system and reduce the performance overheads. The use of para-‐virtualization [0] is intended to avoid the bottleneck associated with slow hardware interrupts that exist when full virtualization is employed. It has revealed [0] that para-‐ virtualization does not impose significant performance overhead in high performance computing, and this in turn this has implications for the use of cloud computing for hosting HPC applications. The “apparent” improvement in virtualization has led us to formulate the hypothesis that certain classes of HPC applications should be able to execute in a cloud environment, with minimal performance degradation. In order to support this hypothesis, first it is necessary to define exactly what is meant by a “class” of application, and secondly it will be necessary to observe application performance, both within a virtual machine and when executing on bare hardware. A further potential complication is associated with the need for Cloud service providers to support Service Level Agreements (SLA), so that system utilisation can be audited.
Resumo:
This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.