833 resultados para Parallel computers


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Magdeburg, Univ., Fak. für Verfahrens- und Systemtechnik, Diss., 2012

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Numerical weather prediction and climate simulation have been among the computationally most demanding applications of high performance computing eversince they were started in the 1950's. Since the 1980's, the most powerful computers have featured an ever larger number of processors. By the early 2000's, this number is often several thousand. An operational weather model must use all these processors in a highly coordinated fashion. The critical resource in running such models is not computation, but the amount of necessary communication between the processors. The communication capacity of parallel computers often fallsfar short of their computational power. The articles in this thesis cover fourteen years of research into how to harness thousands of processors on a single weather forecast or climate simulation, so that the application can benefit as much as possible from the power of parallel high performance computers. The resultsattained in these articles have already been widely applied, so that currently most of the organizations that carry out global weather forecasting or climate simulation anywhere in the world use methods introduced in them. Some further studies extend parallelization opportunities into other parts of the weather forecasting environment, in particular to data assimilation of satellite observations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In models of complicated physical-chemical processes operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The recently rediscovered weighted splitting schemes have the great advantage of being parallelizable on operator level, which allows us to reduce the computational time if parallel computers are used. In this paper, the computational times needed for the weighted splitting methods are studied in comparison with the sequential (S) splitting and the Marchuk-Strang (MSt) splitting and are illustrated by numerical experiments performed by use of simplified versions of the Danish Eulerian model (DEM).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Includes bibliographical references.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract not available

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Lattice Solid Model has been used successfully as a virtual laboratory to simulate fracturing of rocks, the dynamics of faults, earthquakes and gouge processes. However, results from those simulations show that in order to make the next step towards more realistic experiments it will be necessary to use models containing a significantly larger number of particles than current models. Thus, those simulations will require a greatly increased amount of computational resources. Whereas the computing power provided by single processors can be expected to increase according to Moore's law, i.e., to double every 18-24 months, parallel computers can provide significantly larger computing power today. In order to make this computing power available for the simulation of the microphysics of earthquakes, a parallel version of the Lattice Solid Model has been implemented. Benchmarks using large models with several millions of particles have shown that the parallel implementation of the Lattice Solid Model can achieve a high parallel-efficiency of about 80% for large numbers of processors on different computer architectures.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins such as transcriptional factors and signaling proteins. In recent years stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. One of the challenge problems in this field is the reduction of the huge computing time in stochastic simulations. Based on the system of the mitogen-activated protein kinase cascade that is activated by epidermal growth factor, this work give a parallel implementation by using OpenMP and parallelism across the simulation. Special attention is paid to the independence of the generated random numbers in parallel computing, that is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A large class of computational problems are characterised by frequent synchronisation, and computational requirements which change as a function of time. When such a problem is solved on a message passing multiprocessor machine [5], the combination of these characteristics leads to system performance which deteriorate in time. As the communication performance of parallel hardware steadily improves so load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved with periodic redistribution of computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must actively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load changes throughout the run. The major issues involved in a generic dynamic load balancing scheme will be investigated together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, which is a semi-automatic tool for parallelisation of mesh based FORTRAN codes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This chapter describes a parallel optimization technique that incorporates a distributed load-balancing algorithm and provides an extremely fast solution to the problem of load-balancing adaptive unstructured meshes. Moreover, a parallel graph contraction technique can be employed to enhance the partition quality and the resulting strategy outperforms or matches results from existing state-of-the-art static mesh partitioning algorithms. The strategy can also be applied to static partitioning problems. Dynamic procedures have been found to be much faster than static techniques, to provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data. The method employs a new iterative optimization technique that balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. The dynamic evolution of load has three major influences on possible partitioning techniques; cost, reuse, and parallelism. The unstructured mesh may be modified every few time-steps and so the load-balancing must have a low cost relative to that of the solution algorithm in between remeshing.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A method is outlined for optimising graph partitions which arise in mapping un- structured mesh calculations to parallel computers. The method employs a combination of iterative techniques to both evenly balance the workload and minimise the number and volume of interprocessor communications. They are designed to work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. The algorithms can also be used for dynamic load-balancing and a clustering technique can additionally be employed to speed up the whole process. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a relative gain iterative technique to both evenly balance the workload and minimise the number and volume of interprocessor communications. A parallel graph reduction technique is also briefly described and can be used to give a global perspective to the optimisation. The algorithms work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds. The algorithms can also be used for dynamic load-balancing, reusing existing partitions and in this case the procedures are much faster than static techniques, provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data.