864 results for High performance processors


Relevance:

90.00%

Publisher:

Abstract:

Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on every node of a cluster is inefficient, resulting in high costs and power consumption as well as underutilisation of the accelerators. The research reported in this paper is motivated by the use of a few physical GPUs, giving cluster nodes on-demand access to remote GPUs for a financial risk application. We hypothesise that sharing GPUs between several nodes, referred to as multi-tenancy, reduces the execution time and energy consumed by an application. Two data transfer modes between the CPU and the GPUs, namely concurrent and sequential, are explored. The key result from the experiments is that multi-tenancy with a few physical GPUs using sequential data transfers lowers the execution time and the energy consumed, thereby improving the overall performance of the application.

Relevance:

90.00%

Publisher:

Abstract:

This talk explores how the runtime system and operating system can leverage metrics that express the significance and resilience of application components in order to reduce the energy footprint of parallel applications. We will explore in particular how software can tolerate and indeed exploit higher error rates in future processors and memory technologies that may operate outside their safe margins.

Relevance:

90.00%

Publisher:

Abstract:

This paper presents a novel high-symmetry balun which significantly improves the performance of dipole-based dual-polarized antennas. The new balun structure provides enhanced differential capability, leading to high performance in terms of port-to-port isolation and far-field cross polarization. An example antenna using this balun is proposed. The simulated results show a fractional bandwidth of 53.5% within the band 1.71−2.96 GHz (VSWR<1.5) and port-to-port isolation >59 dB. The radiation characteristic shows around 9 dBi of gain and far-field cross polarization <−48 dBi over the entire bandwidth. The detailed balun functioning and full antenna measurements will be presented during the conference. A performance comparison with similar structures will also be provided.

Relevance:

90.00%

Publisher:

Abstract:

Through the awareness-raising efforts of several high-profile current and former athletes, the issue of common mental disorders (CMD) in this population is gaining increasing attention from researchers and practitioners alike. Yet the prevalence is unclear and, most likely, under-reported. Whilst the characteristics of the sporting environment may generate CMD within the athletic population, they may also exacerbate pre-existing conditions, and hence it is not surprising that sport psychology and sport science practitioners are anecdotally reporting increased incidences of athletes seeking support for CMD. In a population where there are many barriers to reporting and seeking help for CMD, due in part to the culture of the high performance sporting environment, anecdotal reports suggest that those athletes asking for help are approaching the personnel they are most comfortable talking to. In some cases, this may be a sport scientist, the sport psychologist or a sport psychology consultant. Among personnel in the sporting domain, there is a perception that the sport psychologist or sport psychology consultant is best placed to assist athletes seeking assistance for CMD. However, sport psychology as a profession is split between two competing philosophical perspectives: one suggests that sport psychologists should work exclusively with athletes on performance enhancement, while the other views the athlete more holistically and accepts that their welfare may directly impact on their performance. To add further complication, the development of the profession of sport psychology varies widely between countries, meaning that practice in this field is not always clearly defined. This article examines case studies that illustrate the blurred lines in applied sport psychology practice, highlighting challenges with the process of referral in the U.K. athletic population.
The article concludes with suggestions for ensuring the field of applied sport psychology is continually evolving and reconfiguring to ensure that it continues to meet the demands of its clients.

Relevance:

90.00%

Publisher:

Abstract:

Unstructured mesh-based codes for the modelling of continuum physics phenomena have evolved to provide the facility to model complex interacting systems. Such codes have the potential to provide high performance on parallel platforms for a small investment in programming. The critical parameters for success are to minimise changes to the code to allow for maintenance while providing high parallel efficiency, scalability to large numbers of processors and portability to a wide range of platforms. The paradigm of domain decomposition with message passing has for some time been demonstrated to provide a high level of efficiency, scalability and portability across shared and distributed memory systems without the need to re-author the code in a new language. This paper addresses these issues in the parallelisation of a complex three-dimensional unstructured mesh Finite Volume multiphysics code and discusses the implications of automating the parallelisation process.

Relevance:

90.00%

Publisher:

Abstract:

In parallel adaptive finite element simulations, the work load on the individual processors may change frequently. To (re)distribute the load evenly over the processors, a load balancing heuristic is needed. Common strategies try to minimise subdomain dependencies by optimising the cutsize of the partitioning. However, for certain solvers cutsize plays only a minor role, and their convergence is highly dependent on the subdomain shapes. Degenerated subdomain shapes cause them to need significantly more iterations to converge. In this work, a new parallel load balancing strategy is introduced which directly addresses the problem of generating and conserving reasonably good subdomain shapes in a dynamically changing Finite Element Simulation. Geometric data is used to formulate several cost functions to rate elements in terms of their suitability to be migrated. The well-known diffusive method, which calculates the necessary load flow, is enhanced by weighting the subdomain edges with the help of these cost functions. The proposed methods have been tested and results are presented.
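
The diffusive idea mentioned in this abstract can be illustrated with a minimal sketch of a generic first-order diffusion scheme on a processor graph. This is not the authors' method: the edge weights below are hypothetical stand-ins for the shape-based cost functions (which the abstract does not specify), and the names `diffusion_step`, `balance`, `edge_weights` and `alpha` are assumptions made for illustration only.

```python
def diffusion_step(load, edge_weights, alpha):
    """One first-order diffusion sweep over a processor graph.

    load         -- list of work loads, one entry per processor
    edge_weights -- dict {(i, j): w}; w is a hypothetical weight that a
                    shape-aware cost function might assign to the edge
    alpha        -- diffusion coefficient (assumed small enough for stability)
    """
    new_load = list(load)
    flow = {}
    for (i, j), w in edge_weights.items():
        # Positive flow moves work from processor i to processor j.
        f = alpha * w * (load[i] - load[j])
        flow[(i, j)] = f
        new_load[i] -= f
        new_load[j] += f
    return new_load, flow


def balance(load, edge_weights, alpha=0.25, sweeps=200):
    """Repeat diffusion sweeps and accumulate the total flow per edge."""
    total_flow = {edge: 0.0 for edge in edge_weights}
    for _ in range(sweeps):
        load, flow = diffusion_step(load, edge_weights, alpha)
        for edge, f in flow.items():
            total_flow[edge] += f
    return load, total_flow


# Three processors in a chain, with all work initially on processor 0.
loads, flows = balance([12.0, 0.0, 0.0], {(0, 1): 1.0, (1, 2): 1.0})
```

In this sketch the accumulated per-edge flow says how much work should cross each subdomain boundary; deciding which elements to migrate is the step that the shape-based cost functions described above would guide.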

Relevance:

90.00%

Publisher:

Abstract:

The constant evolution of technology has made available computational tools that were only expectations 10 years ago. The increase in computational power applied to numerical models that simulate the atmosphere has made it possible to broaden the study of atmospheric phenomena through the use of high-performance computing tools. This work proposed the development of algorithms based on SIMT architectures and the application of parallelism techniques using the OpenACC tool to process numerical forecast data from the Weather Research and Forecast model. The proposal has a strong interdisciplinary character, seeking interaction between the areas of atmospheric modelling and scientific computing. The influence of the cloud microphysics computation on the temporal degradation of the model was tested. Because the input data for execution on the GPU was not sufficiently large, the time needed to transfer data from the CPU to the GPU was greater than the time taken to run the computation on the CPU. Another determining factor was the addition of CUDA code within an MPI context, which caused contention for resources among the processors, once again degrading execution time. The proposed use of directives to apply high-performance computing in a CUDA structure seems very promising, but it still needs to be used with great caution in order to produce good results. The construction of a hybrid MPI + CUDA approach was tested, but the results were not conclusive.
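
The transfer-versus-compute trade-off reported in this abstract can be made concrete with a rough break-even model. This is only a sketch under stated assumptions: the linear latency-plus-bandwidth transfer model, the function name `offload_time` and every number in the usage lines are hypothetical and are not taken from the work itself.

```python
def offload_time(n_bytes, bandwidth_bytes_per_s, latency_s, gpu_compute_s):
    """Estimated wall time of a GPU offload: copy in, compute, copy out.

    Assumes (hypothetically) that each copy costs a fixed latency plus
    n_bytes / bandwidth, and that copies do not overlap with compute.
    """
    one_way_copy_s = latency_s + n_bytes / bandwidth_bytes_per_s
    return 2.0 * one_way_copy_s + gpu_compute_s


def offload_pays_off(n_bytes, bandwidth_bytes_per_s, latency_s,
                     gpu_compute_s, cpu_compute_s):
    """True only if the offload, including transfers, beats the CPU run."""
    total = offload_time(n_bytes, bandwidth_bytes_per_s, latency_s, gpu_compute_s)
    return total < cpu_compute_s


# Illustrative numbers only: a small kernel where the copies dominate ...
small_case = offload_pays_off(1e6, 1e9, 1e-5, 1e-4, 1e-3)
# ... and a heavier kernel where the same transfers are amortised.
large_case = offload_pays_off(1e6, 1e9, 1e-5, 1e-4, 1e-1)
```

Under this simple model, a kernel that is cheap on the CPU can lose to its own data movement, which is consistent with the behaviour described for small inputs.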

Relevance:

90.00%

Publisher:

Abstract:

The high performance computing community has traditionally focused solely on the reduction of execution time, though in recent years the optimization of energy consumption has become a major concern. Reducing energy usage without degrading performance requires the adoption of energy-efficient hardware platforms, accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated a substantial body of work and, consequently, high performance solvers can be found for a wide variety of hardware platforms. In this work, we aim to develop a high performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, yielding efficient usage of the target hardware as well as efficient memory access. The experimental evaluation shows that the novel proposal delivers significant savings in both time and energy consumption when compared with the state-of-the-art solvers of the platform.
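
For readers unfamiliar with the Gauss-Huard algorithm named in this abstract, the following is a minimal, unblocked NumPy sketch of the textbook method with column pivoting. It is not the solvers developed in the work, which target the Jetson TK1; the function name `gauss_huard_solve` and the dense, CPU-only formulation are assumptions made for illustration.

```python
import numpy as np


def gauss_huard_solve(a, b):
    """Solve a @ x = b with an unblocked Gauss-Huard sweep (column pivoting)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    m = np.hstack([a, b.reshape(n, 1)])  # augmented matrix [A | b], a copy
    perm = np.arange(n)                  # tracks column (unknown) swaps

    for k in range(n):
        # Row elimination: remove entries left of the diagonal in row k,
        # using the rows that have already been processed.
        m[k, k:] -= m[k, :k] @ m[:k, k:]

        # Column pivoting: bring the largest entry of row k onto the diagonal.
        p = k + int(np.argmax(np.abs(m[k, k:n])))
        if m[k, p] == 0.0:
            raise np.linalg.LinAlgError("matrix is singular")
        if p != k:
            m[:, [k, p]] = m[:, [p, k]]
            perm[[k, p]] = perm[[p, k]]

        # Scale the rest of row k by the diagonal entry.
        m[k, k + 1:] /= m[k, k]

        # Column elimination: clear column k in the rows above the diagonal.
        m[:k, k + 1:] -= np.outer(m[:k, k], m[k, k + 1:])

    # The last column holds the solution in pivoted order; undo the swaps.
    x = np.empty(n)
    x[perm] = m[:, n]
    return x


x = gauss_huard_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

Unlike Gauss-Jordan elimination, each step touches only the rows above the current one and the current row itself, leaving later rows untouched until they are reached.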

Relevance:

90.00%

Publisher:

Abstract:

Advances in FPGA technology and growing demands for processing capability have driven the emergence of All Programmable Systems-on-Chip, which combine a hard-wired processing system with programmable logic and enable the development of specialized computer systems for a wide range of practical applications, including data and signal processing, high performance computing and embedded systems, among many others. To provide an infrastructure capable of exploiting the benefits of such a reconfigurable system, the main goal of the thesis is to implement an infrastructure composed of hardware, software and network resources that incorporates the services needed for the operation, management and interfacing of peripherals, which compose the basic building blocks for the execution of applications. The project will be developed using a chip from the Zynq-7000 All Programmable Systems-on-Chip family.