4 results for Graphics processing unit programming
Abstract:
Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on every node of a cluster is inefficient: it results in high costs and power consumption as well as underutilisation of the accelerators. The research reported in this paper is motivated by the use of a few physical GPUs, with cluster nodes given on-demand access to remote GPUs, for a financial risk application. We hypothesise that sharing GPUs between several nodes, referred to as multi-tenancy, reduces the execution time and energy consumed by an application. Two data transfer modes between the CPU and the GPUs, namely concurrent and sequential, are explored. The key result from the experiments is that multi-tenancy with a few physical GPUs using sequential data transfers lowers the execution time and the energy consumed, thereby improving the overall performance of the application.
Abstract:
A major weakness of the loading models for pedestrians walking on flexible structures proposed in recent years is the various uncorroborated assumptions made in their development. This applies to the spatio-temporal characteristics of pedestrian loading and the nature of multi-object interactions. To alleviate this problem, a framework for the determination of localised pedestrian forces on full-scale structures is presented using a wireless attitude and heading reference system (AHRS). An AHRS comprises a triad of tri-axial accelerometers, gyroscopes and magnetometers managed by a dedicated data processing unit, allowing motion in three-dimensional space to be reconstructed. A pedestrian loading model based on a single-point inertial measurement from an AHRS is derived and shown to perform well against benchmark data collected on an instrumented treadmill. Unlike other models, the current model does not take any predefined form, nor does it require any extrapolations as to the timing and amplitude of pedestrian loading. To correctly assess the influence of the moving pedestrian on the behaviour of a structure, an algorithm for tracking the point of application of the pedestrian force is developed based on data from a single AHRS attached to a foot. A set of controlled walking tests with a single pedestrian is conducted on a real footbridge for validation purposes. A remarkably good match between the measured and simulated bridge response is found, confirming the applicability of the proposed framework.
Abstract:
Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, are increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off and we discuss a preliminary set of experiments validating our claims.
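As an illustration of the algorithmic skeleton idea mentioned in this abstract, the following is a minimal sketch and not the authors' framework. The names `farm` and `seq_compose` and the `degree` parameter are hypothetical, chosen only for this example. It shows how a pattern makes the parallelism degree an explicit knob, which is the kind of pattern knowledge a runtime could in principle tune for performance or energy. The stages here run one after another rather than overlapping as a true pipeline would.

```python
from concurrent.futures import ThreadPoolExecutor


def farm(worker, degree):
    """Hypothetical 'farm' skeleton: apply `worker` to every item with `degree` threads."""
    def run(items):
        with ThreadPoolExecutor(max_workers=degree) as pool:
            # pool.map returns results in input order.
            return list(pool.map(worker, items))
    return run


def seq_compose(*stages):
    """Hypothetical composition: feed the output of each stage into the next."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run


# The parallelism degree of each stage is an explicit, tunable parameter.
square_then_inc = seq_compose(
    farm(lambda x: x * x, degree=4),
    farm(lambda x: x + 1, degree=2),
)
result = square_then_inc(range(5))
```

Because the structure of the computation is declared rather than hand-coded with threads, changing `degree` does not require touching the worker functions.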
Abstract:
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease-of-use and high performance in large scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a scalable, NUMA-aware graph analytics framework that does not demand significant parallel programming experience.
The realization of such a system faces two key problems:
(i) how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii) how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
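Problem (ii) can be made concrete with a small sketch. This is not the partitioning scheme of the paper, which the abstract does not describe; it assumes a simple strategy of splitting the vertex set into contiguous ranges with roughly equal edge counts, and all function names are hypothetical. Python threads are used only to show the distribution of work items; they do not provide NUMA placement.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_edges(degrees, num_parts):
    """Split vertex ids 0..n-1 into contiguous ranges with roughly equal edge counts."""
    total = sum(degrees)
    bounds = [0]
    running = 0
    for v, d in enumerate(degrees):
        running += d
        # Close the current range once the cumulative edge count reaches its share.
        if len(bounds) < num_parts and running * num_parts >= total * len(bounds):
            bounds.append(v + 1)
    # Pad with empty ranges if there were too few vertices to fill every part.
    while len(bounds) < num_parts:
        bounds.append(len(degrees))
    bounds.append(len(degrees))
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]


def process_range(adj, start, stop):
    """Hypothetical per-partition kernel: count edges whose source is in [start, stop)."""
    return sum(len(adj[v]) for v in range(start, stop))


def count_edges_parallel(adj, num_parts):
    """Create one work item per partition and hand the items to a thread pool."""
    degrees = [len(nbrs) for nbrs in adj]
    ranges = partition_by_edges(degrees, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partials = list(pool.map(lambda r: process_range(adj, *r), ranges))
    return ranges, partials


# Small example graph as an adjacency list (vertex -> list of neighbours).
adj = [[1, 2, 3], [0], [0], [0], [0, 5], [0, 1, 2, 4]]
ranges, partials = count_edges_parallel(adj, num_parts=3)
```

Balancing by edge count rather than vertex count matters for graphs with skewed degree distributions, where a few high-degree vertices would otherwise dominate one work item.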