58 resultados para Parallel design multicenter

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks RISC pb2l) has been proposed to support modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized domain specific parallel patterns. We show how an implementation of RISC pb2 l can be realised via the FastFlow framework and present experimental evidence of the feasibility and efficiency of the approach.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning-between CPU and GPU cores-of the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern scheduling part of the tasks to the GPU and part to CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are shown that assess both performance model properties and autonomic module effectiveness. © 2013 IEEE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, we discuss its implementation on modern multi/many core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, are increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off and we discuss a preliminary set of experiments validating our claims.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more "domain specific" programming models. Experimental results demonstrating the feasibility of the approach are presented. © 2012 World Scientific Publishing Company.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

To assess the efficacy of trauma-focused cognitive behavioral therapy (TF-CBT) delivered by nonclinical facilitators in reducing posttraumatic stress, depression, and anxiety and conduct problems and increasing prosocial behavior in a group of war-affected, sexually exploited girls in a single-blind, parallel-design, randomized,+ controlled trial.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fully Homomorphic Encryption (FHE) is a recently developed cryptographic technique which allows computations on encrypted data. There are many interesting applications for this encryption method, especially within cloud computing. However, the computational complexity is such that it is not yet practical for real-time applications. This work proposes optimised hardware architectures of the encryption step of an integer-based FHE scheme with the aim of improving its practicality. A low-area design and a high-speed parallel design are proposed and implemented on a Xilinx Virtex-7 FPGA, targeting the available DSP slices, which offer high-speed multiplication and accumulation. Both use the Comba multiplication scheduling method to manage the large multiplications required with uneven sized multiplicands and to minimise the number of read and write operations to RAM. Results show that speed up factors of 3.6 and 10.4 can be achieved for the encryption step with medium-sized security parameters for the low-area and parallel designs respectively, compared to the benchmark software implementation on an Intel Core2 Duo E8400 platform running at 3 GHz.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A new three-limb, six-degree-of-freedom (DOF) parallel manipulator (PM), termed a selectively actuated PM (SA-PM), is proposed. The end-effector of the manipulator can produce 3-DOF spherical motion, 3-DOF translation, 3-DOF hybrid motion, or complete 6-DOF spatial motion, depending on the types of the actuation (rotary or linear) chosen for the actuators. The manipulator architecture completely decouples translation and rotation of the end-effector for individual control. The structure synthesis of SA-PM is achieved using the line geometry. Singularity analysis shows that the SA-PM is an isotropic translation PM when all the actuators are in linear mode. Because of the decoupled motion structure, a decomposition method is applied for both the displacement analysis and dimension optimization. With the index of maximal workspace satisfying given global conditioning requirements, the geometrical parameters are optimized. As a result, the translational workspace is a cube, and the orientation workspace is nearly unlimited.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents criteria for the design of a flow distributor for even distribution of gas and liquid flows over parallel microchannels. The design criteria are illustrated for the case of a nitrogen-water Taylor flow (1

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Test procedures for a pipelined bit-parallel IIR filter chip which maximally exploit its regularity are described. It is shown that small modifications to the basic architecture result in significant reductions in the number of test patterns required to test such chips. The methods used allow 100% fault coverage to be achieved using less than 1000 test vectors for a chip which has 12 bit data and coefficients.