8 resultados para Multi-cores heterogêneos

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

100.00% 100.00%

Publicador:

Resumo:

FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code via offloading onto a dynamically created software accelerator is presented. The new methodology has been validated using a set of simple micro-benchmarks and some real applications. © 2011 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a scalable, statistical ‘black-box’ model for predicting the performance of parallel programs on multi-core non-uniform memory access (NUMA) systems. We derive a model with low overhead, by reducing data collection and model training time. The model can accurately predict the behaviour of parallel applications in response to changes in their concurrency, thread layout on NUMA nodes, and core voltage and frequency. We present a framework that applies the model to achieve significant energy and energy-delay-square (ED2) savings (9% and 25%, respectively) along with performance improvement (10% mean) on an actual 16-core NUMA system running realistic application workloads. Our prediction model proves substantially more accurate than previous efforts.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The efficient development of multi-threaded software has, for many years, been an unsolved problem in computer science. Finding a solution to this problem has become urgent with the advent of multi-core processors. Furthermore, the problem has become more complicated because multi-cores are everywhere (desktop, laptop, embedded system). As such, they execute generic programs which exhibit very different characteristics than the scientific applications that have been the focus of parallel computing in the past.
Implicitly parallel programming is an approach to parallel pro- gramming that promises high productivity and efficiency and rules out synchronization errors and race conditions by design. There are two main ingredients to implicitly parallel programming: (i) a con- ventional sequential programming language that is extended with annotations that describe the semantics of the program and (ii) an automatic parallelizing compiler that uses the annotations to in- crease the degree of parallelization.
It is extremely important that the annotations and the automatic parallelizing compiler are designed with the target application do- main in mind. In this paper, we discuss the Paralax approach to im- plicitly parallel programming and we review how the annotations and the compiler design help to successfully parallelize generic programs. We evaluate Paralax on SPECint benchmarks, which are a model for such programs, and demonstrate scalable speedups, up to a factor of 6 on 8 cores.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A generic architecture for implementing the advanced encryption standard (AES) encryption algorithm in silicon is proposed. This allows the instantiation of a wide range of chip specifications, with these taking the form of semiconductor intellectual property (IP) cores. Cores implemented from this architecture can perform both encryption and decryption and support four modes of operation: (i) electronic codebook mode; (ii) output feedback mode; (iii) cipher block chaining mode; and (iv) ciphertext feedback mode. Chip designs can also be generated to cover all three AES key lengths, namely 128 bits, 192 bits and 256 bits. On-the-fly generation of the round keys required during decryption is also possible. The general, flexible and multi-functional nature of the approach described contrasts with previous designs which, to date, have been focused on specific implementations. The presented ideas are demonstrated by implementation in FPGA technology. However, the architecture and IP cores derived from this are easily migratable to other silicon technologies including ASIC and PLD and are capable of covering a wide range of modem communication systems cryptographic requirements. Moreover, the designs produced have a gate count and throughput comparable with or better than the previous one-off solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With increasing demands on storage devices in the modern communication environment, the storage area network (SAN) has evolved to provide a direct connection allowing these storage devices to be accessed efficiently. To optimize the performance of a SAN, a three-stage hybrid electronic/optical switching node architecture based on the concept of a MPLS label switching mechanism, aimed at serving as a multi-protocol label switching (MPLS) ingress label edge router (LER) for a SAN-enabled application, has been designed. New shutter-based free-space multi-channel optical switching cores are employed as the core switch fabric to solve the packet contention and switching path conflict problems. The system-level node architecture design constraints are evaluated through self-similar traffic sourced from real gigabit Ethernet network traces and storage systems. The extension performance of a SAN over a proposed WDM ring network, aimed at serving as an MPLS-enabled transport network, is also presented and demonstrated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively detect computing core failures and take action to relocate the computing core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology both at the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which the fault tolerance is studied, centralised and decentralised checkpointing approaches on an average add 90% to the actual time for executing the job. On the other hand, in the same experiment the multi-agent approaches add only 10% to the overall execution time