Digital Library

230 results for architectures

An efficient unbounded lock-free queue for multi-core systems

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications. © 2012 Springer-Verlag.

Parallel patterns + Macro Data Flow for multi-core programming

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Data flow techniques have been around since the early '70s when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE.
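As a rough illustration of the firing rule behind macro data flow (an instruction becomes schedulable once all of its input tokens have arrived), the following is a minimal sequential sketch. It is not the run time support described in the abstract; `MacroInstruction`, `run_graph`, and `example` are hypothetical names chosen for this illustration, and a real run time would hand ready instructions to worker threads instead of a single loop.

```python
from collections import deque


class MacroInstruction:
    """A coarse-grained instruction that fires once all its inputs are present."""

    def __init__(self, name, func, n_inputs, targets):
        self.name = name
        self.func = func
        self.inputs = [None] * n_inputs
        self.missing = n_inputs      # number of input tokens still awaited
        self.targets = targets       # list of (instruction name, input slot)


def run_graph(instructions, initial_tokens):
    """Execute the graph; return the results of instructions with no targets."""
    table = {ins.name: ins for ins in instructions}
    ready = deque()
    results = {}

    def deliver(name, slot, value):
        # Store a token and mark the instruction ready when its last input arrives.
        ins = table[name]
        ins.inputs[slot] = value
        ins.missing -= 1
        if ins.missing == 0:
            ready.append(ins)

    for name, slot, value in initial_tokens:
        deliver(name, slot, value)

    while ready:
        ins = ready.popleft()
        value = ins.func(*ins.inputs)
        if not ins.targets:
            results[ins.name] = value
        for name, slot in ins.targets:
            deliver(name, slot, value)
    return results


def example():
    # Computes (a + b) * (a - b) for a = 7, b = 3.
    graph = [
        MacroInstruction("add", lambda x, y: x + y, 2, [("mul", 0)]),
        MacroInstruction("sub", lambda x, y: x - y, 2, [("mul", 1)]),
        MacroInstruction("mul", lambda x, y: x * y, 2, []),
    ]
    tokens = [("add", 0, 7), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 3)]
    return run_graph(graph, tokens)
```

In this sketch `add` and `sub` become ready independently of each other, which is where a parallel scheduler would find its concurrency; `mul` fires only after both results have been delivered.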

Prefetching and Cache Management using Task Lifetimes

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Task-based dataflow programming models and runtimes are emerging as promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently on the processing elements. In such models, cache locality, which is critical for performance, becomes more challenging in the presence of fine-grain tasks, and in architectures with many simple cores.

This paper presents a combined hardware-software approach to improve cache locality and offer better performance in terms of execution time and energy in the memory system. We propose the explicit bulk prefetcher (EBP) and epoch-based cache management (ECM) to help runtimes prefetch task data and guide the replacement decisions in caches. The runtime software can use this hardware support to expose its internal knowledge about the tasks to the architecture and achieve more efficient task-based execution. Our combined scheme outperforms HW-only prefetchers and state-of-the-art replacement policies, improves performance by an average of 17%, generates on average 26% fewer L2 misses, and consumes on average 28% less energy in the components of the memory system.

Deployment on GPUs of an application in computational atomic physics

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper describes the deployment on GPUs of PROP, a program of the 2DRMP suite which models electron collisions with H-like atoms and ions. Because performance on GPUs is better in single precision than in double precision, the numerical stability of the PROP program in single precision has been studied. The numerical quality of PROP results computed in single precision and their impact on the next program of the 2DRMP suite have been analyzed. Successive versions of the PROP program on GPUs have been developed in order to improve its performance. Particular attention has been paid to the optimization of data transfers and of linear algebra operations. Performance results obtained on several architectures (including NVIDIA Fermi) are presented.

Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores

Relevance:

10.00% 10.00%

Publisher:

Modeling Multi-grain Parallelism on Heterogeneous Multicore Processors: A Case Study of the Cell BE

Relevance:

10.00% 10.00%

Publisher:

A Runtime Framework for Optimizing Multi-dimensional Array Accesses on Multi-core Processors

Relevance:

10.00% 10.00%

Publisher:

Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor

Relevance:

10.00% 10.00%

Publisher:

A Unified Scheduler for Recursive and Task Dataflow Parallelism

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Task dataflow languages simplify the specification of parallel programs by dynamically detecting and enforcing dependencies between tasks. These languages are, however, often restricted to a single level of parallelism. This language design is reflected in the runtime system, where a master thread explicitly generates a task graph and worker threads execute ready tasks and wake up their dependents. Such an approach is incompatible with state-of-the-art schedulers such as the Cilk scheduler, which minimize the creation of idle tasks (work-first principle) and place all task creation and scheduling off the critical path. This paper proposes an extension to the Cilk scheduler in order to reconcile task dependencies with the work-first principle. We discuss the impact of task dependencies on the properties of the Cilk scheduler. Furthermore, we propose a low-overhead ticket-based technique for dependency tracking and enforcement at the object level. Our scheduler also supports renaming of objects in order to increase task-level parallelism. Renaming is implemented using versioned objects, a new type of hyper object. Experimental evaluation shows that the unified scheduler is as efficient as the Cilk scheduler when tasks have no dependencies. Moreover, the unified scheduler is more efficient than SMPSS, a particular implementation of a task dataflow language.
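To make "dynamically detecting dependencies between tasks" concrete, here is a small sketch that derives task dependencies from declared read and write sets, in program order. It is a generic illustration, not the ticket-based technique or the renaming scheme proposed in the paper; the function name `build_dependencies` and the tuple layout are assumptions made for this example.

```python
def build_dependencies(tasks):
    """Derive predecessors for each task from its object accesses.

    tasks: list of (name, reads, writes) tuples in program order.
    Returns a dict mapping each task name to the set of tasks it must wait for.
    """
    last_writer = {}           # object -> most recent task that wrote it
    readers_since_write = {}   # object -> tasks that read it since that write
    deps = {}
    for name, reads, writes in tasks:
        preds = set()
        for obj in reads:
            if obj in last_writer:
                preds.add(last_writer[obj])               # read-after-write
        for obj in writes:
            if obj in last_writer:
                preds.add(last_writer[obj])               # write-after-write
            preds.update(readers_since_write.get(obj, ()))  # write-after-read
        # Record this task's accesses only after its predecessors are computed.
        for obj in reads:
            readers_since_write.setdefault(obj, set()).add(name)
        for obj in writes:
            last_writer[obj] = name
            readers_since_write[obj] = set()
        deps[name] = preds
    return deps


def example():
    tasks = [
        ("t1", [], ["a"]),
        ("t2", ["a"], ["b"]),
        ("t3", ["a"], ["c"]),
        ("t4", ["b", "c"], ["a"]),
    ]
    return build_dependencies(tasks)
```

Here `t2` and `t3` depend only on `t1` and could run in parallel. `t4` waits for `t2` and `t3` both because it reads their outputs and because it overwrites `a`, which they read; the latter write-after-read constraints are the kind that object renaming, as mentioned in the abstract, is meant to relax.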

Design and Evaluation of a Task-based Parallel H.264 Video Encoder for Heterogeneous Processors

Relevance:

10.00% 10.00%

Publisher:

Inference and Declaration of Independence: Impact on Deterministic Task Parallelism

Relevance:

10.00% 10.00%

Publisher:

Crime, justice and the legitimacy of military power in the international sphere

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This article examines how a discourse of crime and justice is beginning to play a significant role in justifying international military operations. It suggests that although the coupling of war with crime and justice is not a new phenomenon, its present manifestations invite careful consideration of the connection between crime and political theory. It starts by reviewing the notion of sovereignty and then looks at the history of the criminalisation of war and the emergence of new norms to constrain sovereign states. In this context, it examines the three ways in which military force has recently been authorised: in Iraq, in Libya and through drones in Yemen, Pakistan and Somalia. It argues that the contemporary coupling of military technology with notions of crime and justice allows the reiteration of the perpetration of crimes by the powerful and the representation of violence as pertaining to specific dangerous populations in the space of the international. It further suggests that this authorises new architectures of authority, fundamentally based on military power as a source of social power.

A Framework for Context-Driven End-to-End QoS Control in Converged Next Generation Networks

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a framework for context-driven policy-based QoS control and end-to-end resource management in converged next generation networks. The Converged Networks QoS Framework (CNQF) is being developed within the IU-ATC project, and comprises distributed functional entities whose instances co-ordinate the converged network infrastructure to facilitate scalable and efficient end-to-end QoS management. The CNQF design leverages aspects of TISPAN, IETF and 3GPP policy-based management architectures whilst also introducing important innovative extensions to support context-aware QoS control in converged networks. The framework architecture is presented and its functionalities and operation in specific application scenarios are described.

Using Mesh-Geometry Relationships to Transfer Analysis Models between CAE Tools

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Integrating analysis and design models is a complex task due to differences between the models and the architectures of the toolsets used to create them. This complexity is increased with the use of many different tools for specific tasks during an analysis process. In this work various design and analysis models are linked throughout the design lifecycle, allowing them to be moved between packages in a way not currently available. Three technologies named Cellular Modeling, Virtual Topology and Equivalencing are combined to demonstrate how different finite element meshes generated on abstract analysis geometries can be linked to their original geometry. Establishing the equivalence relationships between models enables analysts to utilize multiple packages for specialist tasks without worrying about compatibility issues or rework.

A Note on Networks of Collaboration in Multimarket Oligopolies

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this note, we extend Goyal and Joshi’s model of collaboration networks in oligopoly to multi-market situations. We examine the incentive of firms to form links and the architectures of the resulting equilibrium networks in this setting. We then present some results on efficient networks.
