973 resultados para programming models


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBBs attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchronization. It is not straightforward, however, to express all patterns of parallelism in these models. Pipelines are an important parallel construct, although difficult to express in Cilk and TBBs in a straightfor- ward way, not without a verbose restructuring of the code. In this paper we demonstrate that pipeline parallelism can be easily and concisely expressed in a Cilk-like language, which we extend with input, output and input/output dependency types on procedure arguments, enforced at runtime by the scheduler. We evaluate our implementation on real applications and show that our Cilk-like scheduler, extended to track and enforce these dependencies has performance comparable to Cilk++.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some performance gain (up to 7.5 percent) or negligible performance loss.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle the complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet speci?c enough to exploit the vast on-chip memory bandwidth of EMM multi-processors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The ?rst programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework which provides OpenMP-like semantics and the abstraction of a shared address spaces divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and we find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-art implementation for systems with hardware-managed cache coherence in terms of scalability and sustained to peak data processing bandwidth, where HyMR demon- strates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior of that of message passing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years several countries have set up policies that allow exchange of kidneys between two or more incompatible patient–donor pairs. These policies lead to what is commonly known as kidney exchange programs. The underlying optimization problems can be formulated as integer programming models. Previously proposed models for kidney exchange programs have exponential numbers of constraints or variables, which makes them fairly difficult to solve when the problem size is large. In this work we propose two compact formulations for the problem, explain how these formulations can be adapted to address some problem variants, and provide results on the dominance of some models over others. Finally we present a systematic comparison between our models and two previously proposed ones via thorough computational analysis. Results show that compact formulations have advantages over non-compact ones when the problem size is large.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a new method for the inclusion of nonlinear demand and supply relationships within a linear programming model. An existing method for this purpose is described first and its shortcomings are pointed out before showing how the new approach overcomes those difficulties and how it provides a more accurate and 'smooth' (rather than a kinked) approximation of the nonlinear functions as well as dealing with equilibrium under perfect competition instead of handling just the monopolistic situation. The workings of the proposed method are illustrated by extending a previously available sectoral model for the UK agriculture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The optimized allocation of protective devices in strategic points of the circuit improves the quality of the energy supply and the system reliability index. This paper presents a nonlinear integer programming (NLIP) model with binary variables, to deal with the problem of protective device allocation in the main feeder and all branches of an overhead distribution circuit, to improve the reliability index and to provide customers with service of high quality and reliability. The constraints considered in the problem take into account technical and economical limitations, such as coordination problems of serial protective devices, available equipment, the importance of the feeder and the circuit topology. The use of genetic algorithms (GAs) is proposed to solve this problem, using a binary representation that does (1) or does not (0) show allocation of protective devices (reclosers, sectionalizers and fuses) in predefined points of the circuit. Results are presented for a real circuit (134 busses), with the possibility of protective device allocation in 29 points. Also the ability of the algorithm in finding good solutions while improving significantly the indicators of reliability is shown. (C) 2003 Elsevier B.V. All rights reserved.