151 resultados para Programmer


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The parallelization of real-world compute intensive Fortran application codes is generally not a trivial task. If the time to complete the parallelization is to be significantly reduced then an environment is needed that will assist the programmer in the various tasks of code parallelization. In this paper the authors present a code parallelization environment where a number of tools that address the main tasks such as code parallelization, debugging and optimization are available. The ParaWise and CAPO parallelization tools are discussed which enable the near automatic parallelization of real-world scientific application codes for shared and distributed memory-based parallel systems. As user involvement in the parallelization process can introduce errors, a relative debugging tool (P2d2) is also available and can be used to perform nearly automatic relative debugging of a program that has been parallelized using the tools. A high quality interprocedural dependence analysis as well as user-tool interaction are also highlighted and are vital to the generation of efficient parallel code and in the optimization of the backtracking and speculation process used in relative debugging. Results of benchmark and real-world application codes parallelized are presented and show the benefits of using the environment

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents innovative work in the development of policy-based autonomic computing. The core of the work is a powerful and flexible policy-expression language AGILE, which facilitates run-time adaptable policy configuration of autonomic systems. AGILE also serves as an integrating platform for other self-management technologies including signal processing, automated trend analysis and utility functions. Each of these technologies has specific advantages and applicability to different types of dynamic adaptation. The AGILE platform enables seamless interoperability of the different technologies to each perform various aspects of self-management within a single application. The various technologies are implemented as object components. Self-management behaviour is specified using the policy language semantics to bind the various components together as required. Since the policy semantics support run-time re-configuration, the self-management architecture is dynamically composable. Additional benefits include the standardisation of the application programmer interface, terminology and semantics, and only a single point of embedding is required.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Functional and non-functional concerns require different programming effort, different techniques and different methodologies when attempting to program efficient parallel/distributed applications. In this work we present a "programmer oriented" methodology based on formal tools that permits reasoning about parallel/distributed program development and refinement. The proposed methodology is semi-formal in that it does not require the exploitation of highly formal tools and techniques, while providing a palatable and effective support to programmers developing parallel/distributed applications, in particular when handling non-functional concerns.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Speeding up sequential programs on multicores is a challenging problem that is in urgent need of a solution. Automatic parallelization of irregular pointer-intensive codes, exempli?ed by the SPECint codes, is a very hard problem. This paper shows that, with a helping hand, such auto-parallelization is possible and fruitful. This paper makes the following contributions: (i) A compiler framework for extracting pipeline-like parallelism from outer program loops is presented. (ii) Using a light-weight programming model based on annotations, the programmer helps the compiler to ?nd thread-level parallelism. Each of the annotations speci?es only a small piece of semantic information that compiler analysis misses, e.g. stating that a variable is dead at a certain program point. The annotations are designed such that correctness is easily veri?ed. Furthermore, we present a tool for suggesting annotations to the programmer. (iii) The methodology is applied to autoparallelize several SPECint benchmarks. For the benchmark with most parallelism (hmmer), we obtain a scalable 7-fold speedup on an AMD quad-core dual processor. The annotations constitute a parallel programming model that relies extensively on a sequential program representation. Hereby, the complexity of debugging is not increased and it does not obscure the source code. These properties could prove valuable to increase the ef?ciency of parallel programming.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Traditional static analysis fails to auto-parallelize programs with a complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard to discover by a programmer. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates which are then applied by automatic code transformation techniques.

This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data structure level rather than at the element level or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program’s outer loops. This observation validates our approach to whole-data structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows to discover higher degrees of parallelism, allowing a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Autonomic management can be used to improve the QoS provided by parallel/distributed applications. We discuss behavioural skeletons introduced in earlier work: rather than relying on programmer ability to design “from scratch” efficient autonomic policies, we encapsulate general autonomic controller features into algorithmic skeletons. Then we leave to the programmer the duty of specifying the parameters needed to specialise the skeletons to the needs of the particular application at hand. This results in the programmer having the ability to fast prototype and tune distributed/parallel applications with non-trivial autonomic management capabilities. We discuss how behavioural skeletons have been implemented in the framework of GCM(the Grid ComponentModel developed within the CoreGRID NoE and currently being implemented within the GridCOMP STREP project). We present results evaluating the overhead introduced by autonomic management activities as well as the overall behaviour of the skeletons. We also present results achieved with a long running application subject to autonomic management and dynamically adapting to changing features of the target architecture.
Overall the results demonstrate both the feasibility of implementing autonomic control via behavioural skeletons and the effectiveness of our sample behavioural skeletons in managing the “functional replication” pattern(s).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent trends towards increasingly parallel computers mean that there needs to be a seismic shift in programming practice. The time is rapidly approaching when most programming will be for parallel systems. However, most programming techniques in use today are geared towards sequential, or occasionally small-scale parallel, programming. While refactoring has so far mainly been applied to sequential programs, it is our contention that refactoring can play a key role in significantly improving the programmability of parallel systems, by allowing the programmer to apply a set of well-defined transformations in order to parallelise their programs. In this paper, we describe a new language-independent refactoring approach that helps introduce and tune parallelism through high-level design patterns targeting a set of well-specified parallel skeletons. We believe this new refactoring process is the key to allowing programmers to truly start thinking in parallel. © 2012 ACM.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Data flow techniques have been around since the early '70s when they were used in compilers for sequential languages. Shortly after their introduction they were also consideredas a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range, multidimensional array tile or dynamic region. BDDT uses a block-based dependence analysis with arbitrary granularity. The analysis is applicable to existing C programs without having to restructure object or array allocation, and provides flexibility in array layouts and tile dimensions.
We evaluate BDDT using a representative set of benchmarks, and we compare it to SMPSs (the equivalent runtime system in StarSs) and OpenMP. BDDT performs comparable to or better than SMPSs and is able to cope with task granularity as much as one order of magnitude finer than SMPSs. Compared to OpenMP, BDDT performs up to 3.9× better for benchmarks that benefit from dynamic dependence analysis. BDDT provides additional data annotations to bypass dependence analysis. Using these annotations, BDDT outperforms OpenMP also in benchmarks where dependence analysis does not discover additional parallelism, thanks to a more efficient implementation of the runtime system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.

Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.

Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Following earlier work demonstrating the utility of Orc as a means of specifying and reasoning about grid applications we propose the enhancement of such specifications with metadata that provide a means to extend an Orc specification with implementation oriented information. We argue that such specifications provide a useful refinement step in allowing reasoning about implementation related issues ahead of actual implementation or even prototyping. As examples, we demonstrate how such extended specifications can be used for investigating security related issues and for evaluating the cost of handling grid resource faults. The approach emphasises a semi-formal style of reasoning that makes maximum use of programmer domain knowledge and experience.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We propose a methodology for optimizing the execution of data parallel (sub-)tasks on CPU and GPU cores of the same heterogeneous architecture. The methodology is based on two main components: i) an analytical performance model for scheduling tasks among CPU and GPU cores, such that the global execution time of the overall data parallel pattern is optimized; and ii) an autonomic module which uses the analytical performance model to implement the data parallel computations in a completely autonomic way, requiring no programmer intervention to optimize the computation across CPU and GPU cores. The analytical performance model uses a small set of simple parameters to devise a partitioning-between CPU and GPU cores-of the tasks derived from structured data parallel patterns/algorithmic skeletons. The model takes into account both hardware related and application dependent parameters. It computes the percentage of tasks to be executed on CPU and GPU cores such that both kinds of cores are exploited and performance figures are optimized. The autonomic module, implemented in FastFlow, executes a generic map (reduce) data parallel pattern scheduling part of the tasks to the GPU and part to CPU cores so as to achieve optimal execution time. Experimental results on state-of-the-art CPU/GPU architectures are shown that assess both performance model properties and autonomic module effectiveness. © 2013 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The goal of the POBICOS project is a platform that facilitates the development and deployment of pervasive computing applications destined for networked, cooperating objects. POBICOS object communities are heterogeneous in terms of the sensing, actuating, and computing resources contributed by each object. Moreover, it is assumed that an object community is formed without any master plan; for example, it may emerge as a by-product of acquiring everyday, POBICOS-enabled objects by a household. As a result, the target object community is, at least partially, unknown to the application programmer, and so a POBICOS application should be able to deliver its functionality on top of diverse object communities (we call this opportunistic computing). The POBICOS platform includes a middleware offering a programming model for opportunistic computing, as well as development and monitoring tools. This paper briefly describes the tools produced in the first phase of the project. Also, the stakeholders using these tools are identified, and a development process for both the middleware and applications is presented. © 2009 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes middleware-level support for agent mobility, targeted at hierarchically structured wireless sensor and actuator network applications. Agent mobility enables a dynamic deployment and adaptation of the application on top of the wireless network at runtime, while allowing the middleware to optimize the placement of agents, e.g., to reduce wireless network traffic, transparently to the application programmer. The paper presents the design of the mechanisms and protocols employed to instantiate agents on nodes and to move agents between nodes. It also gives an evaluation of a middleware prototype running on Imote2 nodes that communicate over ZigBee. The results show that our implementation is reasonably efficient and fast enough to support the envisioned functionality on top of a commodity multi-hop wireless technology. Our work is to a large extent platform-neutral, thus it can inform the design of other systems that adopt a hierarchical structuring of mobile components. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.