246 resultados para parallel architectures


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recently, Bell ( 2004 Mon. Not. R. Astron. Soc. 353 550) has reanalysed the problem of wave excitation by cosmic rays propagating in the pre-cursor region of a supernova remnant shock front. He pointed out a strong, non-resonant, current-driven instability that had been overlooked in the kinetic treatments by Achterberg ( 1983 Astron. Astrophys. 119 274) and McKenzie and Volk ( 1982 Astron. Astrophys. 116 191), and suggested that it is responsible for substantial amplification of the ambient magnetic field. Magnetic field amplification is also an important issue in the problem of the formation and structure of relativistic shock fronts, particularly in relation to models of gamma-ray bursts. We have therefore generalized the linear analysis to apply to this case, assuming a relativistic background plasma and a monoenergetic, unidirectional incoming proton beam. We find essentially the same non-resonant instability observed by Bell and show that also, under GRB conditions, it grows much faster than the resonant waves. We quantify the extent to which thermal effects in the background plasma limit the maximum growth rate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances of Task-Parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races.
We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The management of non-functional features (performance, security, power management, etc.) is traditionally a difficult, error prone task for programmers of parallel applications. To take care of these non-functional features, autonomic managers running policies represented as rules using sensors and actuators to monitor and transform a running parallel application may be used. We discuss an approach aimed at providing formal tool support to the integration of independently developed autonomic managers taking care of different non-functional concerns within the same parallel application. Our approach builds on the Behavioural Skeleton experience (autonomic management of non-functional features in structured parallel applications) and on previous results on conflict detection and resolution in rule-based systems. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Refactoring is the process of changing the structure of a program without changing its behaviour. Refactoring has so far only really been deployed effectively for sequential programs. However, with the increased availability of multicore (and, soon, manycore) systems, refactoring can play an important role in helping both expert and non-expert parallel programmers structure and implement their parallel programs. This paper describes the design of a new refactoring tool that is aimed at increasing the programmability of parallel systems. To motivate our design, we refactor a number of examples in C, C++ and Erlang into good parallel implementations, using a set of formal pattern rewrite rules. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.

Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.

Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work presents a novel algorithm for decomposing NFA automata into one-state-active modules for parallel execution on Multiprocessor Systems on Chip (MP-SoC). Furthermore, performance related studies based on a 16-PE system for Snort, Bro and Linux-L7 regular expressions are presented. ©2009 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An overview of research on reconfigurable architectures for network processing applications within the Institute of Electronics, Communications and Information Technology (ECIT) is presented. Three key network processing topics, namely node throughput, Quality of Service (QoS) and security are examined where custom reconfigurability allows network nodes to adapt to fluctuating network traffic and customer demands. Various architectural possibilities have been investigated in order to explore the options and tradeoffs available when using reconfigurability for packet/frame processing, packet-scheduling and data encryption/decryption. This research has shown there is no common approach that can be applied. Rather the methodologies used and the cost-benefits for incorporation of reconfigurability depend on each of the functions considered, for example being well suited to encryption/decryption but not packet/frame processing. © 2005 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The cycle of the academic year impacts on efforts to refine and improve major group design-build-test (DBT) projects since the time to run and evaluate projects is generally a full calendar year. By definition these major projects have a high degree of complexity since they act as the vehicle for the application of a range of technical knowledge and skills. There is also often an extensive list of desired learning outcomes which extends to include professional skills and attributes such as communication and team working. It is contended that student project definition and operation, like any other designed product, requires a number of iterations to achieve optimisation. The problem however is that if this cycle takes four or more years then by the time a project’s operational structure is fine tuned it is quite possible that the project theme is no longer relevant. The majority of the students will also inevitably experience a sub-optimal project experience over the 5 year development period. It would be much better if the ratio were flipped so that in 1 year an optimised project definition could be achieved which had sufficient longevity that it could run in the same efficient manner for 4 further years. An increased number of parallel investigators would also enable more varied and adventurous project concepts to be examined than a single institution could undertake alone in the same time frame.
This work-in-progress paper describes a parallel processing methodology for the accelerated definition of new student DBT project concepts. This methodology has been devised and implemented by a number of CDIO partner institutions in the UK & Ireland region. An agreed project theme was operated in parallel in one academic year with the objective of replacing a multi-year iterative cycle. Additionally the close collaboration and peer learning derived from the interaction between the coordinating academics facilitated the development of faculty teaching skills in line with CDIO standard 10.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and we find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-art implementation for systems with hardware-managed cache coherence in terms of scalability and sustained to peak data processing bandwidth, where HyMR demon- strates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior of that of message passing.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores and overcomes the limitation of current works which would otherwise make parallel accounting impossible to achieve. We demonstrate the value of TProf by characterizing a set of task parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A framework supporting fast prototyping as well as tuning of distributed applications is presented. The approach is based on the adoption of a formal model that is used to describe the orchestration of distributed applications. The formal model (Orc by Misra and Cook) can be used to support semi-formal reasoning about the applications at hand. The paper describes how the framework can be used to derive and evaluate alternative orchestrations of a well know parallel/distributed computation pattern; and shows how the same formal model can be used to support generation of prototypes of distributed applications skeletons directly from the application description.