839 results for: Simplex; CPLEX; Parallel Efficiency; Parallel Scalability; Linear Programming
Abstract:
We describe an approach, based on annotations and refactoring, that addresses the joint exploitation of control (stream) and data parallelism in a skeleton-based parallel programming environment. Annotations drive the efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use-case applications and kernels may be optimized and discuss preliminary experiments with FastFlow that assess the theoretical results. © 2013 Springer-Verlag.
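As an illustration of the cost-driven rewriting idea, here is a minimal C++ sketch, entirely hypothetical and not taken from the paper, in which a simple service-time cost model decides whether a two-stage pipeline skeleton should be rewritten into a farm over the composed stages; all names and numbers are assumptions.

```cpp
// Hypothetical sketch (not the paper's actual tool): a cost model decides
// whether pipe(f, g) should be rewritten into farm(comp(f, g)) -- one of
// the functionally equivalent skeleton trees discussed above.
#include <algorithm>
#include <iostream>

// Assumed per-stage service times and worker count; names are illustrative.
struct Cost { double ts_f, ts_g; int workers; };

// Service time of a two-stage pipeline: the slower stage dominates.
double pipe_cost(const Cost& c) { return std::max(c.ts_f, c.ts_g); }

// Service time of a farm over the composed stages: total work / workers.
double farm_comp_cost(const Cost& c) { return (c.ts_f + c.ts_g) / c.workers; }

int main() {
    Cost c{4.0, 1.0, 4};  // f is 4x slower than g; 4 workers available
    if (farm_comp_cost(c) < pipe_cost(c))
        std::cout << "rewrite: pipe(f,g) -> farm(comp(f,g))\n";
    else
        std::cout << "keep: pipe(f,g)\n";
}
```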
Abstract:
Recent trends towards increasingly parallel computers mean that there needs to be a seismic shift in programming practice. The time is rapidly approaching when most programming will be for parallel systems. However, most programming techniques in use today are geared towards sequential, or occasionally small-scale parallel, programming. While refactoring has so far mainly been applied to sequential programs, it is our contention that refactoring can play a key role in significantly improving the programmability of parallel systems, by allowing the programmer to apply a set of well-defined transformations in order to parallelise their programs. In this paper, we describe a new language-independent refactoring approach that helps introduce and tune parallelism through high-level design patterns targeting a set of well-specified parallel skeletons. We believe this new refactoring process is the key to allowing programmers to truly start thinking in parallel. © 2012 ACM.
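To make the kind of transformation concrete, here is an illustrative before/after in C++, using the standard C++17 parallel algorithms as a stand-in for the paper's skeleton targets (an assumption on our part): a sequential loop is refactored into an equivalent data-parallel map.

```cpp
// Illustrative before/after of the kind of transformation such a
// refactoring performs; C++17 parallel algorithms stand in for the
// skeleton library, which is our assumption, not the paper's target.
#include <algorithm>
#include <cstddef>
#include <execution>
#include <vector>

int expensive(int x) { return x * x; }  // stands in for real per-item work

int main() {
    std::vector<int> in(1'000'000, 3), out(in.size());

    // Before: sequential loop -- the pattern a "map" refactoring recognises.
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = expensive(in[i]);

    // After: the same computation expressed as a data-parallel map.
    std::transform(std::execution::par, in.begin(), in.end(),
                   out.begin(), expensive);
}
```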
Abstract:
Recently, Bell (2004, Mon. Not. R. Astron. Soc. 353, 550) reanalysed the problem of wave excitation by cosmic rays propagating in the precursor region of a supernova remnant shock front. He pointed out a strong, non-resonant, current-driven instability that had been overlooked in the kinetic treatments by Achterberg (1983, Astron. Astrophys. 119, 274) and McKenzie and Völk (1982, Astron. Astrophys. 116, 191), and suggested that it is responsible for substantial amplification of the ambient magnetic field. Magnetic field amplification is also an important issue in the formation and structure of relativistic shock fronts, particularly in relation to models of gamma-ray bursts (GRBs). We have therefore generalized the linear analysis to this case, assuming a relativistic background plasma and a monoenergetic, unidirectional incoming proton beam. We find essentially the same non-resonant instability identified by Bell and show that, under GRB conditions too, it grows much faster than the resonant waves. We quantify the extent to which thermal effects in the background plasma limit the maximum growth rate.
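Schematically, the non-relativistic instability Bell identified obeys a dispersion relation of the following form (Gaussian units; illustrative only, since the paper's relativistic generalisation modifies the coefficients):

```latex
% Schematic non-resonant (Bell) dispersion relation in the non-relativistic
% limit; j_cr is the cosmic-ray current density, rho the background plasma
% density, v_A the Alfven speed. The relativistic case treated in the paper
% differs in detail.
\[
  \gamma^2 \;\simeq\; \frac{k\,B_0\,j_{\mathrm{cr}}}{\rho c} \;-\; k^2 v_A^2,
  \qquad
  k_{\max} = \frac{B_0\,j_{\mathrm{cr}}}{2\rho c\,v_A^2},
  \qquad
  \gamma_{\max} = v_A\,k_{\max}.
\]
```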
Abstract:
The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances in task-parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races.
We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.
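The hybrid scheme can be pictured with a small C++ sketch; the names and data structures below are our assumptions for illustration, not SCOOP's actual interface:

```cpp
// Minimal sketch of the hybrid idea (illustrative; not the SCOOP API):
// every task declares a memory footprint, and the runtime serialises tasks
// whose footprints overlap -- unless the compiler already proved the task
// conflict-free, in which case the dynamic check is skipped entirely.
#include <cstdint>
#include <functional>
#include <vector>

struct Footprint { std::uintptr_t lo, hi; };  // [lo, hi) byte range

bool overlaps(const Footprint& a, const Footprint& b) {
    return a.lo < b.hi && b.lo < a.hi;
}

struct Task {
    Footprint fp;
    bool statically_independent;  // set by the (hypothetical) compiler pass
    std::function<void()> body;
};

void schedule(const std::vector<Task>& running, const Task& t) {
    if (t.statically_independent) { t.body(); return; }  // no dynamic check
    for (const auto& r : running)                        // dynamic analysis
        if (overlaps(t.fp, r.fp)) return;  // defer until conflict clears
    t.body();  // (bookkeeping of running tasks omitted for brevity)
}
```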
Abstract:
Refactoring is the process of changing the structure of a program without changing its behaviour. Refactoring has so far been deployed effectively only for sequential programs. However, with the increased availability of multicore (and, soon, manycore) systems, refactoring can play an important role in helping both expert and non-expert parallel programmers structure and implement their parallel programs. This paper describes the design of a new refactoring tool that is aimed at increasing the programmability of parallel systems. To motivate our design, we refactor a number of examples in C, C++ and Erlang into good parallel implementations, using a set of formal pattern rewrite rules. © 2013 Springer-Verlag Berlin Heidelberg.
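For flavour, pattern rewrite rules in this style typically have the following general shape (illustrative; the paper defines its own formal rule set):

```latex
% Two rewrite rules of the general shape used in skeleton-based refactoring
% (illustrative, not the paper's exact rules); \Delta ranges over skeletons.
\begin{align*}
  \Delta \;&\equiv\; \mathsf{farm}(\Delta)
    && \text{(farm introduction/elimination)}\\
  \mathsf{pipe}(\Delta_1, \mathsf{pipe}(\Delta_2, \Delta_3))
    \;&\equiv\; \mathsf{pipe}(\mathsf{pipe}(\Delta_1, \Delta_2), \Delta_3)
    && \text{(pipeline associativity)}
\end{align*}
```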
Abstract:
This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components should match heterogeneous architectures, defined, for example, in terms of CPU/GPU combinations. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs, etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation. © 2013 Springer-Verlag Berlin Heidelberg.
Abstract:
This paper presents a new programming methodology for introducing and tuning parallelism in Erlang programs, using source-level code refactoring from sequential source programs to parallel programs written using our skeleton library, Skel. High-level cost models allow us to predict with reasonable accuracy the parallel performance of the refactored program, enabling programmers to make informed decisions about which refactorings to apply. Using our approach, we demonstrate easily obtainable, significant and scalable speedups of up to 21 on a 24-core machine over the sequential code.
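A cost model of the kind used here can be as simple as the classic task-farm service-time formula below (an illustrative form; the paper's actual models may differ), where the stage times are measured from the sequential program:

```latex
% Classic task-farm cost model (illustrative): t_e, t_w and t_c are the
% emitter, worker and collector service times, n_w the number of workers.
\[
  T_{\mathrm{farm}}(n_w) \;\approx\;
    \max\!\left(t_e,\; \frac{t_w}{n_w},\; t_c\right),
  \qquad
  \text{predicted speedup} \;\approx\; \frac{t_w}{T_{\mathrm{farm}}(n_w)} .
\]
```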
Abstract:
Bonded-in rod connections in timber possess many desirable attributes in terms of efficiency, manufacture, performance, aesthetics and cost. In recent years, research has been conducted on such connections using fibre reinforced polymers (FRPs) as an alternative to steel. This research programme investigates the pull-out capacity of basalt FRP (BFRP) rods bonded into low-grade Irish Sitka Spruce. After rod diameter, embedded length is thought to be the variable with the greatest influence on the pull-out capacity of bonded-in rods. Previous work has established an optimum embedded length of 15 times the hole diameter. However, that work only considered the effects of axial stress on the bond, using a pull-compression testing system which may have given an artificially high pull-out capacity because bending effects were neglected. A hinge system was therefore utilised that allows bending forces to be taken into consideration alongside axial forces in a pull-out test. This paper describes an experimental programme in which such pull-bending tests were carried out on samples constructed of 12 mm diameter BFRP bars with a 2 mm glueline thickness and embedded lengths between 80 mm and 280 mm, bonded into low-grade timber with an epoxy resin. Nine repetitions of each configuration were tested. A clear increase in pull-out strength was found with increasing embedded length.
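The observed dependence on embedded length is consistent with a simple uniform-bond-stress model (a common first approximation, not necessarily the model used in this programme):

```latex
% Uniform-bond-stress approximation (illustrative): pull-out capacity grows
% linearly with embedded length l_e, for hole diameter d_h and average bond
% strength tau_a over the rod--timber interface.
\[
  F_{\mathrm{pull\text{-}out}} \;\approx\; \tau_a \,\pi\, d_h\, l_e .
\]
```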
Abstract:
We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end-result, in order to trade off the quality of program outputs for increased energy-efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime system can apply a number of different policies to decide whether it will execute less-significant tasks accurately or approximately.
The experimental evaluation indicates that our system can achieve an energy reduction of up to 83% compared with a fully accurate execution and up to 35% compared with an approximate version employing loop perforation. At the same time, our approach always results in graceful quality degradation.
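The policy mechanism can be sketched as follows in C++; the types and the threshold policy are illustrative assumptions, not the system's actual API:

```cpp
// Illustrative sketch of a significance-aware runtime policy (names and the
// threshold policy are assumptions): each task carries a significance level
// and both an accurate and an approximate version; the runtime picks one
// against a threshold derived from the requested quality/energy trade-off.
#include <functional>
#include <vector>

struct ApproxTask {
    double significance;                // in [0, 1], set by the programmer
    std::function<void()> accurate;     // full-precision version
    std::function<void()> approximate;  // cheaper, lower-quality version
};

void run_all(std::vector<ApproxTask>& tasks, double threshold) {
    for (auto& t : tasks)
        if (t.significance >= threshold)
            t.accurate();     // significant: never degraded
        else
            t.approximate();  // less significant: traded for energy
}
```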
Abstract:
The main motivation for the work presented here began with previously conducted experiments with a programming concept at the time named "Macro". These experiments led to the conviction that it would be possible to build an engine-control system from scratch that could eliminate many of the current problems of engine management systems in a direct and intrinsic way. It was also hoped that it would minimize the amount of software and hardware needed for a final, fully functional system. Initially, this paper makes a comprehensive survey of the state of the art in the software, and corresponding hardware, of automotive tools and automotive ECUs. Problems arising from such software are identified, and it becomes clear that practically all of them stem, directly or indirectly, from the continued, comprehensive use of extremely long and complex "tool chains". Similarly, on the hardware side, it is argued that the problems stem from the extreme complexity of, and inter-dependencies within, processor architectures. The conclusions are presented through an extensive list of "pitfalls", which are thoroughly enumerated, identified and characterized. Solutions are also proposed for the various current issues and for their implementation. All of this final work forms part of a "proof-of-concept" system called "ECU2010". The central element of this system is the aforementioned "Macro" concept: a graphical block representing one of the many operations required in an automotive system, with arithmetic, logic, filtering, integration and multiplexing functions, among others. The end result of the proposed work is a single, fully integrated tool enabling the development and management of the entire system through one simple visual interface. Part of the presented result relies on a hardware platform fully adapted to the software, offering high flexibility and scalability, and using exactly the same technology for the ECU, data logger and peripherals alike. Current systems follow a mostly evolutionary path, allowing online calibration of parameters but never online alteration of the automotive functionality algorithms themselves. By contrast, the system developed and described in this thesis had the advantage of following a "clean-slate" approach, whereby everything could be rethought globally. In the end, of all the system's characteristics, "LIVE-Prototyping" is the most relevant feature: it allows automotive algorithms (e.g. injection, ignition, lambda control) to be adjusted 100% online, keeping the engine constantly running, without ever having to stop or reboot to make such changes. This eliminates the "turnaround delay" typically present in current automotive systems, thereby enhancing the efficiency and handling of such systems.
Abstract:
This paper discusses the parallelization of a new learning algorithm for multilayer perceptrons, specifically targeted at nonlinear function approximation. Each major step of the algorithm is parallelized, with special emphasis on the most computationally intensive task: the least-squares solution of linear systems of equations.
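As an illustration of that step, the sketch below solves a least-squares problem via the normal equations, with the dominant matrix product parallelised using OpenMP; this is an assumed scheme for illustration, not necessarily the paper's (in practice a QR factorisation would be numerically safer).

```cpp
// Sketch of a parallel least-squares solve: min ||Ax - b|| via the normal
// equations [A^T A | A^T b], with the O(m n^2) products parallelised by
// OpenMP. Illustrative only; no pivoting, so it assumes a well-conditioned
// system.
#include <vector>

using Mat = std::vector<std::vector<double>>;

std::vector<double> lstsq(const Mat& A, const std::vector<double>& b) {
    const int m = (int)A.size(), n = (int)A[0].size();
    Mat N(n, std::vector<double>(n + 1, 0.0));  // augmented [A^T A | A^T b]

    // Forming A^T A and A^T b dominates the cost: the step worth parallelising.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < m; ++k) N[i][j] += A[k][i] * A[k][j];
        for (int k = 0; k < m; ++k) N[i][n] += A[k][i] * b[k];
    }

    // Plain Gaussian elimination on the (small) n x n system.
    for (int p = 0; p < n; ++p)
        for (int r = p + 1; r < n; ++r) {
            double f = N[r][p] / N[p][p];
            for (int c = p; c <= n; ++c) N[r][c] -= f * N[p][c];
        }
    std::vector<double> x(n);
    for (int p = n - 1; p >= 0; --p) {          // back-substitution
        x[p] = N[p][n];
        for (int c = p + 1; c < n; ++c) x[p] -= N[p][c] * x[c];
        x[p] /= N[p][p];
    }
    return x;
}
```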
Abstract:
Embedded real-time applications increasingly have high computational requirements that must be met within specific deadlines, but their processing patterns are highly variable, depending on the set of data available at a given instant. The current trend towards parallel processing in the embedded domain provides higher processing power; however, it does not address this variability. Dimensioning each device for its worst-case scenario implies lower average utilization and more processing capacity in the overall system that is available but unusable. A solution to this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload to neighbouring nodes in peak situations. In this context, this report proposes a framework for developing parallel and distributed real-time embedded applications that transparently uses OpenMP and the Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables structured reasoning about the timing behaviour of these hybrid architectures.
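A minimal hybrid sketch of the execution model, with MPI distributing work across nodes and OpenMP exploiting the cores within each node, might look as follows (purely illustrative; the framework's annotations and timing model are not shown):

```cpp
// Hybrid MPI + OpenMP sketch: MPI distributes chunks of work across nodes,
// OpenMP parallelises each chunk locally. Illustrative of the execution
// model only, not of the proposed framework's programming interface.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int total = 1 << 20;
    const int chunk = total / size;          // this rank's share of the work
    std::vector<double> local(chunk, rank);

    // Local parallelism: OpenMP threads within the node.
    #pragma omp parallel for
    for (int i = 0; i < chunk; ++i) local[i] = local[i] * 2.0 + 1.0;

    // Distributed step: combine partial results across nodes.
    double part = 0.0, sum = 0.0;
    for (double v : local) part += v;
    MPI_Reduce(&part, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```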
Abstract:
High-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, leaving the scheduling of the tasks onto processors to be performed at runtime. However, if the underlying system cannot efficiently execute those applications on the available cores, the benefits will be lost. In this paper, we consider how to schedule highly heterogeneous parallel applications that require real-time performance guarantees on multicore processors. The paper proposes a novel scheduling approach that combines the global Earliest Deadline First (EDF) scheduler with a priority-aware work-stealing load-balancing scheme, which enables parallel real-time tasks to be executed on more than one processor at a given time instant. Experimental results demonstrate the better scalability and lower scheduling overhead of the proposed approach compared with an existing real-time deadline-oriented scheduling class for the Linux kernel.
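The stealing rule can be sketched as below; this is an illustrative reconstruction, not the authors' implementation, and it omits the locking a real scheduler needs:

```cpp
// Sketch of priority-aware stealing (illustrative): an idle worker steals
// the pending job with the earliest deadline across all other queues,
// rather than an arbitrary job from an arbitrary victim's deque.
#include <climits>
#include <deque>
#include <optional>
#include <vector>

struct Job { long deadline; };  // absolute deadline: smaller = more urgent

using Deque = std::deque<Job>;

std::optional<Job> steal(std::vector<Deque>& queues, int self) {
    int victim = -1; std::size_t pos = 0; long best = LONG_MAX;
    for (int q = 0; q < (int)queues.size(); ++q) {
        if (q == self) continue;
        for (std::size_t i = 0; i < queues[q].size(); ++i)
            if (queues[q][i].deadline < best) {
                best = queues[q][i].deadline; victim = q; pos = i;
            }
    }
    if (victim < 0) return std::nullopt;  // nothing to steal anywhere
    Job j = queues[victim][pos];
    queues[victim].erase(queues[victim].begin() + pos);
    return j;
}
```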
Abstract:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity, such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must first be mapped to a reference sequence. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In order to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using Open MPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall-time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
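The distribution idea can be sketched with plain MPI as follows; `align_block` is a hypothetical stand-in for BWA's alignment routine, and the block partitioning is an assumption for illustration:

```cpp
// Illustrative sketch (not Parallel BWA's actual code): the read set is
// split into even blocks and each MPI rank aligns its own block, so the
// embarrassingly parallel alignment step scales across cluster nodes.
#include <mpi.h>
#include <algorithm>
#include <cstdio>

// Hypothetical stand-in for aligning one block of reads against the index.
static void align_block(long first_read, long n_reads) {
    std::printf("aligning reads [%ld, %ld)\n", first_read, first_read + n_reads);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long total_reads = 100'000'000;             // assumed input size
    long per_rank = (total_reads + size - 1) / size;  // even block per rank
    long first = rank * per_rank;
    long count = std::min(per_rank, total_reads - first);

    if (count > 0) align_block(first, count);  // independent per-rank work

    MPI_Barrier(MPI_COMM_WORLD);               // wait for all ranks
    MPI_Finalize();
    return 0;
}
```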
Abstract:
Thesis digitized by the Records Management and Archives Division of the Université de Montréal.