808 resultados para scalable parallel programming
Resumo:
A poster of this paper will be presented at the 25th International Conference on Parallel Architecture and Compilation Technology (PACT ’16), September 11-15, 2016, Haifa, Israel.
Resumo:
Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, are increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off and we discuss a preliminary set of experiments validating our claims.
Resumo:
Over the last three decades, computer architects have been able to achieve an increase in performance for single processors by, e.g., increasing clock speed, introducing cache memories and using instruction level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip architectures with multiple processor cores on a single chip. With multi-core processors, applications can complete more total work than with one core alone. To take advantage of multi-core processors, parallel programming models are proposed as promising solutions for more effectively using multi-core processors. This paper discusses some of the existent models and frameworks for parallel programming, leading to outline a draft parallel programming model for Ada.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
This paper describes our plans to evaluate the present state of affairs concerning parallel programming and its systems. Three subprojects are proposed: a survey among programmers and scientists, a comparison of parallel programming systems using a standard set of test programs, and a wiki resource for the parallel programming community - the Parawiki. We would like to invite you to participate and turn these subprojects into true community efforts.
Resumo:
In this publication, we report on an online survey that was carried out among parallel programmers. More than 250 people worldwide have submitted answers to our questions, and their responses are analyzed here. Although not statistically sound, the data we provide give useful insights about which parallel programming systems and languages are known and in actual use. For instance, the collected data indicate that for our survey group MPI and (to a lesser extent) C are the most widely used parallel programming system and language, respectively.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map. (c) 2006 Elsevier Ltd. All rights reserved.
Resumo:
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease-of-use and high performance in large scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a scalable graph analytics framework that does not demand significant parallel programming experience based on NUMA-awareness.
The realization of such a system faces two key problems:
(i)~how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii)~how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
Resumo:
A raíz de la aparición de los procesadores dotados de varios “cores”, la programación paralela, un concepto que, por otra parte no era nada nuevo y se conocía desde hace décadas, sufrió un nuevo impulso, pues se creía que se podía superar el techo tecnológico que había estado limitando el rendimiento de esta programación durante años. Este impulso se ha ido manteniendo hasta la actualidad, movido por la necesidad de sistemas cada vez más potentes y gracias al abaratamiento de los costes de fabricación. Esta tendencia ha motivado la aparición de nuevo software y lenguajes con componentes orientados precisamente al campo de la programación paralela. Este es el caso del lenguaje Go, desarrollado por Google y lanzado en 2009. Este lenguaje se basa en modelos de concurrencia que lo hacen muy adecuados para abordar desarrollos de naturaleza paralela. Sin embargo, la programación paralela es un campo complejo y heterogéneo, y los programadores son reticentes a utilizar herramientas nuevas, en beneficio de aquellas que ya conocen y les son familiares. Un buen ejemplo son aquellas implementaciones de lenguajes conocidos, pero orientadas a programación paralela, y que siguen las directrices de un estándar ampliamente reconocido y aceptado. Este es el caso del estándar OpenMP, un Interfaz de Programación de Aplicaciones (API) flexible, portable y escalable, orientado a la programación paralela multiproceso en arquitecturas multi-core o multinucleo. Dicho estándar posee actualmente implementaciones en los lenguajes C, C++ y Fortran. Este proyecto nace como un intento de aunar ambos conceptos: un lenguaje emergente con interesantes posibilidades en el campo de la programación paralela, y un estándar reputado y ampliamente extendido, con el que los programadores se encuentran familiarizados. El objetivo principal es el desarrollo de un conjunto de librerías del sistema (que engloben directivas de compilación o pragmas, librerías de ejecución y variables de entorno), soportadas por las características y los modelos de concurrencia propios de Go; y que añadan funcionalidades propias del estándar OpenMP. La idea es añadir funcionalidades que permitan programar en lenguaje Go utilizando la sintaxis que OpenMP proporciona para otros lenguajes, como Fortan y C/C++ (concretamente, similar a esta última), y, de esta forma, dotar al usuario de Go de herramientas para programar estructuras de procesamiento paralelo de forma sencilla y transparente, de la misma manera que lo haría utilizando C/C++.---ABSTRACT---As a result of the appearance of processors equipped with multiple "cores ", parallel programming, a concept which, moreover, it was not new and it was known for decades, suffered a new impulse, because it was believed they could overcome the technological ceiling had been limiting the performance of this program for years. This impulse has been maintained until today, driven by the need for ever more powerful systems and thanks to the decrease in manufacturing costs. This trend has led to the emergence of new software and languages with components guided specifically to the field of parallel programming. This is the case of Go language, developed by Google and released in 2009. This language is based on concurrency models that make it well suited to tackle developments in parallel nature. However, parallel programming is a complex and heterogeneous field, and developers are reluctant to use new tools to benefit those who already know and are familiar. A good example are those implementations from well-known languages, but parallel programming oriented, and witch follow the guidelines of a standard widely recognized and accepted. This is the case of the OpenMP standard, an application programming interface (API), flexible, portable and scalable, parallel programming oriented, and designed for multi-core architectures. This standard currently has implementations in C, C ++ and Fortran. This project was born as an attempt to combine two concepts: an emerging language, with interesting possibilities in the field of parallel programming, and a reputed and widespread standard, with which programmers are familiar with. The main objective is to develop a set of system libraries (which includes compiler directives or pragmas, runtime libraries and environment variables), supported by the characteristics and concurrency patterns of Go; and that add custom features from the OpenMP standard. The idea is to add features that allow programming in Go language using the syntax OpenMP provides for other languages, like Fortran and C / C ++ (specifically, similar to the latter ), and, in this way, provide Go users with tools for programming parallel structures easily and, in the same way they would using C / C ++.
Resumo:
A novel high throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area relatively higher than other similar recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x concerning pure software implementations of the transform algorithms, therefore allowing the computation, in real-time, of all the above mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout and handling, using an associated “Floor-Plan” (meta data); performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the meta data, which tag the data blocks, can be propagated and applied consistently. This is possible at the disk level, in distributing the computations across parallel processors; in “imagelet” composition; and in feature tagging. The work reflects the emerging challenges and opportunities presented by the ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.
Resumo:
Multicore platforms have transformed parallelism into a main concern. Parallel programming models are being put forward to provide a better approach for application programmers to expose the opportunities for parallelism by pointing out potentially parallel regions within tasks, leaving the actual and dynamic scheduling of these regions onto processors to be performed at runtime, exploiting the maximum amount of parallelism. It is in this context that this paper proposes a scheduling approach that combines the constant-bandwidth server abstraction with a priority-aware work-stealing load balancing scheme which, while ensuring isolation among tasks, enables parallel tasks to be executed on more than one processor at a given time instant.
Resumo:
Article in Press, Corrected Proof