Biblioteca Digital

905 resultados para shared memory

&-prolog and its performance: exploiting independent and-parallelism

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An Independent And-Parallel Prolog model and implementation, &-Prolog, are described. The description includes a summary of the system's architecture, some details of its execution model (based on the RAP-WAM model), and most importantly, its performance on sequential workstations and shared memory multiprocessors as compared with state-of-the-art Prolog systems. Speedup curves are provided for a collection of benchmark programs which demónstrate significant speed advantages over state-of the art sequential systems.

On the uses of attributed variables in parallel and concurrent logic programming systems

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user defined, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argüe that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples.

Oprimización y paralelización de un algoritmo de sincronización mediante el uso de GPUs y la tecnología CUDA

Relevância:

60.00% 60.00%

Publicador:

Resumo:

En el presente documento se hablará acerca del desarrollo de un proyecto para la mejora de un programa de análisis de señales; con ese fin, se hará uso de técnicas de optimización del software y de tecnologías de aceleración, mediante el aprovechamiento del paralelismo del programa. Además se hará un análisis de acerca del uso de dos tecnologías basadas en diferentes paradigmas de programación paralela; una mediante múltiples hilos con memoria compartida y la otra mediante el uso de GPUs como dispositivos de coprocesamiento. This paper will talk about the development of a Project to improve a program that does signals analysis; to that end, it will make use of software optimization techniques and acceleration technologies by exploiting parallelism in the program. In Addition will be done an analysis on the use of two technologies based on two different paradigms; one using multiple threads with shared memory and the other using GPU as co-processing devices.

Concurrent library abstraction without information hiding

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The commonly accepted approach to specifying libraries of concurrent algorithms is a library abstraction. Its idea is to relate a library to another one that abstracts away from details of its implementation and is simpler to reason about. A library abstraction relation has to validate the Abstraction Theorem: while proving a property of the client of the concurrent library, the library can be soundly replaced with its abstract implementation. Typically a library abstraction relation, such as linearizability, assumes a complete information hiding between a library and its client, which disallows them to communicate by means of shared memory. However, such way of communication may be used in a program, and correctness of interactions on a shared memory depends on the implicit contract between the library and the client. In this work we approach library abstraction without any assumptions about information hiding. To be able to formulate the contract between components of the program, we augment machine states of the program with two abstract states, views, of the client and the library. It enables formalising the contract with the internal safety, which requires components to preserve each other's views whenever their command is executed. We define the library a a correspondence between possible uses of a concrete and an abstract library. For our library abstraction relation and traces of a program, components of which follow their contract, we prove an Abstraction Theorem. RESUMEN. La técnica más aceptada actualmente para la especificación de librerías de algoritmos concurrentes es la abstracción de librerías (library abstraction). La idea subyacente es relacionar la librería original con otra que abstrae los detalles de implementación y conóon que describa dicha abstracción de librerías debe validar el Teorema de Abstracción: durante la prueba de la validez de una propiedad del cliente de la librería concurrente, el reemplazo de esta última por su implementación abstracta es lógicamente correcto. Usualmente, una relación de abstracción de librerías como la linearizabilidad (linearizability), tiene como premisa el ocultamiento de información entre el cliente y la librería (information hiding), es decir, que no se les permite comunicarse mediante la memoria compartida. Sin embargo, dicha comunicación ocurre en la práctica y la correctitud de estas interacciones en una memoria compartida depende de un contrato implícito entre la librería y el cliente. En este trabajo, se propone un nueva definición del concepto de abtracción de librerías que no presupone un ocultamiento de información entre la librería y el cliente. Con el fin de establecer un contrato entre diferentes componentes de un programa, extendemos la máquina de estados subyacente con dos estados abstractos que representan las vistas del cliente y la librería. Esto permite la formalización de la propiedad de seguridad interna (internal safety), que requiere que cada componente preserva la vista del otro durante la ejecuci on de un comando. Consecuentemente, se define la relación de abstracción de librerías mediante una correspondencia entre los usos posibles de una librería abstracta y una concreta. Finalmente, se prueba el Teorema de Abstracción para la relación de abstracción de librerías propuesta, para cualquier traza de un programa y cualquier componente que satisface los contratos apropiados.

Parallel relaxed and extrapolated algorithms for computing PageRank

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using either OpenMP, MPI or an hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.

Functionally partitioned module-based programmable architecture for wireless base-band processing

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A specialised reconfigurable architecture is targeted at wireless base-band processing. It is built to cater for multiple wireless standards. It has lower power consumption than the processor-based solution. It can be scaled to run in parallel for processing multiple channels. Test resources are embedded on the architecture and testing strategies are included. This architecture is functionally partitioned according to the common operations found in wireless standards, such as CRC error correction, convolution and interleaving. These modules are linked via Virtual Wire Hardware modules and route-through switch matrices. Data can be processed in any order through this interconnect structure. Virtual Wire ensures the same flexibility as normal interconnects, but the area occupied and the number of switches needed is reduced. The testing algorithm scans all possible paths within the interconnection network exhaustively and searches for faults in the processing modules. The testing algorithm starts by scanning the externally addressable memory space and testing the master controller. The controller then tests every switch in the route-through switch matrix by making loops from the shared memory to each of the switches. The local switch matrix is also tested in the same way. Next the local memory is scanned. Finally, pre-defined test vectors are loaded into local memory to check the processing modules. This paper compares various base-band processing solutions. It describes the proposed platform and its implementation. It outlines the test resources and algorithm. It concludes with the mapping of Bluetooth and GSM base-band onto the platform.

Test strategies on functionally partitioned module-based programmable architecture for base-band processing

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A specialised reconfigurable architecture for telecommunication base-band processing is augmented with testing resources. The routing network is linked via virtual wire hardware modules to reduce the area occupied by connecting buses. The number of switches within the routing matrices is also minimised, which increases throughput without sacrificing flexibility. The testing algorithm was developed to systematically search for faults in the processing modules and the flexible high-speed routing network within the architecture. The testing algorithm starts by scanning the externally addressable memory space and testing the master controller. The controller then tests every switch in the route-through switch matrix by making loops from the shared memory to each of the switches. The local switch matrix is also tested in the same way. Next the local memory is scanned. Finally, pre-defined test vectors are loaded into local memory to check the processing modules. This algorithm scans all possible paths within the interconnection network exhaustively and reports all faults. Strategies can be inserted to bypass minor faults

Modelling continuum mechanics phenomena using three dimensional unstructured meshes on massively parallel processors

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.

The stochastic geometric machine model

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper introduces the stochastic version of the Geometric Machine Model for the modelling of sequential, alternative, parallel (synchronous) and nondeterministic computations with stochastic numbers stored in a (possibly infinite) shared memory. The programming language L(D! 1), induced by the Coherence Space of Processes D! 1, can be applied to sequential and parallel products in order to provide recursive definitions for such processes, together with a domain-theoretic semantics of the Stochastic Arithmetic. We analyze both the spacial (ordinal) recursion, related to spacial modelling of the stochastic memory, and the temporal (structural) recursion, given by the inclusion relation modelling partial objects in the ordered structure of process construction.

GPRM: a high performance programming framework for manycore processors

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.

Guiding Monte Carlo tree searches with neural networks in the game of go

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A dissertation submitted in fulfillment of the requirements to the degree of Master in Computer Science and Computer Engineering

A parallel numerical solver using hierarchically tiled arrays

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce a compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs as well as examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as current state-of-the-art linear algebra routines.

Is There a Shared European memory? Holocaust Remembrance in the European Parliament after 1989

Relevância:

40.00% 40.00%

Publicador:

3 Vorträge: 1. Recent trends and approaches in church history; 2. The Ruins of Port-Royal: an injured memory transfigured; 3. Shared authority. Ministry and Leadership in the Church

Relevância:

40.00% 40.00%

Publicador:

The association between auditory memory span and speech rate in children from kindergarten to sixth grade

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study reexamined the association between speech rate and memory span in children from kindergarten to sixth grade (N = 152) in order to potentially account for the inconsistencies within the published literature on this topic. Some of the inconsistencies in past research may reflect the different methods adopted in assessing speech rate. In particular, repeating word triples may itself involve memory demands, contaminating the correlation between speech rate and memory span in younger children. Analyses using composite speech rate and memory span measures showed that speech rate for word triples shared variance with memory span that was independent of speech rate for single words. Moreover, speech rate for word triples was largely redundant with age in explaining additional variation in memory span once the effects of speech rate for single words were controlled. (C) 2002 Elsevier Science.

«
1
2
3
4
5
6
7
8
...
60
61
»