20 resultados para distributed shared memory
em Universidad Politécnica de Madrid
Resumo:
The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology.
Resumo:
We present an overview of the stack-based memory management techniques that we used in our non-deterministic and-parallel Prolog systems: &-Prolog and DASWAM. We believe that the problems associated with non-deterministic and-parallel systems are more general than those encountered in or-parallel and deterministic and-parallel systems, which can be seen as subsets of this more general case. We develop on the previously proposed "marker scheme", lifting some of the restrictions associated with the selection of goals while keeping (virtual) memory consumption down. We also review some of the other problems associated with the stack-based management scheme, such as handling of forward and backward execution, cut, and roll-backs.
Resumo:
Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user deñned, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argüe that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples and report on an implementation of our ideas.
Resumo:
An approximate analytic model of a shared memory multiprocessor with a Cache Only Memory Architecture (COMA), the busbased Data Difussion Machine (DDM), is presented and validated. It describes the timing and interference in the system as a function of the hardware, the protocols, the topology and the workload. Model results have been compared to results from an independent simulator. The comparison shows good model accuracy specially for non-saturated systems, where the errors in response times and device utilizations are independent of the number of processors and remain below 10% in 90% of the simulations. Therefore, the model can be used as an average performance prediction tool that avoids expensive simulations in the design of systems with many processors.
Resumo:
Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user defined, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argüe that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples.
Resumo:
In this paper, we examine the issue of memory management in the parallel execution of logic programs. We concentrate on non-deterministic and-parallel schemes which we believe present a relatively general set of problems to be solved, including most of those encountered in the memory management of or-parallel systems. We present a distributed stack memory management model which allows flexible scheduling of goals. Previously proposed models (based on the "Marker model") are lacking in that they impose restrictions on the selection of goals to be executed or they may require consume a large amount of virtual memory. This paper first presents results which imply that the above mentioned shortcomings can have significant performance impacts. An extension of the Marker Model is then proposed which allows flexible scheduling of goals while keeping (virtual) memory consumption down. Measurements are presented which show the advantage of this solution. Methods for handling forward and backward execution, cut and roll back are discussed in the context of the proposed scheme. In addition, the paper shows how the same mechanism for flexible scheduling can be applied to allow the efficient handling of the very general form of suspension that can occur in systems which combine several types of and-parallelism and more sophisticated methods of executing logic programs. We believe that the results are applicable to many and- and or-parallel systems.
Resumo:
Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.
Resumo:
The &-Prolog system, a practical implementation of a parallel execution niodel for Prolog exploiting strict and non-strict independent and-parallelism, is described. Both automatic and manual parallelization of programs is supported. This description includes a summary of the system's language and architecture, some details of its execution model (based on the RAP-WAM model), and data on its performance on sequential workstations and shared memory multiprocessors, which is compared to that of current Prolog systems. The results to date show significant speed advantages over state-of-the-art sequential systems.
Resumo:
In recent years a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of Symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computitig, narned ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parall< lism and Independent And-parallelism. In this paper vve focus on the implementation of the and-parallel part of the ACE system (ralled &ACE) on a shared memory multiprocessor, d< scribing its organization, some optimizations, and presenting some performance figures, proving the abilhy of &ACE to efficiently exploit parallelism.
Resumo:
We present a parallel graph narrowing machine, which is used to implement a functional logic language on a shared memory multiprocessor. It is an extensión of an abstract machine for a purely functional language. The result is a programmed graph reduction machine which integrates the mechanisms of unification, backtracking, and independent and-parallelism. In the machine, the subexpressions of an expression can run in parallel. In the case of backtracking, the structure of an expression is used to avoid the reevaluation of subexpressions as far as possible. Deterministic computations are detected. Their results are maintained and need not be reevaluated after backtracking.
Resumo:
An Independent And-Parallel Prolog model and implementation, &-Prolog, are described. The description includes a summary of the system's architecture, some details of its execution model (based on the RAP-WAM model), and most importantly, its performance on sequential workstations and shared memory multiprocessors as compared with state-of-the-art Prolog systems. Speedup curves are provided for a collection of benchmark programs which demónstrate significant speed advantages over state-of the art sequential systems.
Resumo:
En el presente documento se hablará acerca del desarrollo de un proyecto para la mejora de un programa de análisis de señales; con ese fin, se hará uso de técnicas de optimización del software y de tecnologías de aceleración, mediante el aprovechamiento del paralelismo del programa. Además se hará un análisis de acerca del uso de dos tecnologías basadas en diferentes paradigmas de programación paralela; una mediante múltiples hilos con memoria compartida y la otra mediante el uso de GPUs como dispositivos de coprocesamiento. This paper will talk about the development of a Project to improve a program that does signals analysis; to that end, it will make use of software optimization techniques and acceleration technologies by exploiting parallelism in the program. In Addition will be done an analysis on the use of two technologies based on two different paradigms; one using multiple threads with shared memory and the other using GPU as co-processing devices.
Resumo:
The commonly accepted approach to specifying libraries of concurrent algorithms is a library abstraction. Its idea is to relate a library to another one that abstracts away from details of its implementation and is simpler to reason about. A library abstraction relation has to validate the Abstraction Theorem: while proving a property of the client of the concurrent library, the library can be soundly replaced with its abstract implementation. Typically a library abstraction relation, such as linearizability, assumes a complete information hiding between a library and its client, which disallows them to communicate by means of shared memory. However, such way of communication may be used in a program, and correctness of interactions on a shared memory depends on the implicit contract between the library and the client. In this work we approach library abstraction without any assumptions about information hiding. To be able to formulate the contract between components of the program, we augment machine states of the program with two abstract states, views, of the client and the library. It enables formalising the contract with the internal safety, which requires components to preserve each other's views whenever their command is executed. We define the library a a correspondence between possible uses of a concrete and an abstract library. For our library abstraction relation and traces of a program, components of which follow their contract, we prove an Abstraction Theorem. RESUMEN. La técnica más aceptada actualmente para la especificación de librerías de algoritmos concurrentes es la abstracción de librerías (library abstraction). La idea subyacente es relacionar la librería original con otra que abstrae los detalles de implementación y conóon que describa dicha abstracción de librerías debe validar el Teorema de Abstracción: durante la prueba de la validez de una propiedad del cliente de la librería concurrente, el reemplazo de esta última por su implementación abstracta es lógicamente correcto. Usualmente, una relación de abstracción de librerías como la linearizabilidad (linearizability), tiene como premisa el ocultamiento de información entre el cliente y la librería (information hiding), es decir, que no se les permite comunicarse mediante la memoria compartida. Sin embargo, dicha comunicación ocurre en la práctica y la correctitud de estas interacciones en una memoria compartida depende de un contrato implícito entre la librería y el cliente. En este trabajo, se propone un nueva definición del concepto de abtracción de librerías que no presupone un ocultamiento de información entre la librería y el cliente. Con el fin de establecer un contrato entre diferentes componentes de un programa, extendemos la máquina de estados subyacente con dos estados abstractos que representan las vistas del cliente y la librería. Esto permite la formalización de la propiedad de seguridad interna (internal safety), que requiere que cada componente preserva la vista del otro durante la ejecuci on de un comando. Consecuentemente, se define la relación de abstracción de librerías mediante una correspondencia entre los usos posibles de una librería abstracta y una concreta. Finalmente, se prueba el Teorema de Abstracción para la relación de abstracción de librerías propuesta, para cualquier traza de un programa y cualquier componente que satisface los contratos apropiados.
Resumo:
Distributed real-time embedded systems are becoming increasingly important to society. More demands will be made on them and greater reliance will be placed on the delivery of their services. A relevant subset of them is high-integrity or hard real-time systems, where failure can cause loss of life, environmental harm, or significant financial loss. Additionally, the evolution of communication networks and paradigms as well as the necessity of demanding processing power and fault tolerance, motivated the interconnection between electronic devices; many of the communications have the possibility of transferring data at a high speed. The concept of distributed systems emerged as systems where different parts are executed on several nodes that interact with each other via a communication network. Java’s popularity, facilities and platform independence have made it an interesting language for the real-time and embedded community. This was the motivation for the development of RTSJ (Real-Time Specification for Java), which is a language extension intended to allow the development of real-time systems. The use of Java in the development of high-integrity systems requires strict development and testing techniques. However, RTJS includes a number of language features that are forbidden in such systems. In the context of the HIJA project, the HRTJ (Hard Real-Time Java) profile was developed to define a robust subset of the language that is amenable to static analysis for high-integrity system certification. Currently, a specification under the Java community process (JSR- 302) is being developed. Its purpose is to define those capabilities needed to create safety critical applications with Java technology called Safety Critical Java (SCJ). However, neither RTSJ nor its profiles provide facilities to develop distributed realtime applications. This is an important issue, as most of the current and future systems will be distributed. The Distributed RTSJ (DRTSJ) Expert Group was created under the Java community process (JSR-50) in order to define appropriate abstractions to overcome this problem. Currently there is no formal specification. The aim of this thesis is to develop a communication middleware that is suitable for the development of distributed hard real-time systems in Java, based on the integration between the RMI (Remote Method Invocation) model and the HRTJ profile. It has been designed and implemented keeping in mind the main requirements such as the predictability and reliability in the timing behavior and the resource usage. iThe design starts with the definition of a computational model which identifies among other things: the communication model, most appropriate underlying network protocols, the analysis model, and a subset of Java for hard real-time systems. In the design, the remote references are the basic means for building distributed applications which are associated with all non-functional parameters and resources needed to implement synchronous or asynchronous remote invocations with real-time attributes. The proposed middleware separates the resource allocation from the execution itself by defining two phases and a specific threading mechanism that guarantees a suitable timing behavior. It also includes mechanisms to monitor the functional and the timing behavior. It provides independence from network protocol defining a network interface and modules. The JRMP protocol was modified to include two phases, non-functional parameters, and message size optimizations. Although serialization is one of the fundamental operations to ensure proper data transmission, current implementations are not suitable for hard real-time systems and there are no alternatives. This thesis proposes a predictable serialization that introduces a new compiler to generate optimized code according to the computational model. The proposed solution has the advantage of allowing us to schedule the communications and to adjust the memory usage at compilation time. In order to validate the design and the implementation a demanding validation process was carried out with emphasis in the functional behavior, the memory usage, the processor usage (the end-to-end response time and the response time in each functional block) and the network usage (real consumption according to the calculated consumption). The results obtained in an industrial application developed by Thales Avionics (a Flight Management System) and in exhaustive tests show that the design and the prototype are reliable for industrial applications with strict timing requirements. Los sistemas empotrados y distribuidos de tiempo real son cada vez más importantes para la sociedad. Su demanda aumenta y cada vez más dependemos de los servicios que proporcionan. Los sistemas de alta integridad constituyen un subconjunto de gran importancia. Se caracterizan por que un fallo en su funcionamiento puede causar pérdida de vidas humanas, daños en el medio ambiente o cuantiosas pérdidas económicas. La necesidad de satisfacer requisitos temporales estrictos, hace más complejo su desarrollo. Mientras que los sistemas empotrados se sigan expandiendo en nuestra sociedad, es necesario garantizar un coste de desarrollo ajustado mediante el uso técnicas adecuadas en su diseño, mantenimiento y certificación. En concreto, se requiere una tecnología flexible e independiente del hardware. La evolución de las redes y paradigmas de comunicación, así como la necesidad de mayor potencia de cómputo y de tolerancia a fallos, ha motivado la interconexión de dispositivos electrónicos. Los mecanismos de comunicación permiten la transferencia de datos con alta velocidad de transmisión. En este contexto, el concepto de sistema distribuido ha emergido como sistemas donde sus componentes se ejecutan en varios nodos en paralelo y que interactúan entre ellos mediante redes de comunicaciones. Un concepto interesante son los sistemas de tiempo real neutrales respecto a la plataforma de ejecución. Se caracterizan por la falta de conocimiento de esta plataforma durante su diseño. Esta propiedad es relevante, por que conviene que se ejecuten en la mayor variedad de arquitecturas, tienen una vida media mayor de diez anos y el lugar ˜ donde se ejecutan puede variar. El lenguaje de programación Java es una buena base para el desarrollo de este tipo de sistemas. Por este motivo se ha creado RTSJ (Real-Time Specification for Java), que es una extensión del lenguaje para permitir el desarrollo de sistemas de tiempo real. Sin embargo, RTSJ no proporciona facilidades para el desarrollo de aplicaciones distribuidas de tiempo real. Es una limitación importante dado que la mayoría de los actuales y futuros sistemas serán distribuidos. El grupo DRTSJ (DistributedRTSJ) fue creado bajo el proceso de la comunidad de Java (JSR-50) con el fin de definir las abstracciones que aborden dicha limitación, pero en la actualidad aun no existe una especificacion formal. El objetivo de esta tesis es desarrollar un middleware de comunicaciones para el desarrollo de sistemas distribuidos de tiempo real en Java, basado en la integración entre el modelo de RMI (Remote Method Invocation) y el perfil HRTJ. Ha sido diseñado e implementado teniendo en cuenta los requisitos principales, como la predecibilidad y la confiabilidad del comportamiento temporal y el uso de recursos. El diseño parte de la definición de un modelo computacional el cual identifica entre otras cosas: el modelo de comunicaciones, los protocolos de red subyacentes más adecuados, el modelo de análisis, y un subconjunto de Java para sistemas de tiempo real crítico. En el diseño, las referencias remotas son el medio básico para construcción de aplicaciones distribuidas las cuales son asociadas a todos los parámetros no funcionales y los recursos necesarios para la ejecución de invocaciones remotas síncronas o asíncronas con atributos de tiempo real. El middleware propuesto separa la asignación de recursos de la propia ejecución definiendo dos fases y un mecanismo de hebras especifico que garantiza un comportamiento temporal adecuado. Además se ha incluido mecanismos para supervisar el comportamiento funcional y temporal. Se ha buscado independencia del protocolo de red definiendo una interfaz de red y módulos específicos. También se ha modificado el protocolo JRMP para incluir diferentes fases, parámetros no funcionales y optimizaciones de los tamaños de los mensajes. Aunque la serialización es una de las operaciones fundamentales para asegurar la adecuada transmisión de datos, las actuales implementaciones no son adecuadas para sistemas críticos y no hay alternativas. Este trabajo propone una serialización predecible que ha implicado el desarrollo de un nuevo compilador para la generación de código optimizado acorde al modelo computacional. La solución propuesta tiene la ventaja que en tiempo de compilación nos permite planificar las comunicaciones y ajustar el uso de memoria. Con el objetivo de validar el diseño e implementación se ha llevado a cabo un exigente proceso de validación con énfasis en: el comportamiento funcional, el uso de memoria, el uso del procesador (tiempo de respuesta de extremo a extremo y en cada uno de los bloques funcionales) y el uso de la red (consumo real conforme al estimado). Los buenos resultados obtenidos en una aplicación industrial desarrollada por Thales Avionics (un sistema de gestión de vuelo) y en las pruebas exhaustivas han demostrado que el diseño y el prototipo son fiables para aplicaciones industriales con estrictos requisitos temporales.
Resumo:
Accumulating evidence suggests a role for the medial temporal lobe (MTL) in working memory (WM). However, little is known concerning its functional interactions with other cortical regions in the distributed neural network subserving WM. To reveal these, we availed of subjects with MTL damage and characterized changes in effective connectivity while subjects engaged in WM task. Specifically, we compared dynamic causal models, extracted from magnetoencephalographic recordings during verbal WM encoding, in temporal lobe epilepsy patients (with left hippocampal sclerosis) and controls. Bayesian model comparison indicated that the best model (across subjects) evidenced bilateral, forward, and backward connections, coupling inferior temporal cortex (ITC), inferior frontal cortex (IFC), and MTL. MTL damage weakened backward connections from left MTL to left ITC, a decrease accompanied by strengthening of (bidirectional) connections between IFC and MTL in the contralesional hemisphere. These findings provide novel evidence concerning functional interactions between nodes of this fundamental cognitive network and sheds light on how these interactions are modified as a result of focal damage to MTL. The findings highlight that a reduced (top-down) influence of the MTL on ipsilateral language regions is accompanied by enhanced reciprocal coupling in the undamaged hemisphere providing a first demonstration of “connectional diaschisis.”