8 resultados para Parallel Architectures

em Universidad Politécnica de Madrid


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An approximate analytic model of a shared memory multiprocessor with a Cache Only Memory Architecture (COMA), the busbased Data Difussion Machine (DDM), is presented and validated. It describes the timing and interference in the system as a function of the hardware, the protocols, the topology and the workload. Model results have been compared to results from an independent simulator. The comparison shows good model accuracy specially for non-saturated systems, where the errors in response times and device utilizations are independent of the number of processors and remain below 10% in 90% of the simulations. Therefore, the model can be used as an average performance prediction tool that avoids expensive simulations in the design of systems with many processors.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

La motivación de esta tesis es el desarrollo de una herramienta de optimización automática para la mejora del rendimiento de formas aerodinámicas enfocado en la industria aeronáutica. Este trabajo cubre varios aspectos esenciales, desde el empleo de Non-Uniform Rational B-Splines (NURBS), al cálculo de gradientes utilizando la metodología del adjunto continuo, el uso de b-splines volumétricas como parámetros de diseño, el tratamiento de la malla en las intersecciones, y no menos importante, la adaptación de los algoritmos de la dinámica de fluidos computacional (CFD) en arquitecturas hardware de alto paralelismo, como las tarjetas gráficas, para acelerar el proceso de optimización. La metodología adjunta ha posibilitado que los métodos de optimización basados en gradientes sean una alternativa prometedora para la mejora de la eficiencia aerodinámica de los aviones. La formulación del adjunto permite calcular los gradientes de una función de coste, como la resistencia aerodinámica o la sustentación, independientemente del número de variables de diseño, a un coste computacional equivalente a una simulación CFD. Sin embargo, existen problemas prácticos que han imposibilitado su aplicación en la industria, que se pueden resumir en: integrabilidad, rendimiento computacional y robustez de la solución adjunta. Este trabajo aborda estas contrariedades y las analiza en casos prácticos. Como resumen, las contribuciones de esta tesis son: • El uso de NURBS como variables de diseño en un bucle de automático de optimización, aplicado a la mejora del rendimiento aerodinámico de alas en régimen transónico. • El desarrollo de algoritmos de inversión de punto, para calcular las coordenadas paramétricas de las coordenadas espaciales, para ligar los vértices de malla a las NURBS. • El uso y validación de la formulación adjunta para el calculo de los gradientes, a partir de las sensibilidades de la solución adjunta, comparado con diferencias finitas. • Se ofrece una estrategia para utilizar la geometría CAD, en forma de parches NURBS, para tratar las intersecciones, como el ala-fuselaje. • No existen muchas alternativas de librerías NURBS viables. En este trabajo se ha desarrollado una librería, DOMINO NURBS, y se ofrece a la comunidad como código libre y abierto. • También se ha implementado un código CFD en tarjeta gráfica, para realizar una valoración de cómo se puede adaptar un código sobre malla no estructurada a arquitecturas paralelas. • Finalmente, se propone una metodología, basada en la función de Green, como una forma eficiente de paralelizar simulaciones numéricas. Esta tesis ha sido apoyada por las actividades realizadas por el Área de Dinámica da Fluidos del Instituto Nacional de Técnica Aeroespacial (INTA), a través de numerosos proyectos de financiación nacional: DOMINO, SIMUMAT, y CORESFMULAERO. También ha estado en consonancia con las actividades realizadas por el departamento de Métodos y Herramientas de Airbus España y con el grupo Investigación y Tecnología Aeronáutica Europeo (GARTEUR), AG/52. ABSTRACT The motivation of this work is the development of an automatic optimization strategy for large scale shape optimization problems that arise in the aeronautics industry to improve the aerodynamic performance; covering several aspects from the use of Non-Uniform Rational B-Splines (NURBS), the calculation of the gradients with the continuous adjoint formulation, the development of volumetric b-splines parameterization, mesh adaptation and intersection handling, to the adaptation of Computational Fluid Dynamics (CFD) algorithms to take advantage of highly parallel architectures in order to speed up the optimization process. With the development of the adjoint formulation, gradient-based methods for aerodynamic optimization become a promising approach to improve the aerodynamic performance of aircraft designs. The adjoint methodology allows the evaluation the gradients to all design variables of a cost function, such as drag or lift, at the equivalent cost of more or less one CFD simulation. However, some practical problems have been delaying its full implementation to the industry, which can be summarized as: integrability, computer performance, and adjoint robustness. This work tackles some of these issues and analyse them in well-known test cases. As summary, the contributions comprises: • The employment of NURBS as design variables in an automatic optimization loop for the improvement of the aerodynamic performance of aircraft wings in transonic regimen. • The development of point inversion algorithms to calculate the NURBS parametric coordinates from the space coordinates, to link with the computational grid vertex. • The use and validation of the adjoint formulation to calculate the gradients from the surface sensitivities in an automatic optimization loop and evaluate its reliability, compared with finite differences. • This work proposes some algorithms that take advantage of the underlying CAD geometry description, in the form of NURBS patches, to handle intersections and mesh adaptations. • There are not many usable libraries for NURBS available. In this work an open source library DOMINO NURBS has been developed and is offered to the community as free, open source code. • The implementation of a transonic CFD solver from scratch in a graphic card, for an assessment of the implementability of conventional CFD solvers for unstructured grids to highly parallel architectures. • Finally, this research proposes the use of the Green's function as an efficient paralellization scheme of numerical solvers. The presented work has been supported by the activities carried out at the Fluid Dynamics branch of the National Institute for Aerospace Technology (INTA) through national founding research projects: DOMINO, SIMUMAT, and CORESIMULAERO; in line with the activities carried out by the Methods and Tools and Flight Physics department at Airbus and the Group for Aeronautical Research and Technology in Europe (GARTEUR) action group AG/52.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the máximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Membrane systems are computational equivalent to Turing machines. However, their distributed and massively parallel nature obtains polynomial solutions opposite to traditional non-polynomial ones. At this point, it is very important to develop dedicated hardware and software implementations exploiting those two membrane systems features. Dealing with distributed implementations of P systems, the bottleneck communication problem has arisen. When the number of membranes grows up, the network gets congested. The purpose of distributed architectures is to reach a compromise between the massively parallel character of the system and the needed evolution step time to transit from one configuration of the system to the next one, solving the bottleneck communication problem. The goal of this paper is twofold. Firstly, to survey in a systematic and uniform way the main results regarding the way membranes can be placed on processors in order to get a software/hardware simulation of P-Systems in a distributed environment. Secondly, we improve some results about the membrane dissolution problem, prove that it is connected, and discuss the possibility of simulating this property in the distributed model. All this yields an improvement in the system parallelism implementation since it gets an increment of the parallelism of the external communication among processors. Proposed ideas improve previous architectures to tackle the communication bottleneck problem, such as reduction of the total time of an evolution step, increase of the number of membranes that could run on a processor and reduction of the number of processors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The appearance of radix-$2^{2}$ was a milestone in the design of pipelined FFT hardware architectures. Later, radix-$2^{2}$ was extended to radix-$2^{k}$ . However, radix-$2^{k}$ was only proposed for single-path delay feedback (SDF) architectures, but not for feedforward ones, also called multi-path delay commutator (MDC). This paper presents the radix-$2^{k}$ feedforward (MDC) FFT architectures. In feedforward architectures radix-$2^{k}$ can be used for any number of parallel samples which is a power of two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be used. In addition to this, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-$2^{k}$ feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-$2^{k}$ feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a theoretical analysis and an optimization method for envelope amplifier. Highly efficient envelope amplifiers based on a switching converter in parallel or series with a linear regulator have been analyzed and optimized. The results of the optimization process have been shown and these two architectures are compared regarding their complexity and efficiency. The optimization method that is proposed is based on the previous knowledge about the transmitted signal type (OFDM, WCDMA...) and it can be applied to any signal type as long as the envelope probability distribution is known. Finally, it is shown that the analyzed architectures have an inherent efficiency limit.