34 resultados para parallel implementation
em Universidad Politécnica de Madrid
Resumo:
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 2003 in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.
Resumo:
We present a parallel graph narrowing machine, which is used to implement a functional logic language on a shared memory multiprocessor. It is an extensión of an abstract machine for a purely functional language. The result is a programmed graph reduction machine which integrates the mechanisms of unification, backtracking, and independent and-parallelism. In the machine, the subexpressions of an expression can run in parallel. In the case of backtracking, the structure of an expression is used to avoid the reevaluation of subexpressions as far as possible. Deterministic computations are detected. Their results are maintained and need not be reevaluated after backtracking.
Resumo:
In this paper we present a novel execution model for parallel implementation of logic programs which is capable of exploiting both independent and-parallelism and or-parallelism in an efficient way. This model extends the stack copying approach, which has been successfully applied in the Muse system to implement or-parallelism, by integrating it with proven techniques used to support independent and-parallelism. We show how all solutions to non-deterministic andparallel goals are found without repetitions. This is done through recomputation as in Prolog (and in various and-parallel systems, like &-Prolog and DDAS), i.e., solutions of and-parallel goals are not shared. We propose a scheme for the efficient management of the address space in a way that is compatible with the apparently incompatible requirements of both and- and or-parallelism. We also show how the full Prolog language, with all its extra-logical features, can be supported in our and-or parallel system so that its sequential semantics is preserved. The resulting system retains the advantages of both purely or-parallel systems as well as purely and-parallel systems. The stack copying scheme together with our proposed memory management scheme can also be used to implement models that combine dependent and-parallelism and or-parallelism, such as Andorra and Prometheus.
Resumo:
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, function of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells has only very recently been proposed (Jerusalem et al., 2013). In this paper, we present the implementation details of Neurite: the finite difference parallel program used in this reference. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite-explicit and implicit-were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between lectrophysiology and mechanics (Jerusalem et al., 2013). This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted.
Resumo:
In this paper we will see how the efficiency of the MBS simulations can be improved in two different ways, by considering both an explicit and implicit semi-recursive formulation. The explicit method is based on a double velocity transformation that involves the solution of a redundant but compatible system of equations. The high computational cost of this operation has been drastically reduced by taking into account the sparsity pattern of the system. Regarding this, the goal of this method is the introduction of MA48, a high performance mathematical library provided by Harwell Subroutine Library. The second method proposed in this paper has the particularity that, depending on the case, between 70 and 85% of the computation time is devoted to the evaluation of forces derivatives with respect to the relative position and velocity vectors. Keeping in mind that evaluating these derivatives can be decomposed into concurrent tasks, the main goal of this paper lies on a successful and straightforward parallel implementation that have led to a substantial improvement with a speedup of 3.2 by keeping all the cores busy in a quad-core processor and distributing the workload between them, achieving on this way a huge time reduction by doing an ideal CPU usage
Resumo:
Most implementations of parallel logic programming rely on complex low-level machinery which is arguably difflcult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. Therefore, we handle a signiflcant portion of the parallel implementation mechanism at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modiflcations to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non fork-join) parallelism. Preliminary experiments show that the amount of performance sacriflced is reasonable, although granularity control is required in some cases. Also, we observe that the availability of unrestricted parallelism contributes to better observed speedups.
Resumo:
This report presents an overview of the current work performed by us in the context of the efficient parallel implementation of traditional logic programming systems. The work is based on the &-Prolog System, a system for the automatic parallelization and execution of logic programming languages within the Independent And-parallelism model, and the global analysis and parallelization tools which have been developed for this system. In order to make the report self-contained, we first describe the "classical" tools of the &-Prolog system. We then explain in detail the work performed in improving and generalizing the global analysis and parallelization tools. Also, we describe the objectives which will drive our future work in this area.
Resumo:
Una amarra electrodinámica (electrodynamic tether) opera sobre principios electromagnéticos intercambiando momento con la magnetosfera planetaria e interactuando con su ionosfera. Es un subsistema pasivo fiable para desorbitar etapas de cohetes agotadas y satélites al final de su misión, mitigando el crecimiento de la basura espacial. Una amarra sin aislamiento captura electrones del plasma ambiente a lo largo de su segmento polarizado positivamente, el cual puede alcanzar varios kilómetros de longitud, mientras que emite electrones de vuelta al plasma mediante un contactor de plasma activo de baja impedancia en su extremo catódico, tal como un cátodo hueco (hollow cathode). En ausencia de un contactor catódico activo, la corriente que circula por una amarra desnuda en órbita es nula en ambos extremos de la amarra y se dice que ésta está flotando eléctricamente. Para emisión termoiónica despreciable y captura de corriente en condiciones limitadas por movimiento orbital (orbital-motion-limited, OML), el cociente entre las longitudes de los segmentos anódico y catódico es muy pequeño debido a la disparidad de masas entre iones y electrones. Tal modo de operación resulta en una corriente media y fuerza de Lorentz bajas en la amarra, la cual es poco eficiente como dispositivo para desorbitar. El electride C12A7 : e−, que podría presentar una función de trabajo (work function) tan baja como W = 0.6 eV y un comportamiento estable a temperaturas relativamente altas, ha sido propuesto como recubrimiento para amarras desnudas. La emisión termoiónica a lo largo de un segmento así recubierto y bajo el calentamiento de la operación espacial, puede ser más eficiente que la captura iónica. En el modo más simple de fuerza de frenado, podría eliminar la necesidad de un contactor catódico activo y su correspondientes requisitos de alimentación de gas y subsistema de potencia, lo que resultaría en un sistema real de amarra “sin combustible”. Con este recubrimiento de bajo W, cada segmento elemental del segmento catódico de una amarra desnuda de kilómetros de longitud emitiría corriente como si fuese parte de una sonda cilíndrica, caliente y uniformemente polarizada al potencial local de la amarra. La operación es similar a la de una sonda de Langmuir 2D tanto en los segmentos catódico como anódico. Sin embargo, en presencia de emisión, los electrones emitidos resultan en carga espacial (space charge) negativa, la cual reduce el campo eléctrico que los acelera hacia fuera, o incluso puede desacelerarlos y hacerlos volver a la sonda. Se forma una doble vainas (double sheath) estable con electrones emitidos desde la sonda e iones provenientes del plasma ambiente. La densidad de corriente termoiónica, variando a lo largo del segmento catódico, podría seguir dos leyes distintas bajo diferentes condiciones: (i) la ley de corriente limitada por la carga espacial (space-charge-limited, SCL) o (ii) la ley de Richardson-Dushman (RDS). Se presenta un estudio preliminar sobre la corriente SCL frente a una sonda emisora usando la teoría de vainas (sheath) formada por la captura iónica en condiciones OML, y la corriente electrónica SCL entre los electrodos cilíndricos según Langmuir. El modelo, que incluye efectos óhmicos y el efecto de transición de emisión SCL a emisión RDS, proporciona los perfiles de corriente y potencial a lo largo de la longitud completa de la amarra. El análisis muestra que en el modo más simple de fuerza de frenado, bajo condiciones orbitales y de amarras típicas, la emisión termoiónica proporciona un contacto catódico eficiente y resulta en una sección catódica pequeña. En el análisis anterior, tanto la transición de emisión SCL a RD como la propia ley de emisión SCL consiste en un modelo muy simplificado. Por ello, a continuación se ha estudiado con detalle la solución de vaina estacionaria de una sonda con emisión termoiónica polarizada negativamente respecto a un plasma isotrópico, no colisional y sin campo magnético. La existencia de posibles partículas atrapadas ha sido ignorada y el estudio incluye tanto un estudio semi-analítico mediante técnica asintóticas como soluciones numéricas completas del problema. Bajo las tres condiciones (i) alto potencial, (ii) R = Rmax para la validez de la captura iónica OML, y (iii) potencial monotónico, se desarrolla un análisis asintótico auto-consistente para la estructura de plasma compleja que contiene las tres especies de cargas (electrones e iones del plasma, electrones emitidos), y cuatro regiones espaciales distintas, utilizando teorías de movimiento orbital y modelos cinéticos de las especies. Aunque los electrones emitidos presentan carga espacial despreciable muy lejos de la sonda, su efecto no se puede despreciar en el análisis global de la estructura de la vaina y de dos capas finas entre la vaina y la región cuasi-neutra. El análisis proporciona las condiciones paramétricas para que la corriente sea SCL. También muestra que la emisión termoiónica aumenta el radio máximo de la sonda para operar dentro del régimen OML y que la emisión de electrones es mucho más eficiente que la captura iónica para el segmento catódico de la amarra. En el código numérico, los movimientos orbitales de las tres especies son modelados para potenciales tanto monotónico como no-monotónico, y sonda de radio R arbitrario (dentro o más allá del régimen de OML para la captura iónica). Aprovechando la existencia de dos invariante, el sistema de ecuaciones Poisson-Vlasov se escribe como una ecuación integro-diferencial, la cual se discretiza mediante un método de diferencias finitas. El sistema de ecuaciones algebraicas no lineal resultante se ha resuelto de con un método Newton-Raphson paralelizado. Los resultados, comparados satisfactoriamente con el análisis analítico, proporcionan la emisión de corriente y la estructura del plasma y del potencial electrostático. ABSTRACT An electrodynamic tether operates on electromagnetic principles and exchanges momentum through the planetary magnetosphere, by continuously interacting with the ionosphere. It is a reliable passive subsystem to deorbit spent rocket stages and satellites at its end of mission, mitigating the growth of orbital debris. A tether left bare of insulation collects electrons by its own uninsulated and positively biased segment with kilometer range, while electrons are emitted by a low-impedance active device at the cathodic end, such as a hollow cathode, to emit the full electron current. In the absence of an active cathodic device, the current flowing along an orbiting bare tether vanishes at both ends and the tether is said to be electrically floating. For negligible thermionic emission and orbital-motion-limited (OML) collection throughout the entire tether (electron/ion collection at anodic/cathodic segment, respectively), the anodic-to-cathodic length ratio is very small due to ions being much heavier, which results in low average current and Lorentz drag. The electride C12A7 : e−, which might present a possible work function as low as W = 0.6 eV and moderately high temperature stability, has been proposed as coating for floating bare tethers. Thermionic emission along a thus coated cathodic segment, under heating in space operation, can be more efficient than ion collection and, in the simplest drag mode, may eliminate the need for an active cathodic device and its corresponding gas-feed requirements and power subsystem, which would result in a truly “propellant-less” tether system. With this low-W coating, each elemental segment on the cathodic segment of a kilometers-long floating bare-tether would emit current as if it were part of a hot cylindrical probe uniformly polarized at the local tether bias, under 2D probe conditions that are also applied to the anodic-segment analysis. In the presence of emission, emitted electrons result in negative space charge, which decreases the electric field that accelerates them outwards, or even reverses it, decelerating electrons near the emitting probe. A double sheath would be established with electrons being emitted from the probe and ions coming from the ambient plasma. The thermionic current density, varying along the cathodic segment, might follow two distinct laws under different con ditions: i) space-charge-limited (SCL) emission or ii) full Richardson-Dushman (RDS) emission. A preliminary study on the SCL current in front of an emissive probe is presented using the orbital-motion-limited (OML) ion-collection sheath and Langmuir’s SCL electron current between cylindrical electrodes. A detailed calculation of current and bias profiles along the entire tether length is carried out with ohmic effects considered and the transition from SCL to full RDS emission is included. Analysis shows that in the simplest drag mode, under typical orbital and tether conditions, thermionic emission provides efficient cathodic contact and leads to a short cathodic section. In the previous analysis, both the transition between SCL and RDS emission and the current law for SCL condition have used a very simple model. To continue, considering an isotropic, unmagnetized, colissionless plasma and a stationary sheath, the probe-plasma contact is studied in detail for a negatively biased probe with thermionic emission. The possible trapped particles are ignored and this study includes both semianalytical solutions using asymptotic analysis and complete numerical solutions. Under conditions of i) high bias, ii) R = Rmax for ion OML collection validity, and iii) monotonic potential, a self-consistent asymptotic analysis is carried out for the complex plasma structure involving all three charge species (plasma electrons and ions, and emitted electrons) and four distinct spatial regions using orbital motion theories and kinetic modeling of the species. Although emitted electrons present negligible space charge far away from the probe, their effect cannot be neglected in the global analysis for the sheath structure and two thin layers in between the sheath and the quasineutral region. The parametric conditions for the current to be space-chargelimited are obtained. It is found that thermionic emission increases the range of probe radius for OML validity and is greatly more effective than ion collection for cathodic contact of tethers. In the numerical code, the orbital motions of all three species are modeled for both monotonic and non-monotonic potential, and for any probe radius R (within or beyond OML regime for ion collection). Taking advantage of two constants of motion (energy and angular momentum), the Poisson-Vlasov equation is described by an integro differential equation, which is discretized using finite difference method. The non-linear algebraic equations are solved using a parallel implementation of the Newton-Raphson method. The results, which show good agreement with the analytical results, provide the results for thermionic current, the sheath structure, and the electrostatic potential.
Resumo:
Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user deñned, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argüe that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples and report on an implementation of our ideas.
Resumo:
We argüe that in order to exploit both Independent And- and Or-parallelism in Prolog programs there is advantage in recomputing some of the independent goals, as opposed to all their solutions being reused. We present an abstract model, called the Composition- Tree, for representing and-or parallelism in Prolog Programs. The Composition-tree closely mirrors sequential Prolog execution by recomputing some independent goals rather than fully re-using them. We also outline two environment representation techniques for And-Or parallel execution of full Prolog based on the Composition-tree model abstraction. We argüe that these techniques have advantages over earlier proposals for exploiting and-or parallelism in Prolog.
Resumo:
This article presents in an informal way some early results on the design of a series of paradigms for visualization of the parallel execution of logic programs. The results presented here refer to the visualization of or-parallelism, as in MUSE and Aurora, deterministic dependent and-parallelism, as in Andorra-I, and independent and-parallelism as in &-Prolog. A tool has been implemented for this purpose and has been interfaced with these systems. Results are presented showing the visualization of executions from these systems and the usefulness of the resulting tool is briefly discussed.
Resumo:
Abstract is not available
Resumo:
The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.
Resumo:
This article presents in an informal way some early results on the design of a series of paradigms for visualization of the parallel execution of logic programs. The results presented here refer to the visualization of or-parallelism, as in MUSE and Aurora, deterministic dependent and-parallelism, as in Andorra-I, and independent and-parallelism as in &-Prolog. A tool has been implemented for this purpose and has been interfaced with these systems. Results are presented showing the visualization of executions from these systems and the usefulness of the resulting tool is briefly discussed.
Resumo:
We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the máximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.