45 resultados para Parallel Computation
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data.It contains four classes corresponding to the four different types of compositional andpositive geometry (including the Aitchison geometry). It provides means for computation,plotting and high-level multivariate statistical analysis in all four geometries.These geometries are treated in an fully analogous way, based on the principle of workingin coordinates, and the object-oriented programming paradigm of R. In this way,called functions automatically select the most appropriate type of analysis as a functionof the geometry. The graphical capabilities include ternary diagrams and tetrahedrons,various compositional plots (boxplots, barplots, piecharts) and extensive graphical toolsfor principal components. Afterwards, ortion and proportion lines, straight lines andellipses in all geometries can be added to plots. The package is accompanied by ahands-on-introduction, documentation for every function, demos of the graphical capabilitiesand plenty of usage examples. It allows direct and parallel computation inall four vector spaces and provides the beginner with a copy-and-paste style of dataanalysis, while letting advanced users keep the functionality and customizability theydemand of R, as well as all necessary tools to add own analysis routines. A completeexample is included in the appendix
Resumo:
In computer graphics, global illumination algorithms take into account not only the light that comes directly from the sources, but also the light interreflections. This kind of algorithms produce very realistic images, but at a high computational cost, especially when dealing with complex environments. Parallel computation has been successfully applied to such algorithms in order to make it possible to compute highly-realistic images in a reasonable time. We introduce here a speculation-based parallel solution for a global illumination algorithm in the context of radiosity, in which we have taken advantage of the hierarchical nature of such an algorithm
Resumo:
The resource utilization level in open laboratories of several universities has been shown to be very low. Our aim is to take advantage of those idle resources for parallel computation without disturbing the local load. In order to provide a system that lets us execute parallel applications in such a non-dedicated cluster, we use an integral scheduling system that considers both Space and Time sharing concerns. For dealing with the Time Sharing (TS) aspect, we use a technique based on the communication-driven coscheduling principle. This kind of TS system has some implications on the Space Sharing (SS) system, that force us to modify the way job scheduling is traditionally done. In this paper, we analyze the relation between the TS and the SS systems in a non-dedicated cluster. As a consequence of this analysis, we propose a new technique, termed 3DBackfilling. This proposal implements the well known SS technique of backfilling, but applied to an environment with a MultiProgramming Level (MPL) of the parallel applications that is greater than one. Besides, 3DBackfilling considers the requirements of the local workload running on each node. Our proposal was evaluated in a PVM/MPI Linux cluster, and it was compared with several more traditional SS policies applied to non-dedicated environments.
Resumo:
We present an algorithm for the computation of reducible invariant tori of discrete dynamical systems that is suitable for tori of dimensions larger than 1. It is based on a quadratically convergent scheme that approximates, at the same time, the Fourier series of the torus, its Floquet transformation, and its Floquet matrix. The Floquet matrix describes the linearization of the dynamics around the torus and, hence, its linear stability. The algorithm presents a high degree of parallelism, and the computational effort grows linearly with the number of Fourier modes needed to represent the solution. For these reasons it is a very good option to compute quasi-periodic solutions with several basic frequencies. The paper includes some examples (flows) to show the efficiency of the method in a parallel computer. In these flows we compute invariant tori of dimensions up to 5, by taking suitable sections.
Resumo:
This paper proposes a parallel architecture for estimation of the motion of an underwater robot. It is well known that image processing requires a huge amount of computation, mainly at low-level processing where the algorithms are dealing with a great number of data. In a motion estimation algorithm, correspondences between two images have to be solved at the low level. In the underwater imaging, normalised correlation can be a solution in the presence of non-uniform illumination. Due to its regular processing scheme, parallel implementation of the correspondence problem can be an adequate approach to reduce the computation time. Taking into consideration the complexity of the normalised correlation criteria, a new approach using parallel organisation of every processor from the architecture is proposed
Resumo:
This paper shows how a high level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave.
Resumo:
This note describes ParallelKnoppix, a bootable CD that allows creation of a Linux cluster in very little time. An experienced user can create a cluster ready to execute MPI programs in less than 10 minutes. The computers used may be heterogeneous machines, of the IA-32 architecture. When the cluster is shut down, all machines except one are in their original state, and the last can be returned to its original state by deleting a directory. The system thus provides a means of using non-dedicated computers to create a cluster. An example session is documented.
Resumo:
"Vegeu el resum a l'inici del document del fitxer adjunt."
Resumo:
"Vegeu el resum a l'inici del document del fitxer ajunt."
Resumo:
In this paper, we develop numerical algorithms that use small requirements of storage and operations for the computation of invariant tori in Hamiltonian systems (exact symplectic maps and Hamiltonian vector fields). The algorithms are based on the parameterization method and follow closely the proof of the KAM theorem given in [LGJV05] and [FLS07]. They essentially consist in solving a functional equation satisfied by the invariant tori by using a Newton method. Using some geometric identities, it is possible to perform a Newton step using little storage and few operations. In this paper we focus on the numerical issues of the algorithms (speed, storage and stability) and we refer to the mentioned papers for the rigorous results. We show how to compute efficiently both maximal invariant tori and whiskered tori, together with the associated invariant stable and unstable manifolds of whiskered tori. Moreover, we present fast algorithms for the iteration of the quasi-periodic cocycles and the computation of the invariant bundles, which is a preliminary step for the computation of invariant whiskered tori. Since quasi-periodic cocycles appear in other contexts, this section may be of independent interest. The numerical methods presented here allow to compute in a unified way primary and secondary invariant KAM tori. Secondary tori are invariant tori which can be contracted to a periodic orbit. We present some preliminary results that ensure that the methods are indeed implementable and fast. We postpone to a future paper optimized implementations and results on the breakdown of invariant tori.
Resumo:
Este trabajo analiza el rendimiento de cuatro nodos de cómputo multiprocesador de memoria compartida para resolver el problema N-body. Se paraleliza el algoritmo serie, y se codifica usando el lenguaje C extendido con OpenMP. El resultado son dos variantes que obedecen a dos criterios de optimización diferentes: minimizar los requisitos de memoria y minimizar el volumen de cómputo. Posteriormente, se realiza un proceso de análisis de las prestaciones del programa sobre los nodos de cómputo. Se modela el rendimiento de las variantes secuenciales y paralelas de la aplicación, y de los nodos de cómputo; se instrumentan y ejecutan los programas para obtener resultados en forma de varias métricas; finalmente se muestran e interpretan los resultados, proporcionando claves que explican ineficiencias y cuellos de botella en el rendimiento y posibles líneas de mejora. La experiencia de este estudio concreto ha permitido esbozar una incipiente metodología de análisis de rendimiento, identificación de problemas y sintonización de algoritmos a nodos de cómputo multiprocesador de memoria compartida.
Resumo:
We study simply-connected irreducible non-locally symmetric pseudo-Riemannian Spin(q) manifolds admitting parallel quaternionic spinors.
Resumo:
Aquest projecte es tracta de la optimització i la implementació de l’etapa d’adquisició d’un receptor GPS. També inclou una revisió breu del sistema GPS i els seus principis de funcionament. El procés d’adquisició s’ha estudiat amb detall i programat en els entorns de treball Matlab i Simulink. El fet d’implementar aquesta etapa en dos entorns diferents ha estat molt útil tant de cara a l’aprenentatge com també per la comprovació dels resultats obtinguts. El principal objectiu del treball és el disseny d’un model Simulink que es capaç d’adquirir una senyal capturada amb hardware real. En realitat, s’han fet dues implementacions: una que utilitza blocs propis de Simulink i l’altra que utilitza blocs de la llibreria Xilinx. D’aquesta manera, posteriorment, es facilitaria la transició del model a la FPGA utilitzant l’entorn ISE de Xilinx. La implementació de l’etapa d’adquisició es basa en el mètode de cerca de fase de codi en paral·lel, el qual empra la operació correlació creuada mitjançant la transformada ràpida de Fourier (FFT). Per aquest procés es necessari realitzar dues transformades (per a la senyal entrant i el codi de referència) i una antitransformada de Fourier (per al resultat de la correlació). Per tal d’optimitzar el disseny s’utilitza un bloc FFT, ja que tres blocs consumeixen gran part dels recursos d’una FPGA. En lloc de replicar el bloc FFT, en el model el bloc és compartit en el temps gràcies a l’ús de buffers i commutadors, com a resultat la quantitat de recursos requerits per una implementació en una FPGA es podria reduir considerablement.
Resumo:
La gestión de recursos en los procesadores multi-core ha ganado importancia con la evolución de las aplicaciones y arquitecturas. Pero esta gestión es muy compleja. Por ejemplo, una misma aplicación paralela ejecutada múltiples veces con los mismos datos de entrada, en un único nodo multi-core, puede tener tiempos de ejecución muy variables. Hay múltiples factores hardware y software que afectan al rendimiento. La forma en que los recursos hardware (cómputo y memoria) se asignan a los procesos o threads, posiblemente de varias aplicaciones que compiten entre sí, es fundamental para determinar este rendimiento. La diferencia entre hacer la asignación de recursos sin conocer la verdadera necesidad de la aplicación, frente a asignación con una meta específica es cada vez mayor. La mejor manera de realizar esta asignación és automáticamente, con una mínima intervención del programador. Es importante destacar, que la forma en que la aplicación se ejecuta en una arquitectura no necesariamente es la más adecuada, y esta situación puede mejorarse a través de la gestión adecuada de los recursos disponibles. Una apropiada gestión de recursos puede ofrecer ventajas tanto al desarrollador de las aplicaciones, como al entorno informático donde ésta se ejecuta, permitiendo un mayor número de aplicaciones en ejecución con la misma cantidad de recursos. Así mismo, esta gestión de recursos no requeriría introducir cambios a la aplicación, o a su estrategia operativa. A fin de proponer políticas para la gestión de los recursos, se analizó el comportamiento de aplicaciones intensivas de cómputo e intensivas de memoria. Este análisis se llevó a cabo a través del estudio de los parámetros de ubicación entre los cores, la necesidad de usar la memoria compartida, el tamaño de la carga de entrada, la distribución de los datos dentro del procesador y la granularidad de trabajo. Nuestro objetivo es identificar cómo estos parámetros influyen en la eficiencia de la ejecución, identificar cuellos de botella y proponer posibles mejoras. Otra propuesta es adaptar las estrategias ya utilizadas por el Scheduler con el fin de obtener mejores resultados.