35 resultados para Unrelated parallel machines

em Universidad Politécnica de Madrid


Relevância:

80.00% 80.00%

Publicador:

Resumo:

El principal problema que impide actualmente una mayor utilización de las máquinas paralelas es la falta de herramientas de programación que permitan generar programas transportables a máquinas con diferentes prestaciones. En este trabajo se ha estudiado si los lenguajes con paralelismo explícito cumplen este requisito y son, por lo tanto, adecuados para programar este tipo de máquinas. El exceso de paralelismo, esto es, el uso de mayor paralelismo en el programa que el proporcionado por la máquina para esconder la latencia en la comunicación, se presenta en este trabajo como una solución a los problemas de eficiencia de los programas con paralelismo explícito cuando se ejecutan en máquinas que no tienen una granularidad adecuada. Con esta técnica, por lo tanto, los programas escritos con estos lenguajes pueden transportarse con eficiencia a diferentes máquinas. Para llevar a cabo el estudio de los lenguajes con paralelismo explícito, se ha desarrollado un modelo abstracto de paralelismo, en el cual un sistema está formado por una jerarquía de máquinas virtuales paralelas. Este modelo permite realizar un análisis genérico de la implementación de este tipo de lenguajes, ya sea sobre una máquina con sistema operativo o directamente sobre la máquina física. Este análisis genérico se ha aplicado a un lenguaje de este tipo, el lenguaje Ada. Se han estudiado las características específicas de Ada que pueden influir en la implementación eficiente del lenguaje, analizando también la propuesta de modificación del lenguaje correspondiente al proceso de revisión Ada 9X. Dentro del marco del modelo de paralelismo, se analiza también la problemática específica de las implementaciones del lenguaje sobre el sistema operativo. En este tipo de implementaciones, las interacciones de un programa con el entorno externo pueden causar ciertos problemas, como el bloqueo del proceso correspondiente del sistema operativo, que disminuyen el rendimiento del programa. Se analizan estos problemas y se proponen soluciones a los mismos. Se desarrolla en profundidad un ejemplo de este tipo de problemas: El acceso al estándar gráfico GKS desde Ada.---ABSTRACT---The major obstacle to the widespread utilization of the parallel machines is the lack of programming tools allowing the development of software portable between machines with different performance. This dissertation analyzes whether languages with explicit parallelism fulfil this requirement. The approach of using programs with more parallelism than available on the machine (parallel slackness) is presented. This technique can solve the efficiency problems appearing in the execution of programs with explicit parallelism over machines with a too coarse granularity. Therefore, with this approach programs can run efficiently on different machines. A new abstract model of parallelism allowing the generic study of the implementation of languages with explicit parallelism is developed. In this model, a parallel system is described by a hierarchy of parallel virtual machines. This generic analysis is applied to Ada language. Ada specific features with problematic implementation are identified and analyzed. The change proposals to Ada language in the frame of Ada 9X revisión process are also analyzed. The specific problematic of the language implementation on top of the operating system is studied under the scope of the parallelism model. With this kind of implementation, program interactions with extemal environments can lead to problems, like the blocking of the corresponding operating system process, decreasing the program execution performance. A practical example of this kind of problems, the access to GKS (Graphic Kernel System) from Ada programs, is analyzed and the implemented solution is described.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the máximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Este trabajo esta dedicado al estudio de las estructuras macroscópicas conocidas en la literatura como filamentos o blobs que han sido observadas de manera universal en el borde de todo tipo de dispositivos de fusión por confinamiento magnético. Estos filamentos, celdas convectivas elongadas a lo largo de las líneas de campo que surgen en el plasma fuertemente turbulento que existe en este tipo de dispositivos, parecen dominar el transporte radial de partículas y energía en la región conocida como Scrape-off Layer, en la que las líneas de campo dejan de estar cerradas y el plasma es dirigido hacia la pared sólida que forma la cámara de vacío. Aunque el comportamiento y las leyes de escala de estas estructuras son relativamente bien conocidos, no existe aún una teoría generalmente aceptada acerca del mecanismo físico responsable de su formación, que constituye una de las principales incógnitas de la teoría de transporte del borde en plasmas de fusión y una cuestión de gran importancia práctica en el desarrollo de la siguiente generación de reactores de fusión (incluyendo dispositivos como ITER y DEMO), puesto que la eficiencia del confinamiento y la cantidad de energía depositadas en la pared dependen directamente de las características del transporte en el borde. El trabajo ha sido realizado desde una perspectiva eminentemente experimental, incluyendo la observación y el análisis de este tipo de estructuras en el stellarator tipo heliotrón LHD (un dispositivo de gran tamaño, capaz de generar plasmas de características cercanas a las necesarias en un reactor de fusión) y en el stellarator tipo heliac TJ-II (un dispositivo de medio tamaño, capaz de generar plasmas relativamente más fríos pero con una accesibilidad y disponibilidad de diagnósticos mayor). En particular, en LHD se observó la generación de filamentos durante las descargas realizadas en configuración de alta _ (alta presión cinética frente a magnética) mediante una cámara visible ultrarrápida, se caracterizó su comportamiento y se investigó, mediante el análisis estadístico y la comparación con modelos teóricos, el posible papel de la Criticalidad Autoorganizada en la formación de este tipo de estructuras. En TJ-II se diseñó y construyó una cabeza de sonda capaz de medir simultáneamente las fluctuaciones electrostáticas y electromagnéticas del plasma. Gracias a este nuevo diagnóstico se pudieron realizar experimentos con el fin de determinar la presencia de corriente paralela a través de los filamentos (un parámetro de gran importancia en su modelización) y relacionar los dos tipos de fluctuaciones por primera vez en un stellarator. Así mismo, también por primera vez en este tipo de dispositivo, fue posible realizar mediciones simultáneas de los tensores viscoso y magnético (Reynolds y Maxwell) de transporte de cantidad de movimiento. ABSTRACT This work has been devoted to the study of the macroscopic structures known in the literature as filaments or blobs, which have been observed universally in the edge of all kind of magnetic confinement fusion devices. These filaments, convective cells stretching along the magnetic field lines, arise from the highly turbulent plasma present in this kind of machines and seem to dominate radial transport of particles and energy in the region known as Scrapeoff Layer, in which field lines become open and plasma is directed towards the solid wall of the vacuum vessel. Although the behavior and scale laws of these structures are relatively well known, there is no generally accepted theory about the physical mechanism involved in their formation yet, which remains one of the main unsolved questions in the fusion plasmas edge transport theory and a matter of great practical importance for the development of the next generation of fusion reactors (including ITER and DEMO), since efficiency of confinement and the energy deposition levels on the wall are directly dependent of the characteristics of edge transport. This work has been realized mainly from an experimental perspective, including the observation and analysis of this kind of structures in the heliotron stellarator LHD (a large device capable of generating reactor-relevant plasma conditions) and in the heliac stellarator TJ-II (a medium-sized device, capable of relatively colder plasmas, but with greater ease of access and diagnostics availability). In particular, in LHD, the generation of filaments during high _ discharges (with high kinetic to magnetic pressure ratio) was observed by means of an ultrafast visible camera, and the behavior of this structures was characterized. Finally, the potential role of Self-Organized Criticality in the generation of filaments was investigated. In TJ-II, a probe head capable of measuring simultaneously electrostatic and electromagnetic fluctuations in the plasma was designed and built. Thanks to this new diagnostic, experiments were carried out in order to determine the presence of parallel current through filaments (one of the most important parameters in their modelization) and to related electromagnetic (EM) and electrostatic (ES) fluctuations for the first time in an stellarator. As well, also for the first time in this kind of device, measurements of the viscous and magnetic momentum transfer tensors (Reynolds and Maxwell) were performed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Membrane systems are computational equivalent to Turing machines. However, their distributed and massively parallel nature obtains polynomial solutions opposite to traditional non-polynomial ones. At this point, it is very important to develop dedicated hardware and software implementations exploiting those two membrane systems features. Dealing with distributed implementations of P systems, the bottleneck communication problem has arisen. When the number of membranes grows up, the network gets congested. The purpose of distributed architectures is to reach a compromise between the massively parallel character of the system and the needed evolution step time to transit from one configuration of the system to the next one, solving the bottleneck communication problem. The goal of this paper is twofold. Firstly, to survey in a systematic and uniform way the main results regarding the way membranes can be placed on processors in order to get a software/hardware simulation of P-Systems in a distributed environment. Secondly, we improve some results about the membrane dissolution problem, prove that it is connected, and discuss the possibility of simulating this property in the distributed model. All this yields an improvement in the system parallelism implementation since it gets an increment of the parallelism of the external communication among processors. Proposed ideas improve previous architectures to tackle the communication bottleneck problem, such as reduction of the total time of an evolution step, increase of the number of membranes that could run on a processor and reduction of the number of processors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes new improvements for BB-MaxClique (San Segundo et al. in Comput Oper Resour 38(2):571–581, 2011 ), a leading maximum clique algorithm which uses bit strings to efficiently compute basic operations during search by bit masking. Improvements include a recently described recoloring strategy in Tomita et al. (Proceedings of the 4th International Workshop on Algorithms and Computation. Lecture Notes in Computer Science, vol 5942. Springer, Berlin, pp 191–203, 2010 ), which is now integrated in the bit string framework, as well as different optimization strategies for fast bit scanning. Reported results over DIMACS and random graphs show that the new variants improve over previous BB-MaxClique for a vast majority of cases. It is also established that recoloring is mainly useful for graphs with high densities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We have developed a new projector model specifically tailored for fast list-mode tomographic reconstructions in Positron emission tomography (PET) scanners with parallel planar detectors. The model provides an accurate estimation of the probability distribution of coincidence events defined by pairs of scintillating crystals. This distribution is parameterized with 2D elliptical Gaussian functions defined in planes perpendicular to the main axis of the tube of response (TOR). The parameters of these Gaussian functions have been obtained by fitting Monte Carlo simulations that include positron range, acolinearity of gamma rays, as well as detector attenuation and scatter effects. The proposed model has been applied efficiently to list-mode reconstruction algorithms. Evaluation with Monte Carlo simulations over a rotating high resolution PET scanner indicates that this model allows to obtain better recovery to noise ratio in OSEM (ordered-subsets, expectation-maximization) reconstruction, if compared to list-mode reconstruction with symmetric circular Gaussian TOR model, and histogram-based OSEM with precalculated system matrix using Monte Carlo simulated models and symmetries.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The manipulation and handling of an ever increasing volume of data by current data-intensive applications require novel techniques for e?cient data management. Despite recent advances in every aspect of data management (storage, access, querying, analysis, mining), future applications are expected to scale to even higher degrees, not only in terms of volumes of data handled but also in terms of users and resources, often making use of multiple, pre-existing autonomous, distributed or heterogeneous resources.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 2003 in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper aims to analyze the different adjustment methods commonly used to characterize indirect metrology circular features: least square circle, minimum zone circle, maximum inscribed circle and minimum circumscribed circle. The analysis was performed from images obtained by digital optical machines. The calculation algorithms, self-developed, have been implemented in Matlab® and take into consideration as study variables: the amplitude of angular sector of the circular feature, its nominal radio and the magnification used by the optical machine. Under different conditions, it was determined the radius and circularity error of different circular standards. The comparison of the results, obtained by the different methods of adjustments used, with certified values for the standards, has allowed us to determine the accuracy of each method and its scope.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Different results of some parallel GPU implementations in terms of the speed-up and the scalability compared to the CPU sequential codes are shown. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many dificulties for a data-level parallelization. Because of the fact that the neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been pro- posed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU-GPU, using GPU algorithms for those stages that allow a straight forward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, it is achieved an overall speed-up of x41. The second strategy seeks to maximize the use of the GPU in the neighbor's location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Al- though this strategy has succeeded in improving the speed-up on the stage of neighboring location, the global speed-up on the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture-units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article describes a new visual servo control and strategies that are used to carry out dynamic tasks by the Robotenis platform. This platform is basically a parallel robot that is equipped with an acquisition and processing system of visual information, its main feature is that it has a completely open architecture control, and planned in order to design, implement, test and compare control strategies and algorithms (visual and actuated joint controllers). Following sections describe a new visual control strategy specially designed to track and intercept objects in 3D space. The results are compared with a controller shown in previous woks, where the end effector of the robot keeps a constant distance from the tracked object. In this work, the controller is specially designed in order to allow changes in the tracking reference. Changes in the tracking reference can be used to grip an object that is under movement, or as in this case, hitting a hanging Ping-Pong ball. Lyapunov stability is taken into account in the controller design.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main purpose of robot calibration is the correction of the possible errors in the robot parameters. This paper presents a method for a kinematic calibration of a parallel robot that is equipped with one camera in hand. In order to preserve the mechanical configuration of the robot, the camera is utilized to acquire incremental positions of the end effector from a spherical object that is fixed in the word reference frame. The positions of the end effector are related to incremental positions of resolvers of the motors of the robot, and a kinematic model of the robot is used to find a new group of parameters which minimizes errors in the kinematic equations. Additionally, properties of the spherical object and intrinsic camera parameters are utilized to model the projection of the object in the image and improving spatial measurements. Finally, the robotic system is designed to carry out tracking tasks and the calibration of the robot is validated by means of integrating the errors of the visual controller.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rms voltage regulation may be an attractive possibility for controlling power inverters. Combined with a Hall Effect sensor for current control, it keeps its parallel operation capability while increasing its noise immunity, which may lead to a reduction of the Total Harmonic Distortion (THD). Besides, as voltage regulation is designed in DC, a simple PI regulator can provide accurate voltage tracking. Nevertheless, this approach does not lack drawbacks. Its narrow voltage bandwidth makes transients last longer and it increases the voltage THD when feeding non-linear loads, such as rectifying stages. On the other hand, the implementation can fall into offset voltage error. Furthermore, the information of the output voltage phase is hidden for the control as well, making the synchronization of a 3-phase setup not trivial. This paper explains the concept, design and implementation of the whole control scheme, in an on board inverter able to run in parallel and within a 3-phase setup. Special attention is paid to solve the problems foreseen at implementation level: a third analog loop accounts for the offset level is added and a digital algorithm guarantees 3-phase voltage synchronization.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a theoretical analysis and an optimization method for envelope amplifier. Highly efficient envelope amplifiers based on a switching converter in parallel or series with a linear regulator have been analyzed and optimized. The results of the optimization process have been shown and these two architectures are compared regarding their complexity and efficiency. The optimization method that is proposed is based on the previous knowledge about the transmitted signal type (OFDM, WCDMA...) and it can be applied to any signal type as long as the envelope probability distribution is known. Finally, it is shown that the analyzed architectures have an inherent efficiency limit.