7 resultados para parallel algorithm
em Universidad Politécnica de Madrid
                                
Resumo:
This thesis aims to introduce some fundamental concepts underlying option valuation theory including implementation of computational tools. In many cases analytical solution for option pricing does not exist, thus the following numerical methods are used: binomial trees, Monte Carlo simulations and finite difference methods. First, an algorithm based on Hull and Wilmott is written for every method. Then these algorithms are improved in different ways. For the binomial tree both speed and memory usage is significantly improved by using only one vector instead of a whole price storing matrix. Computational time in Monte Carlo simulations is reduced by implementing a parallel algorithm (in C) which is capable of improving speed by a factor which equals the number of processors used. Furthermore, MatLab code for Monte Carlo was made faster by vectorizing simulation process. Finally, obtained option values are compared to those obtained with popular finite difference methods, and it is discussed which of the algorithms is more appropriate for which purpose.
                                
Resumo:
This paper describes new improvements for BB-MaxClique (San Segundo et al. in Comput Oper Resour 38(2):571–581, 2011 ), a leading maximum clique algorithm which uses bit strings to efficiently compute basic operations during search by bit masking. Improvements include a recently described recoloring strategy in Tomita et al. (Proceedings of the 4th International Workshop on Algorithms and Computation. Lecture Notes in Computer Science, vol 5942. Springer, Berlin, pp 191–203, 2010 ), which is now integrated in the bit string framework, as well as different optimization strategies for fast bit scanning. Reported results over DIMACS and random graphs show that the new variants improve over previous BB-MaxClique for a vast majority of cases. It is also established that recoloring is mainly useful for graphs with high densities.
                                
Resumo:
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 2003 in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.
                                
Resumo:
This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Different results of some parallel GPU implementations in terms of the speed-up and the scalability compared to the CPU sequential codes are shown. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many dificulties for a data-level parallelization. Because of the fact that the neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been pro- posed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU-GPU, using GPU algorithms for those stages that allow a straight forward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, it is achieved an overall speed-up of x41. The second strategy seeks to maximize the use of the GPU in the neighbor's location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Al- though this strategy has succeeded in improving the speed-up on the stage of neighboring location, the global speed-up on the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture-units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods
                                
Resumo:
Rms voltage regulation may be an attractive possibility for controlling power inverters. Combined with a Hall Effect sensor for current control, it keeps its parallel operation capability while increasing its noise immunity, which may lead to a reduction of the Total Harmonic Distortion (THD). Besides, as voltage regulation is designed in DC, a simple PI regulator can provide accurate voltage tracking. Nevertheless, this approach does not lack drawbacks. Its narrow voltage bandwidth makes transients last longer and it increases the voltage THD when feeding non-linear loads, such as rectifying stages. On the other hand, the implementation can fall into offset voltage error. Furthermore, the information of the output voltage phase is hidden for the control as well, making the synchronization of a 3-phase setup not trivial. This paper explains the concept, design and implementation of the whole control scheme, in an on board inverter able to run in parallel and within a 3-phase setup. Special attention is paid to solve the problems foreseen at implementation level: a third analog loop accounts for the offset level is added and a digital algorithm guarantees 3-phase voltage synchronization.
                                
Resumo:
The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models.
                                
Resumo:
In this work, the dimensional synthesis of a spherical Parallel Manipulator (PM) with a -1S kinematic chain is presented. The goal of the synthesis is to find a set of parameters that defines the PM with the best performance in terms of workspace capabilities, dexterity and isotropy. The PM is parametrized in terms of a reference element, and a non-directed search of these parameters is carried out. First, the inverse kinematics and instantaneous kinematics of the mechanism are presented. The latter is found using the screw theory formulation. An algorithm that explores a bounded set of parameters and determines the corresponding value of global indexes is presented. The concepts of a novel global performance index and a compound index are introduced. Simulation results are shown and discussed. The best PMs found in terms of each performance index evaluated are locally analyzed in terms of its workspace and local dexterity. The relationship between the performance of the PM and its parameters is discussed, and a prototype with the best performance in terms of the compound index is presented and analyzed.
 
                    