855 resultados para Parallel Computation
Resumo:
Nowadays computing platforms consist of a very large number of components that require to be supplied with diferent voltage levels and power requirements. Even a very small platform, like a handheld computer, may contain more than twenty diferent loads and voltage regulators. The power delivery designers of these systems are required to provide, in a very short time, the right power architecture that optimizes the performance, meets electrical specifications plus cost and size targets. The appropriate selection of the architecture and converters directly defines the performance of a given solution. Therefore, the designer needs to be able to evaluate a significant number of options in order to know with good certainty whether the selected solutions meet the size, energy eficiency and cost targets. The design dificulties of selecting the right solution arise due to the wide range of power conversion products provided by diferent manufacturers. These products range from discrete components (to build converters) to complete power conversion modules that employ diferent manufacturing technologies. Consequently, in most cases it is not possible to analyze all the alternatives (combinations of power architectures and converters) that can be built. The designer has to select a limited number of converters in order to simplify the analysis. In this thesis, in order to overcome the mentioned dificulties, a new design methodology for power supply systems is proposed. This methodology integrates evolutionary computation techniques in order to make possible analyzing a large number of possibilities. This exhaustive analysis helps the designer to quickly define a set of feasible solutions and select the best trade-off in performance according to each application. The proposed approach consists of two key steps, one for the automatic generation of architectures and other for the optimized selection of components. In this thesis are detailed the implementation of these two steps. The usefulness of the methodology is corroborated by contrasting the results using real problems and experiments designed to test the limits of the algorithms.
Resumo:
We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the máximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.
Resumo:
Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.
Resumo:
Dendritic computation is a term that has been in neuro physiological research for a long time [1]. It is still controversial and far for been clarified within the concepts of both computation and neurophysiology [2], [3]. In any case, it hasnot been integrated neither in a formal computational scheme or structure, nor into formulations of artificial neural nets. Our objective here is to formulate a type of distributed computation that resembles dendritic trees, in such a way that it shows the advantages of neural network distributed computation, mostly the reliability that is shown under the existence of holes (scotomas) in the computing net, without ?blind spots?.
Resumo:
We have developed a new projector model specifically tailored for fast list-mode tomographic reconstructions in Positron emission tomography (PET) scanners with parallel planar detectors. The model provides an accurate estimation of the probability distribution of coincidence events defined by pairs of scintillating crystals. This distribution is parameterized with 2D elliptical Gaussian functions defined in planes perpendicular to the main axis of the tube of response (TOR). The parameters of these Gaussian functions have been obtained by fitting Monte Carlo simulations that include positron range, acolinearity of gamma rays, as well as detector attenuation and scatter effects. The proposed model has been applied efficiently to list-mode reconstruction algorithms. Evaluation with Monte Carlo simulations over a rotating high resolution PET scanner indicates that this model allows to obtain better recovery to noise ratio in OSEM (ordered-subsets, expectation-maximization) reconstruction, if compared to list-mode reconstruction with symmetric circular Gaussian TOR model, and histogram-based OSEM with precalculated system matrix using Monte Carlo simulated models and symmetries.
Resumo:
The manipulation and handling of an ever increasing volume of data by current data-intensive applications require novel techniques for e?cient data management. Despite recent advances in every aspect of data management (storage, access, querying, analysis, mining), future applications are expected to scale to even higher degrees, not only in terms of volumes of data handled but also in terms of users and resources, often making use of multiple, pre-existing autonomous, distributed or heterogeneous resources.
Resumo:
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 2003 in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.
Resumo:
This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Different results of some parallel GPU implementations in terms of the speed-up and the scalability compared to the CPU sequential codes are shown. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many dificulties for a data-level parallelization. Because of the fact that the neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been pro- posed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU-GPU, using GPU algorithms for those stages that allow a straight forward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, it is achieved an overall speed-up of x41. The second strategy seeks to maximize the use of the GPU in the neighbor's location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Al- though this strategy has succeeded in improving the speed-up on the stage of neighboring location, the global speed-up on the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture-units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods
Resumo:
This article describes a new visual servo control and strategies that are used to carry out dynamic tasks by the Robotenis platform. This platform is basically a parallel robot that is equipped with an acquisition and processing system of visual information, its main feature is that it has a completely open architecture control, and planned in order to design, implement, test and compare control strategies and algorithms (visual and actuated joint controllers). Following sections describe a new visual control strategy specially designed to track and intercept objects in 3D space. The results are compared with a controller shown in previous woks, where the end effector of the robot keeps a constant distance from the tracked object. In this work, the controller is specially designed in order to allow changes in the tracking reference. Changes in the tracking reference can be used to grip an object that is under movement, or as in this case, hitting a hanging Ping-Pong ball. Lyapunov stability is taken into account in the controller design.
Resumo:
The main purpose of robot calibration is the correction of the possible errors in the robot parameters. This paper presents a method for a kinematic calibration of a parallel robot that is equipped with one camera in hand. In order to preserve the mechanical configuration of the robot, the camera is utilized to acquire incremental positions of the end effector from a spherical object that is fixed in the word reference frame. The positions of the end effector are related to incremental positions of resolvers of the motors of the robot, and a kinematic model of the robot is used to find a new group of parameters which minimizes errors in the kinematic equations. Additionally, properties of the spherical object and intrinsic camera parameters are utilized to model the projection of the object in the image and improving spatial measurements. Finally, the robotic system is designed to carry out tracking tasks and the calibration of the robot is validated by means of integrating the errors of the visual controller.
Resumo:
Rms voltage regulation may be an attractive possibility for controlling power inverters. Combined with a Hall Effect sensor for current control, it keeps its parallel operation capability while increasing its noise immunity, which may lead to a reduction of the Total Harmonic Distortion (THD). Besides, as voltage regulation is designed in DC, a simple PI regulator can provide accurate voltage tracking. Nevertheless, this approach does not lack drawbacks. Its narrow voltage bandwidth makes transients last longer and it increases the voltage THD when feeding non-linear loads, such as rectifying stages. On the other hand, the implementation can fall into offset voltage error. Furthermore, the information of the output voltage phase is hidden for the control as well, making the synchronization of a 3-phase setup not trivial. This paper explains the concept, design and implementation of the whole control scheme, in an on board inverter able to run in parallel and within a 3-phase setup. Special attention is paid to solve the problems foreseen at implementation level: a third analog loop accounts for the offset level is added and a digital algorithm guarantees 3-phase voltage synchronization.
Resumo:
In this paper we will see how the efficiency of the MBS simulations can be improved in two different ways, by considering both an explicit and implicit semi-recursive formulation. The explicit method is based on a double velocity transformation that involves the solution of a redundant but compatible system of equations. The high computational cost of this operation has been drastically reduced by taking into account the sparsity pattern of the system. Regarding this, the goal of this method is the introduction of MA48, a high performance mathematical library provided by Harwell Subroutine Library. The second method proposed in this paper has the particularity that, depending on the case, between 70 and 85% of the computation time is devoted to the evaluation of forces derivatives with respect to the relative position and velocity vectors. Keeping in mind that evaluating these derivatives can be decomposed into concurrent tasks, the main goal of this paper lies on a successful and straightforward parallel implementation that have led to a substantial improvement with a speedup of 3.2 by keeping all the cores busy in a quad-core processor and distributing the workload between them, achieving on this way a huge time reduction by doing an ideal CPU usage
Resumo:
This paper presents a theoretical analysis and an optimization method for envelope amplifier. Highly efficient envelope amplifiers based on a switching converter in parallel or series with a linear regulator have been analyzed and optimized. The results of the optimization process have been shown and these two architectures are compared regarding their complexity and efficiency. The optimization method that is proposed is based on the previous knowledge about the transmitted signal type (OFDM, WCDMA...) and it can be applied to any signal type as long as the envelope probability distribution is known. Finally, it is shown that the analyzed architectures have an inherent efficiency limit.
Resumo:
Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also,a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with medium to large parallel execution overheads.
Resumo:
Compilation techniques such as those portrayed by the Warren Abstract Machine(WAM) have greatly improved the speed of execution of logic programs. The research presented herein is geared towards providing additional performance to logic programs through the use of parallelism, while preserving the conventional semantics of logic languages. Two áreas to which special attention is given are the preservation of sequential performance and storage efficiency, and the use of low overhead mechanisms for controlling parallel execution. Accordingly, the techniques used for supporting parallelism are efficient extensions of those which have brought high inferencing speeds to sequential implementations. At a lower level, special attention is also given to design and simulation detail and to the architectural implications of the execution model behavior. This paper offers an overview of the basic concepts and techniques used in the parallel design, simulation tools used, and some of the results obtained to date.