774 resultados para GPGPU Parallel Computing


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Application of optimization algorithm to PDE modeling groundwater remediation can greatly reduce remediation cost. However, groundwater remediation analysis requires a computational expensive simulation, therefore, effective parallel optimization could potentially greatly reduce computational expense. The optimization algorithm used in this research is Parallel Stochastic radial basis function. This is designed for global optimization of computationally expensive functions with multiple local optima and it does not require derivatives. In each iteration of the algorithm, an RBF is updated based on all the evaluated points in order to approximate expensive function. Then the new RBF surface is used to generate the next set of points, which will be distributed to multiple processors for evaluation. The criteria of selection of next function evaluation points are estimated function value and distance from all the points known. Algorithms created for serial computing are not necessarily efficient in parallel so Parallel Stochastic RBF is different algorithm from its serial ancestor. The application for two Groundwater Superfund Remediation sites, Umatilla Chemical Depot, and Former Blaine Naval Ammunition Depot. In the study, the formulation adopted treats pumping rates as decision variables in order to remove plume of contaminated groundwater. Groundwater flow and contamination transport is simulated with MODFLOW-MT3DMS. For both problems, computation takes a large amount of CPU time, especially for Blaine problem, which requires nearly fifty minutes for a simulation for a single set of decision variables. Thus, efficient algorithm and powerful computing resource are essential in both cases. The results are discussed in terms of parallel computing metrics i.e. speedup and efficiency. We find that with use of up to 24 parallel processors, the results of the parallel Stochastic RBF algorithm are excellent with speed up efficiencies close to or exceeding 100%.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The simulated annealing optimization technique has been successfully applied to a number of electrical engineering problems, including transmission system expansion planning. The method is general in the sense that it does not assume any particular property of the problem being solved, such as linearity or convexity. Moreover, it has the ability to provide solutions arbitrarily close to an optimum (i.e. it is asymptotically convergent) as the cooling process slows down. The drawback of the approach is the computational burden: finding optimal solutions may be extremely expensive in some cases. This paper presents a Parallel Simulated Annealing, PSA, algorithm for solving the long term transmission network expansion planning problem. A strategy that does not affect the basic convergence properties of the Sequential Simulated Annealing algorithm have been implementeded and tested. The paper investigates the conditions under which the parallel algorithm is most efficient. The parallel implementations have been tested on three example networks: a small 6-bus network, and two complex real-life networks. Excellent results are reported in the test section of the paper: in addition to reductions in computing times, the Parallel Simulated Annealing algorithm proposed in the paper has shown significant improvements in solution quality for the largest of the test networks.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The increasing amount of sequences stored in genomic databases has become unfeasible to the sequential analysis. Then, the parallel computing brought its power to the Bioinformatics through parallel algorithms to align and analyze the sequences, providing improvements mainly in the running time of these algorithms. In many situations, the parallel strategy contributes to reducing the computational complexity of the big problems. This work shows some results obtained by an implementation of a parallel score estimating technique for the score matrix calculation stage, which is the first stage of a progressive multiple sequence alignment. The performance and quality of the parallel score estimating are compared with the results of a dynamic programming approach also implemented in parallel. This comparison shows a significant reduction of running time. Moreover, the quality of the final alignment, using the new strategy, is analyzed and compared with the quality of the approach with dynamic programming.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Sao Paulo State Research Foundation-FAPESP

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper presents a new parallel methodology for calculating the determinant of matrices of the order n, with computational complexity O(n), using the Gauss-Jordan Elimination Method and Chio's Rule as references. We intend to present our step-by-step methodology using clear mathematical language, where we will demonstrate how to calculate the determinant of a matrix of the order n in an analytical format. We will also present a computational model with one sequential algorithm and one parallel algorithm using a pseudo-code.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Complex networks analysis is a very popular topic in computer science. Unfortunately this networks, extracted from different contexts, are usually very large and the analysis may be very complicated: computation of metrics on these structures could be very complex. Among all metrics we analyse the extraction of subnetworks called communities: they are groups of nodes that probably play the same role within the whole structure. Communities extraction is an interesting operation in many different fields (biology, economics,...). In this work we present a parallel community detection algorithm that can operate on networks with huge number of nodes and edges. After an introduction to graph theory and high performance computing, we will explain our design strategies and our implementation. Then, we will show some performance evaluation made on a distributed memory architectures i.e. the supercomputer IBM-BlueGene/Q "Fermi" at the CINECA supercomputing center, Italy, and we will comment our results.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We describe Janus, a massively parallel FPGA-based computer optimized for the simulation of spin glasses, theoretical models for the behavior of glassy materials. FPGAs (as compared to GPUs or many-core processors) provide a complementary approach to massively parallel computing. In particular, our model problem is formulated in terms of binary variables, and floating-point operations can be (almost) completely avoided. The FPGA architecture allows us to run many independent threads with almost no latencies in memory access, thus updating up to 1024 spins per cycle. We describe Janus in detail and we summarize the physics results obtained in four years of operation of this machine; we discuss two types of physics applications: long simulations on very large systems (which try to mimic and provide understanding about the experimental non equilibrium dynamics), and low-temperature equilibrium simulations using an artificial parallel tempering dynamics. The time scale of our non-equilibrium simulations spans eleven orders of magnitude (from picoseconds to a tenth of a second). On the other hand, our equilibrium simulations are unprecedented both because of the low temperatures reached and for the large systems that we have brought to equilibrium. A finite-time scaling ansatz emerges from the detailed comparison of the two sets of simulations. Janus has made it possible to perform spin glass simulations that would take several decades on more conventional architectures. The paper ends with an assessment of the potential of possible future versions of the Janus architecture, based on state-of-the-art technology.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A parallel algorithm for image noise removal is proposed. The algorithm is based on peer group concept and uses a fuzzy metric. An optimization study on the use of the CUDA platform to remove impulsive noise using this algorithm is presented. Moreover, an implementation of the algorithm on multi-core platforms using OpenMP is presented. Performance is evaluated in terms of execution time and a comparison of the implementation parallelised in multi-core, GPUs and the combination of both is conducted. A performance analysis with large images is conducted in order to identify the amount of pixels to allocate in the CPU and GPU. The observed time shows that both devices must have work to do, leaving the most to the GPU. Results show that parallel implementations of denoising filters on GPUs and multi-cores are very advisable, and they open the door to use such algorithms for real-time processing.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Lattice Solid Model has been used successfully as a virtual laboratory to simulate fracturing of rocks, the dynamics of faults, earthquakes and gouge processes. However, results from those simulations show that in order to make the next step towards more realistic experiments it will be necessary to use models containing a significantly larger number of particles than current models. Thus, those simulations will require a greatly increased amount of computational resources. Whereas the computing power provided by single processors can be expected to increase according to Moore's law, i.e., to double every 18-24 months, parallel computers can provide significantly larger computing power today. In order to make this computing power available for the simulation of the microphysics of earthquakes, a parallel version of the Lattice Solid Model has been implemented. Benchmarks using large models with several millions of particles have shown that the parallel implementation of the Lattice Solid Model can achieve a high parallel-efficiency of about 80% for large numbers of processors on different computer architectures.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins such as transcriptional factors and signaling proteins. In recent years stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. One of the challenge problems in this field is the reduction of the huge computing time in stochastic simulations. Based on the system of the mitogen-activated protein kinase cascade that is activated by epidermal growth factor, this work give a parallel implementation by using OpenMP and parallelism across the simulation. Special attention is paid to the independence of the generated random numbers in parallel computing, that is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A scenario-based two-stage stochastic programming model for gas production network planning under uncertainty is usually a large-scale nonconvex mixed-integer nonlinear programme (MINLP), which can be efficiently solved to global optimality with nonconvex generalized Benders decomposition (NGBD). This paper is concerned with the parallelization of NGBD to exploit multiple available computing resources. Three parallelization strategies are proposed, namely, naive scenario parallelization, adaptive scenario parallelization, and adaptive scenario and bounding parallelization. Case study of two industrial natural gas production network planning problems shows that, while the NGBD without parallelization is already faster than a state-of-the-art global optimization solver by an order of magnitude, the parallelization can improve the efficiency by several times on computers with multicore processors. The adaptive scenario and bounding parallelization achieves the best overall performance among the three proposed parallelization strategies.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

As the complexity of parallel applications increase, the performance limitations resulting from computational load imbalance become dominant. Mapping the problem space to the processors in a parallel machine in a manner that balances the workload of each processors will typically reduce the run-time. In many cases the computation time required for a given calculation cannot be predetermined even at run-time and so static partition of the problem returns poor performance. For problems in which the computational load across the discretisation is dynamic and inhomogeneous, for example multi-physics problems involving fluid and solid mechanics with phase changes, the workload for a static subdomain will change over the course of a computation and cannot be estimated beforehand. For such applications the mapping of loads to process is required to change dynamically, at run-time in order to maintain reasonable efficiency. The issue of dynamic load balancing are examined in the context of PHYSICA, a three dimensional unstructured mesh multi-physics continuum mechanics computational modelling code.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components.