Digital Library

190 results for PARALLEL COMPUTING

Performance metrics in a hybrid MPI-OpenMP based molecular dynamics simulation with short-range interactions

Relevance: 20.00%

Abstract:

We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short range interatomic interactions. The algorithm is discussed in the context of nano-indentation of Chromium films with carbon indenters using the Embedded Atom Method potential for Cr-Cr interaction and the Morse potential for Cr-C interactions. We study the performance of our algorithm for a range of MPI-thread combinations and find the performance to depend strongly on the computational task and load sharing in the multi-core processor. The algorithm scaled poorly with MPI and our hybrid schemes were observed to outperform the pure message passing scheme, despite utilizing the same number of processors or cores in the cluster. Speed-up achieved by our algorithm compared favorably with that achieved by standard MD packages. (C) 2013 Elsevier Inc. All rights reserved.
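The abstract names the Morse potential for the Cr-C interaction but gives no parameters. As a rough illustration of a short-range pair interaction, the sketch below evaluates a Morse energy with a distance cutoff. The parameter names (`depth`, `width`, `r_eq`, `cutoff`) are placeholders chosen here, not values or code from the paper, and the plain double loop stands in for whatever neighbour handling the authors actually use.

```python
import math


def morse_energy(r, depth, width, r_eq):
    """Morse pair energy: depth * ((1 - exp(-width * (r - r_eq)))**2 - 1)."""
    x = 1.0 - math.exp(-width * (r - r_eq))
    return depth * (x * x - 1.0)


def total_pair_energy(positions, depth, width, r_eq, cutoff):
    """Sum Morse energies over all pairs closer than the cutoff (O(N^2) loop)."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            if r < cutoff:  # short-range: pairs beyond the cutoff contribute nothing
                total += morse_energy(r, depth, width, r_eq)
    return total
```

Because pairs beyond the cutoff are skipped, each atom interacts only with nearby atoms, which is what makes spatial decomposition across processes or threads possible in the first place.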

Simulation of inhomogeneous distributions of ultracold atoms in an optical lattice via a massively parallel implementation of nonequilibrium strong-coupling perturbation theory

Relevance: 20.00%

Abstract:

We present a nonequilibrium strong-coupling approach to inhomogeneous systems of ultracold atoms in optical lattices. We demonstrate its application to the Mott-insulating phase of a two-dimensional Fermi-Hubbard model in the presence of a trap potential. Since the theory is formulated self-consistently, the numerical implementation relies on a massively parallel evaluation of the self-energy and the Green's function at each lattice site, employing thousands of CPUs. While the computation of the self-energy is straightforward to parallelize, the evaluation of the Green's function requires the inversion of a large sparse 10^d × 10^d matrix, with d > 6. As a crucial ingredient, our solution heavily relies on the smallness of the hopping as compared to the interaction strength and yields a widely scalable realization of a rapidly converging iterative algorithm which evaluates all elements of the Green's function. Results are validated by comparing with the homogeneous case via the local-density approximation. These calculations also show that the local-density approximation is valid in nonequilibrium setups without mass transport.

Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting

Relevance: 20.00%

Abstract:

Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by staged analysis. In this paper we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph-rewriting has already been used for flow-insensitive analysis. However, formulating flow-sensitive pointer analysis as a graph-rewriting problem poses additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (up to 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single-threaded execution of our implementation performs better in 8 of the benchmarks.

Representing a Cubic Graph as the Intersection Graph of Axis-Parallel Boxes in Three Dimensions

Relevance: 20.00%

Abstract:

We show that every graph of maximum degree 3 can be represented as the intersection graph of axis-parallel boxes in three dimensions, that is, every vertex can be mapped to an axis-parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes touch just at their boundaries.
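To make the definition concrete, the sketch below builds the intersection graph of a set of closed axis-parallel boxes: two boxes intersect exactly when their intervals overlap on every axis. The function names and the three example boxes are illustrative assumptions, not the construction from the paper.

```python
from itertools import combinations


def boxes_intersect(a, b):
    """Closed boxes, each a list of (low, high) per axis, intersect iff
    their intervals overlap on every axis."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def intersection_graph(boxes):
    """Return the edge set of the intersection graph of a dict of named boxes."""
    return {
        (u, v)
        for u, v in combinations(sorted(boxes), 2)
        if boxes_intersect(boxes[u], boxes[v])
    }


# Hypothetical example: the path a - b - c, drawn with boxes that only touch
# at their boundaries (shared faces at x = 1 and x = 2).
boxes = {
    "a": [(0, 1), (0, 1), (0, 1)],
    "b": [(1, 2), (0, 1), (0, 1)],
    "c": [(2, 3), (0, 1), (0, 1)],
}
edges = intersection_graph(boxes)
```

Here `edges` contains the pairs a-b and b-c but not a-c, so the three boxes represent a path on three vertices.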

A constant factor approximation algorithm for boxicity of circular arc graphs

Relevance: 20.00%

Abstract:

The boxicity (resp. cubicity) of a graph G(V, E) is the minimum integer k such that G can be represented as the intersection graph of axis-parallel boxes (resp. cubes) in R^k. Equivalently, it is the minimum number of interval graphs (resp. unit interval graphs) on the vertex set V such that the intersection of their edge sets is E. The problem of computing boxicity (resp. cubicity) is known to be inapproximable, even for restricted graph classes like bipartite, co-bipartite and split graphs, within an O(n^(1-epsilon))-factor for any epsilon > 0 in polynomial time, unless NP = ZPP. For any well-known graph class of unbounded boxicity, there is no known polynomial-time algorithm that gives an n^(1-epsilon)-factor approximation of the boxicity, for any epsilon > 0. In this paper, we consider the problem of approximating the boxicity (cubicity) of circular arc graphs, that is, intersection graphs of arcs of a circle. Circular arc graphs are known to have unbounded boxicity, which could be as large as Omega(n). We give a (2 + 1/k)-factor (resp. (2 + ⌈log n⌉/k)-factor) polynomial-time approximation algorithm for computing the boxicity (resp. cubicity) of any circular arc graph, where k >= 1 is the value of the optimum solution. For normal circular arc (NCA) graphs, with an NCA model given, this can be improved to an additive two approximation algorithm. The time complexity of the algorithms to approximately compute the boxicity (resp. cubicity) is O(mn + n^2) in both these cases, and in O(mn + kn^2) = O(n^3) time we also get their corresponding box (resp. cube) representations, where n is the number of vertices of the graph and m is its number of edges. Our additive two approximation algorithm directly works for any proper circular arc graph, since their NCA models can be computed in polynomial time. (C) 2014 Elsevier B.V. All rights reserved.

Vertex Cover Gets Faster and Harder on Low Degree Graphs

Relevance: 20.00%

Abstract:

The problem of finding an optimal vertex cover in a graph is a classic NP-complete problem, and is a special case of the hitting set question. On the other hand, the hitting set problem, when asked in the context of induced geometric objects, often turns out to be exactly the vertex cover problem on restricted classes of graphs. In this work we explore a particular instance of such a phenomenon. We consider the problem of hitting all axis-parallel slabs induced by a point set P, and show that it is equivalent to the problem of finding a vertex cover on a graph whose edge set is the union of two Hamiltonian Paths. We show the latter problem to be NP-complete, and also give an algorithm to find a vertex cover of size at most k, on graphs of maximum degree four, whose running time is 1.2637^k n^(O(1)).
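For readers unfamiliar with parameterized vertex cover, the sketch below shows the textbook bounded search tree, which runs in roughly O(2^k · m) time. It is not the faster algorithm for maximum-degree-four graphs described in the abstract; the function name and the edge-list representation are assumptions made for illustration.

```python
def vertex_cover_at_most_k(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists.

    Textbook branching: every cover must contain an endpoint of each edge,
    so pick any remaining edge (u, v) and try u first, then v.
    """
    if not edges:
        return set()  # nothing left to cover
    if k == 0:
        return None  # edges remain but the budget is exhausted
    u, v = edges[0]
    for chosen in (u, v):
        # Drop every edge that the chosen vertex already covers.
        remaining = [e for e in edges if chosen not in e]
        cover = vertex_cover_at_most_k(remaining, k - 1)
        if cover is not None:
            return cover | {chosen}
    return None
```

The recursion depth is at most k and each level branches two ways, which is where the 2^k factor comes from; faster algorithms improve on this by branching on more refined structures.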

An open source massively parallel solver for Richards equation: Mechanistic modelling of water fluxes at the watershed scale

Relevance: 20.00%

Abstract:

In this paper we present a massively parallel open source solver for Richards equation, named the RichardsFOAM solver. This solver has been developed in the framework of the open source generalist computational fluid dynamics toolbox OpenFOAM® and is capable of dealing with large-scale problems in both space and time. The source code for RichardsFOAM may be downloaded from the CPC program library website. It exhibits good parallel performance (up to ~90% parallel efficiency with 1024 processors in both strong and weak scaling), and the conditions required for obtaining such performance are analysed and discussed. This performance enables the mechanistic modelling of water fluxes at the scale of experimental watersheds (up to a few square kilometres of surface area), and on time scales of decades to a century. Such a solver can be useful in various applications, such as environmental engineering for long-term transport of pollutants in soils, water engineering for assessing the impact of land settlement on water resources, or in the study of weathering processes on the watersheds. (C) 2014 Elsevier B.V. All rights reserved.
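The abstract reports parallel efficiency in both strong and weak scaling without defining the terms. A minimal sketch of the usual definitions follows; the function names are hypothetical, and the abstract does not say which reference run the authors used.

```python
def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Fixed total problem size: the ideal run time shrinks as p_ref / p.

    t_ref is the run time on p_ref processors, t_p the run time on p processors.
    """
    return (t_ref * p_ref) / (t_p * p)


def weak_scaling_efficiency(t_ref, t_p):
    """Fixed problem size per processor: the ideal run time stays constant."""
    return t_ref / t_p
```

For example, with made-up timings, a job that takes 100 s on 1 processor and 25 s on 4 processors has a strong-scaling efficiency of 1.0, while a weak-scaling run that grows from 10 s to 12.5 s as processors and problem size increase together has an efficiency of 0.8.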

Ligand 5,10,15,20-Tetra(N-methyl-4-pyridyl)porphine (TMPyP4) Prefers the Parallel Propeller-Type Human Telomeric G-Quadruplex DNA over Its Other Polymorphs

Relevance: 20.00%

Abstract:

The binding of ligand 5,10,15,20-tetra(N-methyl-4-pyridyl)porphine (TMPyP4) with telomeric and genomic G-quadruplex DNA has been extensively studied. However, a comparative study of interactions of TMPyP4 with different conformations of human telomeric G-quadruplex DNA, namely, parallel propeller-type (PP), antiparallel basket-type (AB), and mixed hybrid-type (MH) G-quadruplex DNA, has not been done. We considered all the possible binding sites in each of the G-quadruplex DNA structures and docked TMPyP4 to each one of them. The resultant most potent sites for binding were analyzed from the mean binding free energy of the complexes. Molecular dynamics simulations were then carried out, and analysis of the binding free energy of the TMPyP4-G-quadruplex complex showed that the binding of TMPyP4 with parallel propeller-type G-quadruplex DNA is preferred over the other two G-quadruplex DNA conformations. The results obtained from the change in solvent excluded surface area (SESA) and solvent accessible surface area (SASA) also support the more pronounced binding of the ligand with the parallel propeller-type G-quadruplex DNA.

Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs

Relevance: 20.00%

Abstract:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviate the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Multispectral Bayesian reconstruction technique for real-time two color fluorescence microscopy

Relevance: 20.00%

Abstract:

We have developed a real-time imaging method for two-color wide-field fluorescence microscopy using a combined approach that integrates multi-spectral imaging and a Bayesian image reconstruction technique. To enable simultaneous observation of two dyes (primary and secondary), we exploit their spectral properties that allow parallel recording in both channels. The key advantage of this technique is the use of a single wavelength of light to excite both the primary dye and the secondary dye. The primary and secondary dyes respectively give rise to fluorescence and bleed-through signals, which after normalization were merged to obtain two-color 3D images. To realize real-time imaging, we employed maximum likelihood (ML) and maximum a posteriori (MAP) techniques on a high-performance computing platform (GPU). The results show a two-fold improvement in contrast, while the signal-to-background ratio (SBR) is improved by a factor of 4. We report speedups by factors of 52 and 350 for 2D and 3D images, respectively. Using this system, we have studied real-time protein aggregation in yeast cells and HeLa cells that exhibit dot-like protein distribution. The proposed technique has the ability to temporally resolve rapidly occurring biological events.

Common-Mode Injection PWM for Parallel Converters

Relevance: 20.00%

Abstract:

The ac-side terminal voltages of parallel-connected converters are different if the line reactive drops of the individual converters are different. This could result either from differences in per-phase inductances or from differences in the line currents of the converters. In such cases, the modulating signals are different for the converters. Hence, the common-mode (CM) voltages for the converters, injected by conventional space vector pulsewidth modulation (CSVPWM) to increase dc-bus utilization, are different. Consequently, significant low-frequency zero-sequence circulating currents result. This paper proposes a new modulation method for parallel-connected converters with unequal terminal voltages. This method does not cause low-frequency zero-sequence circulating currents and is comparable with CSVPWM in terms of dc-bus utilization and device power loss. Experimental results are presented at a power level of 150 kVA from a circulating-power test setup, where the differences in converter terminal voltages are quite significant.

3-D GPU Based Real Time Diffuse Optical Tomographic System

Relevance: 20.00%

Abstract:

The 3-Dimensional Diffuse Optical Tomographic (3-D DOT) image reconstruction algorithm is computationally complex and requires excessive matrix computations, which hampers reconstruction in real time. In this paper, we present near real-time 3-D DOT image reconstruction based on the Broyden approach for updating the Jacobian matrix. The Broyden method simplifies the algorithm by avoiding re-computation of the Jacobian matrix in each iteration. We have developed CPU and heterogeneous CPU/GPU code for 3-D DOT image reconstruction on the C and MATLAB programming platforms. We have used the Compute Unified Device Architecture (CUDA) programming framework and the CUDA linear algebra library (CULA) to utilize the massively parallel computational power of GPUs (NVIDIA Tesla K20c). The computation time achieved by the C implementation on a CPU/GPU system, for a 3-plane measurement and an FEM mesh of 19172 tetrahedral elements, is 806 milliseconds per iteration.
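The abstract does not say which Broyden variant is used. As an illustration of why the approach avoids recomputing the Jacobian, the sketch below applies the standard rank-one Broyden update, J_new = J + ((df - J dx) dx^T) / (dx^T dx), in plain Python. It is not the authors' C/CUDA implementation, and the function name and list-of-lists matrix layout are assumptions.

```python
def broyden_update(jac, dx, df):
    """Rank-one Broyden update of a Jacobian estimate.

    jac: m x n matrix as a list of rows, dx: length-n step in the unknowns,
    df: length-m change in the forward-model output.
    The result satisfies the secant condition J_new @ dx == df.
    """
    norm_sq = sum(x * x for x in dx)
    if norm_sq == 0.0:
        raise ValueError("dx must be nonzero")
    # J dx, one entry per row of the Jacobian.
    j_dx = [sum(row[c] * dx[c] for c in range(len(dx))) for row in jac]
    # Mismatch between the observed change and the change J predicts.
    residual = [df_i - jdx_i for df_i, jdx_i in zip(df, j_dx)]
    return [
        [row[c] + residual[r] * dx[c] / norm_sq for c in range(len(dx))]
        for r, row in enumerate(jac)
    ]
```

Each update costs one matrix-vector product and one outer product, which is far cheaper than rebuilding the Jacobian from the forward model at every iteration.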

History of Computing in India: 1955-2010

Relevance: 20.00%

Abstract:

The history of computing in India is inextricably intertwined with two interacting forces: the political climate (determined by the political party in power) and the government policies mainly driven by the technocrats and bureaucrats who acted within the boundaries drawn by the political party in power. There were four break points (which occurred in 1970, 1978, 1991 and 1998) that changed the direction of the development of computers and their applications. This article explains why these breaks occurred and how they affected the history of computing in India.

A 0.5-2.0 GHz injection locked oscillator cascade for parallel wideband RF spectrum sensing

Relevance: 20.00%

Abstract:

An area-efficient, wideband RF frequency synthesizer, which simultaneously generates multiple local oscillator (LO) signals, is designed. It is suitable for parallel wideband RF spectrum sensing in cognitive radios. The frequency synthesizer consists of an injection locked oscillator cascade (ILOC) where all the LO signals are derived from a single reference oscillator. The ILOC is implemented in a 130-nm technology with an active area of . It generates 4 uniformly spaced LO carrier frequencies from 500 MHz to 2 GHz. This design is the first known implementation of a CMOS based ILOC for wide-band RF spectrum sensing applications.

Prediction of Queue Waiting Times for Metascheduling on Parallel Batch Systems

Relevance: 20.00%

Abstract:

Prediction of queue waiting times of jobs submitted to production parallel batch systems is important to provide overall estimates to users and can also help meta-schedulers make scheduling decisions. In this work, we have developed a framework for predicting ranges of queue waiting times for jobs by employing multi-class classification of similar jobs in history. Our hierarchical prediction strategy first predicts the point wait time of a job using a dynamic k-Nearest Neighbor (kNN) method. It then performs a multi-class classification using Support Vector Machines (SVMs) among all the classes of the jobs. The probabilities given by the SVM for the class predicted using kNN and its neighboring classes are used to provide a set of ranges of predicted wait times with probabilities. We have used these predictions and probabilities in a meta-scheduling strategy that distributes jobs to different queues/sites in a multi-queue/grid environment to minimize the wait times of the jobs. Experiments with different production supercomputer job traces show that our prediction strategies can give correct predictions for about 77-87% of the jobs, and also result in about 12% improved accuracy when compared to the next best existing method. Experiments with our meta-scheduling strategy using different production and synthetic job traces for various system sizes, partitioning schemes and workloads show that the meta-scheduling strategy gives much improved performance when compared to existing scheduling policies, reducing the overall average queue waiting times of the jobs by about 47%.
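The first stage of the strategy, a point prediction from similar past jobs, can be illustrated with a plain fixed-k nearest-neighbour average. The abstract does not define what makes its kNN "dynamic", nor which job features it uses, so the feature tuples, the Euclidean distance and the function name below are all assumptions.

```python
import math


def knn_wait_time(history, job, k=3):
    """Predict a point wait time as the mean wait of the k nearest past jobs.

    history: list of (features, wait_seconds) pairs for completed jobs.
    job: feature tuple of the new job, same length as the history features.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Rank past jobs by Euclidean distance to the new job's features.
    nearest = sorted(history, key=lambda item: math.dist(item[0], job))[:k]
    return sum(wait for _, wait in nearest) / len(nearest)
```

In practice the features would need scaling so that, say, requested cores and requested walltime contribute comparably to the distance; that step is omitted here.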
