163 results for structured parallel computations


Relevance:

100.00%

Abstract:

In this paper, we introduce an analytical technique based on queueing networks and Petri nets for making a performance analysis of dataflow computations when executed on the Manchester machine. This technique is also applicable for the analysis of parallel computations on multiprocessors. We characterize the parallelism in dataflow computations through a four-parameter characterization, namely, the minimum parallelism, the maximum parallelism, the average parallelism and the variance in parallelism. We observe through detailed investigation of our analytical models that the average parallelism is a good characterization of the dataflow computations only as long as the variance in parallelism is small. However, significant difference in performance measures will result when the variance in parallelism is comparable to or higher than the average parallelism.
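The four-parameter characterization can be illustrated with a small sketch. It assumes a hypothetical "parallelism profile" (a list giving the number of operations ready to execute at each time step), which is not a data structure named in the abstract, and it does not reproduce the paper's queueing or Petri net models.

```python
from statistics import fmean, pvariance


def characterize_parallelism(profile):
    """Return (minimum, maximum, average, variance) of a parallelism profile.

    `profile` is a hypothetical input: profile[i] is the number of
    operations that could run concurrently at time step i.
    """
    if not profile:
        raise ValueError("profile must be non-empty")
    return min(profile), max(profile), fmean(profile), pvariance(profile)


# Two illustrative profiles with the same average parallelism.
steady = [4, 4, 4, 4]
bursty = [1, 1, 1, 13]

steady_stats = characterize_parallelism(steady)
bursty_stats = characterize_parallelism(bursty)
```

Both profiles average 4 operations per step, but the second has a variance well above its average, which is the regime in which the abstract says the average alone stops being a good predictor.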

Relevance:

40.00%

Abstract:

Parallel execution of computational mechanics codes requires efficient mesh-partitioning techniques. These techniques divide the mesh into a specified number of submeshes of approximately the same size while minimising the number of interface nodes between the submeshes. This paper describes a new mesh-partitioning technique employing Genetic Algorithms. The proposed algorithm operates on the deduced graph (dual or nodal graph) of the given finite element mesh rather than directly on the mesh itself. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialise an optimisation of the graph partition problem. In practice, a hierarchy of (usually more than two) graphs is used to obtain the final graph partition. The proposed partitioning algorithm is applied to graphs derived from unstructured finite element meshes describing practical engineering problems and also to several example graphs related to finite element meshes given in the literature. The test results indicate that the proposed GA-based graph partitioning algorithm generates high-quality partitions and is superior to spectral and multilevel graph partitioning algorithms.
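The two goals named above (balanced submeshes, few interface nodes) can be scored on the deduced graph. The sketch below is an assumption, not the paper's fitness function: it counts cut edges as a stand-in for interface size and adds a weighted imbalance penalty. The names `edge_cut`, `imbalance`, `fitness` and the `penalty` weight are hypothetical.

```python
def edge_cut(edges, part):
    """Number of graph edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, k):
    """Difference between the largest and smallest part sizes."""
    sizes = [0] * k
    for p in part:
        sizes[p] += 1
    return max(sizes) - min(sizes)


def fitness(edges, part, k, penalty=1.0):
    """Lower is better: cut edges plus a weighted imbalance penalty.

    The additive form and the penalty weight are illustrative assumptions.
    """
    return edge_cut(edges, part) + penalty * imbalance(part, k)


# A path graph 0-1-2-3 split into two parts in two different ways.
path_edges = [(0, 1), (1, 2), (2, 3)]
contiguous = [0, 0, 1, 1]   # cuts one edge
alternating = [0, 1, 0, 1]  # cuts every edge

good = fitness(path_edges, contiguous, k=2)
bad = fitness(path_edges, alternating, k=2)
```

A genetic algorithm would treat the `part` list as a chromosome and evolve a population toward lower scores; the interpolated coarse-graph partition described above would serve as the initial population seed.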

Relevance:

30.00%

Abstract:

This paper may be considered a sequel to one of our earlier works on the development of an upwind algorithm for meshless solvers. While the earlier work dealt with the development of an inviscid solution procedure, the present work focuses on its extension to viscous flows. A robust viscous discretization strategy is chosen based on the positivity of a discrete Laplacian. This work presents the meshless solver as a viable Cartesian grid methodology. The point distribution required for the meshless solver is obtained from a hybrid Cartesian gridding strategy. Given the importance of a hybrid Cartesian mesh for RANS computations, the difficulties encountered in a conventional least-squares-based discretization strategy are highlighted. In this context, the importance of discretization strategies that exploit the local structure in the grid is presented, along with a suitable point sorting strategy. Of particular interest are the proposed discretization strategies (both inviscid and viscous) within the structured grid block: a rotated update for the inviscid part and a positive update based on a Green-Gauss procedure for the viscous part. Both procedures avoid the ill-conditioning associated with a conventional least-squares procedure in the critical region of the structured grid block. The robustness and accuracy of such a strategy are demonstrated on a number of standard test cases, including a multi-element airfoil. The computational efficiency of the proposed meshless solver is also demonstrated. (C) 2010 Elsevier Ltd. All rights reserved.

Relevance:

30.00%

Abstract:

In this paper we develop a multithreaded VLSI processor linear array architecture to render complex environments based on the radiosity approach. The processing elements are identical and multithreaded. They work in Single Program Multiple Data (SPMD) mode. A new algorithm to do the radiosity computations based on the progressive refinement approach[2] is proposed. Simulation results indicate that the architecture is latency tolerant and scalable. It is shown that a linear array of 128 uni-threaded processing elements sustains a throughput close to 0.4 million patches/sec.

Relevance:

30.00%

Abstract:

The Morse-Smale complex is a topological structure that captures the behavior of the gradient of a scalar function on a manifold. This paper discusses scalable techniques to compute the Morse-Smale complex of scalar functions defined on large three-dimensional structured grids. Computing the Morse-Smale complex of three-dimensional domains is more challenging than for two-dimensional domains because of the non-trivial structure introduced by the two types of saddle criticalities. We present a parallel shared-memory algorithm to compute the Morse-Smale complex based on Forman's discrete Morse theory. The algorithm achieves scalability via synergistic use of the CPU and the GPU. We first prove that the discrete gradient on the domain can be computed independently for each cell and hence can be implemented on the GPU. Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. Simultaneously, the extremum-saddle connections are computed using a tree traversal algorithm on the GPU.

Relevance:

30.00%

Abstract:

Critical applications like cyclone tracking and earthquake modeling require simultaneous high-performance simulations and online visualization for timely analysis. Faster simulations and simultaneous visualization enable scientists to provide real-time guidance to decision makers. In this work, we have developed an integrated user-driven and automated steering framework that simultaneously performs numerical simulations and efficient online remote visualization of critical weather applications in resource-constrained environments. It considers application dynamics, such as the criticality of the application, and resource dynamics, such as the storage space, network bandwidth and available number of processors, to adapt application and resource parameters such as simulation resolution, simulation rate and the frequency of visualization. We formulate the problem of finding an optimal set of simulation parameters as a linear programming problem. This leads to a 30% higher simulation rate and 25-50% lower storage consumption than a naive greedy approach. The framework also gives the user control over application parameters such as the region of interest and simulation resolution. We have also devised an adaptive algorithm to reduce the lag between the simulation and visualization times. Using experiments with different network bandwidths, we find that our adaptive algorithm is able to reduce lag as well as visualize the most representative frames.

Relevance:

30.00%

Abstract:

Adaptive Mesh Refinement (AMR) is a method that dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. In-situ visualization plays an important role in analyzing the time-evolving characteristics of the domain structures. Continuous visualization of the output data for various timesteps results in a better study of the underlying domain and of the model used for simulating it. In this paper, we develop strategies for continuous online visualization of time-evolving data for AMR applications executed on GPUs. We reorder the meshes for computations on the GPU based on the user's input about the subdomain that they want to visualize. This makes the data available for visualization at a faster rate. We then perform asynchronous executions of the visualization steps and fix-up operations on the CPUs while the GPU advances the solution. By performing experiments on Tesla S1070 and Fermi C2070 clusters, we found that our strategies result in a 60% improvement in response time and a 16% improvement in the rate of visualization of frames over the existing strategy of performing fix-ups and visualization at the end of the timesteps.

Relevance:

30.00%

Abstract:

The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output-sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds, with a speedup factor of up to 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.

Relevance:

30.00%

Abstract:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviate the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Relevance:

30.00%

Abstract:

Block-structured adaptive mesh refinement (AMR) has been used to obtain numerical solutions for many scientific applications. Some block-structured AMR approaches have focused on forming patches of non-uniform sizes, where the size of a patch can be tuned to the geometry of a region of interest. In this paper, we develop strategies for adaptive execution of block-structured AMR applications on GPUs, for hyperbolic directionally split solvers. While effective hybrid execution strategies exist for applications with uniform patches, our work considers efficient execution of non-uniform patches with different workloads. Our techniques include bin-packing work units to load balance GPU computations, adaptive asynchronism between CPU and GPU executions using a knapsack formulation, and scheduling communications for multi-GPU executions. Our experiments with synthetic and real data, for single-GPU and multi-GPU executions, on Tesla S1070 and Fermi C2070 clusters, show that our strategies result in a speedup of up to 3.23 over existing strategies.
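As a rough illustration of packing non-uniform work units into balanced bins, the sketch below uses the classic longest-processing-time-first greedy rule. The abstract does not say which bin-packing heuristic the authors use, so this choice, the function name `balance_work_units`, and the cost values are all assumptions.

```python
import heapq


def balance_work_units(costs, num_bins):
    """Greedily assign work units to bins, heaviest unit first.

    `costs[i]` is a hypothetical cost estimate for patch i (for example,
    its cell count). Each unit goes to the currently lightest bin.
    Returns (bins, loads), where bins[b] lists the unit indices in bin b.
    """
    heap = [(0.0, b) for b in range(num_bins)]  # (current load, bin id)
    heapq.heapify(heap)
    bins = [[] for _ in range(num_bins)]
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for idx in order:
        load, b = heapq.heappop(heap)           # lightest bin so far
        bins[b].append(idx)
        heapq.heappush(heap, (load + costs[idx], b))
    loads = [sum(costs[i] for i in unit) for unit in bins]
    return bins, loads


# Five patches of different sizes spread over two bins.
patch_costs = [8, 7, 6, 5, 4]
bins, loads = balance_work_units(patch_costs, num_bins=2)
```

The greedy rule is fast but not optimal: here it produces loads of 17 and 13, while a 15/15 split exists (8 + 7 versus 6 + 5 + 4).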

Relevance:

20.00%

Abstract:

We have designed a four-helix protein that is expected to tetramerize in the membrane to form an ion channel with a structurally well-defined pore. A synthetic peptide corresponding to the channel-lining helix facilitates ion transport across liposomal membranes and is largely helical in membranes. Detailed circular dichroism studies of the peptide in methanol, water and methanol-water mixtures reveal that it is helical in methanol, beta-structured in 97.5% water and a combination of these two structures at intermediate compositions of methanol and water. A fluorescence resonance energy transfer study of the peptide shows that the peptide is monomeric in methanol but undergoes extensive anti-parallel aggregation in aqueous solution.

Relevance:

20.00%

Abstract:

The unsteady incompressible viscous fluid flow between two parallel infinite disks separated by a distance $h(t^*)$ at time $t^*$ has been studied. The upper disk moves towards the lower disk with velocity $h'(t^*)$. The lower disk is porous and rotates with angular velocity $\Omega(t^*)$. A magnetic field $B(t^*)$ is applied perpendicular to the two disks. It has been found that the governing Navier-Stokes equations reduce to a set of ordinary differential equations if $h(t^*)$, $\Omega(t^*)$ and $B(t^*)$ vary with time $t^*$ in a particular manner, i.e. $h(t^*) = H(1 - \alpha t^*)^{1/2}$, $\Omega(t^*) = \Omega_0 (1 - \alpha t^*)^{-1}$, $B(t^*) = B_0 (1 - \alpha t^*)^{-1/2}$. These ordinary differential equations have been solved numerically using a shooting method. For small Reynolds numbers, analytical solutions have been obtained using a regular perturbation technique. The effects of the squeeze Reynolds number, the Hartmann number and the rotation of the disk on the flow pattern, the normal force or load, and the torque have been studied in detail.

Relevance:

20.00%

Abstract:

The elastodynamic response of a pair of parallel rigid strips embedded in an infinite orthotropic medium due to elastic waves incident normally on the strips has been investigated. The mixed boundary value problem has been solved by the Integral Equation method. The normal stress and the vertical displacement have been derived in closed form. Numerical values of stress intensity factors at inner and outer edges of the strips and vertical displacement at points in the plane of the strips for several orthotropic materials have been calculated and plotted graphically to show the effect of material orthotropy.

Relevance:

20.00%

Abstract:

A computational study for the convergence acceleration of Euler and Navier-Stokes computations with upwind schemes has been conducted in a unified framework. It involves the flux-vector splitting algorithms due to Steger-Warming and Van Leer, the flux-difference splitting algorithms due to Roe and Osher and the hybrid algorithms, AUSM (Advection Upstream Splitting Method) and HUS (Hybrid Upwind Splitting). Implicit time integration with line Gauss-Seidel relaxation and multigrid are among the procedures which have been systematically investigated on an individual as well as cumulative basis. The upwind schemes have been tested in various implicit-explicit operator combinations such that the optimal among them can be determined based on extensive computations for two-dimensional flows in subsonic, transonic, supersonic and hypersonic flow regimes. In this study, the performance of these implicit time-integration procedures has been systematically compared with those corresponding to a multigrid accelerated explicit Runge-Kutta method. It has been demonstrated that a multigrid method employed in conjunction with an implicit time-integration scheme yields distinctly superior convergence as compared to those associated with either of the acceleration procedures provided that effective smoothers, which have been identified in this investigation, are prescribed in the implicit operator.

Relevance:

20.00%

Abstract:

We consider the problem of deciding whether the output of a boolean circuit is determined by a partial assignment to its inputs. This problem is easily shown to be hard, i.e., co-NP-complete. However, many of the consequences of a partial input assignment may be determined in linear time, by iterating the following step: if we know the values of some inputs to a gate, we can deduce the values of some outputs of that gate. This process of iteratively deducing some of the consequences of a partial assignment is called propagation. This paper explores the parallel complexity of propagation, i.e., the complexity of determining whether the output of a given boolean circuit is determined by propagating a given partial input assignment. We give a complete classification of the problem into those cases that are P-complete and those that are unlikely to be P-complete.
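The propagation step can be sketched for a small gate set. The circuit encoding below (a dictionary mapping each gate name to an operator and its input names) and the restriction to AND, OR and NOT gates are assumptions made for illustration; `None` stands for an undetermined value.

```python
def eval_gate(op, values):
    """Deduce a gate's output from partially known inputs (None = unknown)."""
    if op == "NOT":
        return None if values[0] is None else not values[0]
    if op == "AND":
        if any(v is False for v in values):
            return False          # one False input forces the output
        return True if all(v is True for v in values) else None
    if op == "OR":
        if any(v is True for v in values):
            return True           # one True input forces the output
        return False if all(v is False for v in values) else None
    raise ValueError(f"unsupported gate: {op}")


def propagate(gates, assignment):
    """Repeat the deduction step until no new gate value can be derived."""
    values = dict(assignment)
    changed = True
    while changed:
        changed = False
        for name, (op, inputs) in gates.items():
            if values.get(name) is not None:
                continue
            result = eval_gate(op, [values.get(i) for i in inputs])
            if result is not None:
                values[name] = result
                changed = True
    return values


# out = (a AND b) OR c
circuit = {"g1": ("AND", ["a", "b"]), "out": ("OR", ["g1", "c"])}
forced = propagate(circuit, {"c": True})
partial = propagate(circuit, {"a": False})

# out = a OR (NOT a) is always True, yet propagation alone cannot see it.
tautology = {"na": ("NOT", ["a"]), "out": ("OR", ["a", "na"])}
missed = propagate(tautology, {})
```

This repeated sweep is quadratic in the worst case; a worklist that revisits only the gates fed by a newly determined wire would give the linear-time behaviour mentioned above. The last example shows why propagation is weaker than the full decision problem: the output is determined, but no single gate-level deduction reveals it.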