831 resultados para Parallel numerical algotithms
Resumo:
Femtosecond laser microfabrication has emerged over the last decade as a 3D flexible technology in photonics. Numerical simulations provide an important insight into spatial and temporal beam and pulse shaping during the course of extremely intricate nonlinear propagation (see e.g. [1,2]). Electromagnetics of such propagation is typically described in the form of the generalized Non-Linear Schrdinger Equation (NLSE) coupled with Drude model for plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4, 5]. However our approach can be extended to similar models. The numerical code is implemented in NVIDIA Graphics Processing Unit (GPU) which provides an effitient hardware platform for multi-threded computing. We compare the performance of the described below parallel code implementated for GPU using CUDA programming interface [3] with a serial CPU version used in our previous papers [4,5]. © 2011 IEEE.
Resumo:
Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce a compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs as well as examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as current state-of-the-art linear algebra routines.
Resumo:
Magdeburg, Univ., Fak. für Verfahrens- und Systemtechnik, Diss., 2012
Resumo:
In this paper we present an algorithm for the numerical simulation of the cavitation in the hydrodynamic lubrication of journal bearings. Despite the fact that this physical process is usually modelled as a free boundary problem, we adopted the equivalent variational inequality formulation. We propose a two-level iterative algorithm, where the outer iteration is associated to the penalty method, used to transform the variational inequality into a variational equation, and the inner iteration is associated to the conjugate gradient method, used to solve the linear system generated by applying the finite element method to the variational equation. This inner part was implemented using the element by element strategy, which is easily parallelized. We analyse the behavior of two physical parameters and discuss some numerical results. Also, we analyse some results related to the performance of a parallel implementation of the algorithm.
Resumo:
The focus of this study is development of parallelised version of severely sequential and iterative numerical algorithms based on multi-threaded parallel platform such as a graphics processing unit. This requires design and development of a platform-specific numerical solution that can benefit from the parallel capabilities of the chosen platform. Graphics processing unit was chosen as a parallel platform for design and development of a numerical solution for a specific physical model in non-linear optics. This problem appears in describing ultra-short pulse propagation in bulk transparent media that has recently been subject to several theoretical and numerical studies. The mathematical model describing this phenomenon is a challenging and complex problem and its numerical modeling limited on current modern workstations. Numerical modeling of this problem requires a parallelisation of an essentially serial algorithms and elimination of numerical bottlenecks. The main challenge to overcome is parallelisation of the globally non-local mathematical model. This thesis presents a numerical solution for elimination of numerical bottleneck associated with the non-local nature of the mathematical model. The accuracy and performance of the parallel code is identified by back-to-back testing with a similar serial version.
Resumo:
MSC subject classification: 65C05, 65U05.
Resumo:
Magnetic field inhomogeneity results in image artifacts including signal loss, image blurring and distortions, leading to decreased diagnostic accuracy. Conventional multi-coil (MC) shimming method employs both RF coils and shimming coils, whose mutual interference induces a tradeoff between RF signal-to-noise (SNR) ratio and shimming performance. To address this issue, RF coils were integrated with direct-current (DC) shim coils to shim field inhomogeneity while concurrently emitting and receiving RF signal without being blocked by the shim coils. The currents applied to the new coils, termed iPRES (integrated parallel reception, excitation and shimming), were optimized in the numerical simulation to improve the shimming performance. The objectives of this work is to offer a guideline for designing the optimal iPRES coil arrays to shim the abdomen.
In this thesis work, the main field () inhomogeneity was evaluated by root mean square error (RMSE). To investigate the shimming abilities of iPRES coil arrays, a set of the human abdomen MRI data was collected for the numerical simulations. Thereafter, different simplified iPRES(N) coil arrays were numerically modeled, including a 1-channel iPRES coil and 8-channel iPRES coil arrays. For 8-channel iPRES coil arrays, each RF coil was split into smaller DC loops in the x, y and z direction to provide extra shimming freedom. Additionally, the number of DC loops in a RF coil was increased from 1 to 5 to find the optimal divisions in z direction. Furthermore, switches were numerically implemented into iPRES coils to reduce the number of power supplies while still providing similar shimming performance with equivalent iPRES coil arrays.
The optimizations demonstrate that the shimming ability of an iPRES coil array increases with number of DC loops per RF coil. Furthermore, the z direction divisions tend to be more effective in reducing field inhomogeneity than the x and y divisions. Moreover, the shimming performance of an iPRES coil array gradually reach to a saturation level when the number of DC loops per RF coil is large enough. Finally, when switches were numerically implemented in the iPRES(4) coil array, the number of power supplies can be reduced from 32 to 8 while keeping the shimming performance similar to iPRES(3) and better than iPRES(1). This thesis work offers a guidance for the designs of iPRES coil arrays.
Resumo:
In this work, a series of two-dimensional plane-strain finite element analyses was conducted to further understand the stress distribution during tensile tests on coated systems. Besides the film and the substrate, the finite element model also considered a number of cracks perpendicular to the film/substrate interface. Different from analyses commonly found in the literature, the mechanical behavior of both film and substrate was considered elastic-perfectly plastic in part of the analyses. Together with the film yield stress and the number of film cracks, other variables that were considered were crack tip geometry, the distance between two consecutive cracks and the presence of an interlayer. The analysis was based on the normal stresses parallel to the loading axis (sigma(xx)), which are responsible for cohesive failures that are observed in the film during this type of test. Results indicated that some configurations studied in this work have significantly reduced the value of sigma(xx) at the film/substrate interface and close to the pre-defined crack tips. Furthermore, in all the cases studied the values of sigma(xx) were systematically larger at the film/substrate interface than at the film surface. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A kinetic theory based Navier-Stokes solver has been implemented on a parallel supercomputer (Intel iPSC Touchstone Delta) to study the leeward flowfield of a blunt nosed delta wing at 30-deg incidence at hypersonic speeds (similar to the proposed HERMES aerospace plane). Computational results are presented for a series of grids for both inviscid and laminar viscous flows at Reynolds numbers of 225,000 and 2.25 million. In addition, comparisons are made between the present and two independent calculations of the some flows (by L. LeToullec and P. Guillen, and S. Menne) which were presented at the Workshop on Hypersonic Flows for Re-entry Problems, Antibes, France, 1991.
Resumo:
Simulations provide a powerful means to help gain the understanding of crustal fault system physics required to progress towards the goal of earthquake forecasting. Cellular Automata are efficient enough to probe system dynamics but their simplifications render interpretations questionable. In contrast, sophisticated elasto-dynamic models yield more convincing results but are too computationally demanding to explore phase space. To help bridge this gap, we develop a simple 2D elastodynamic model of parallel fault systems. The model is discretised onto a triangular lattice and faults are specified as split nodes along horizontal rows in the lattice. A simple numerical approach is presented for calculating the forces at medium and split nodes such that general nonlinear frictional constitutive relations can be modeled along faults. Single and multi-fault simulation examples are presented using a nonlinear frictional relation that is slip and slip-rate dependent in order to illustrate the model.
Resumo:
Numerical methods related to Krylov subspaces are widely used in large sparse numerical linear algebra. Vectors in these subspaces are manipulated via their representation onto orthonormal bases. Nowadays, on serial computers, the method of Arnoldi is considered as a reliable technique for constructing such bases. However, although easily parallelizable, this technique is not as scalable as expected for communications. In this work we examine alternative methods aimed at overcoming this drawback. Since they retrieve upon completion the same information as Arnoldi's algorithm does, they enable us to design a wide family of stable and scalable Krylov approximation methods for various parallel environments. We present timing results obtained from their implementation on two distributed-memory multiprocessor supercomputers: the Intel Paragon and the IBM Scalable POWERparallel SP2. (C) 1997 by John Wiley & Sons, Ltd.
Resumo:
A previously developed model is used to numerically simulate real clinical cases of the surgical correction of scoliosis. This model consists of one-dimensional finite elements with spatial deformation in which (i) the column is represented by its axis; (ii) the vertebrae are assumed to be rigid; and (iii) the deformability of the column is concentrated in springs that connect the successive rigid elements. The metallic rods used for the surgical correction are modeled by beam elements with linear elastic behavior. To obtain the forces at the connections between the metallic rods and the vertebrae geometrically, non-linear finite element analyses are performed. The tightening sequence determines the magnitude of the forces applied to the patient column, and it is desirable to keep those forces as small as possible. In this study, a Genetic Algorithm optimization is applied to this model in order to determine the sequence that minimizes the corrective forces applied during the surgery. This amounts to find the optimal permutation of integers 1, ... , n, n being the number of vertebrae involved. As such, we are faced with a combinatorial optimization problem isomorph to the Traveling Salesman Problem. The fitness evaluation requires one computing intensive Finite Element Analysis per candidate solution and, thus, a parallel implementation of the Genetic Algorithm is developed.