918 resultados para parallel execution
Resumo:
<正>生物力学研究的趋势十分明显的是,由宏观方面的研究转向细观和微观方面的研究。人们从个体、器官和组织的生物力学方面,转向细胞甚至分子水平的研究。在力的作用下,细胞的形态、生理作用等发生的变化引起了人们极大的兴趣,其中流体流动时剪应力对细胞的作用尤为人们所特别关注,因为有血液在血管中流动时的剪应力对血管内皮细胞的作用这样的实际生理背景。剪应力不但可以影响内皮细胞的形态结构,而且对在细胞诸多生理方面有影响。
Resumo:
A three-dimensional MHD solver is described in the paper. The solver simulates reacting flows with nonequilibrium between translational-rotational, vibrational and electron translational modes. The conservation equations are discretized with implicit time marching and the second-order modified Steger-Warming scheme, and the resulted linear system is solved iteratively with Newton-Krylov-Schwarz method that is implemented by PETSc package. The results of convergence tests are plotted, which show good scalability and convergence around twice faster when compared with the DPLR method. Then five test runs are conducted simulating the experiments done at the NASA Ames MHD channel, and the calculated pressures, temperatures, electrical conductivity, back EMF, load factors and flow accelerations are shown to agree with the experimental data. Our computation shows that the electrical conductivity distribution is not uniform in the powered section of the MHD channel, and that it is important to include Joule heating in order to calculate the correct conductivity and the MHD acceleration.
Resumo:
It has long been recognized that many direct parallel tridiagonal solvers are only efficient for solving a single tridiagonal equation of large sizes, and they become inefficient when naively used in a three-dimensional ADI solver. In order to improve the parallel efficiency of an ADI solver using a direct parallel solver, we implement the single parallel partition (SPP) algorithm in conjunction with message vectorization, which aggregates several communication messages into one to reduce the communication costs. The measured performances show that the longest allowable message vector length (MVL) is not necessarily the best choice. To understand this observation and optimize the performance, we propose an improved model that takes the cache effect into consideration. The optimal MVL for achieving the best performance is shown to depend on number of processors and grid sizes. Similar dependence of the optimal MVL is also found for the popular block pipelined method.
Resumo:
The dispersion of an isolated, spherical, Brownian particle immersed in a Newtonian fluid between infinite parallel plates is investigated. Expressions are developed for both a 'molecular' contribution to dispersion, which arises from random thermal fluctuations, and a 'convective' contribution, arising when a shear flow is applied between the plates. These expressions are evaluated numerically for all sizes of the particle relative to the bounding plates, and the method of matched asymptotic expansions is used to develop analytical expressions for the dispersion coefficients as a function of particle size to plate spacing ratio for small values of this parameter.
It is shown that both the molecular and convective dispersion coefficients decrease as the size of the particle relative to the bounding plates increase. When the particle is small compared to the plate spacing, the coefficients decrease roughly proportional to the particle size to plate spacing ratio. When the particle closely fills the space between the plates, the molecular dispersion coefficient approaches zero slowly as an inverse logarithmic function of the particle size to plate spacing ratio, and the convective dispersion coefficent approaches zero approximately proportional to the width of the gap between the edges of the sphere and the bounding plates.
Resumo:
Hartree-Fock (HF) calculations have had remarkable success in describing large nuclei at high spin, temperature and deformation. To allow full range of possible deformations, the Skyrme HF equations can be discretized on a three-dimensional mesh. However, such calculations are currently limited by the computational resources provided by traditional supercomputers. To take advantage of recent developments in massively parallel computing technology, we have implemented the LLNL Skyrme-force static and rotational HF codes on Intel's DELTA and GAMMA systems at Caltech.
We decomposed the HF code by assigning a portion of the mesh to each node, with nearest neighbor meshes assigned to nodes connected by communication· channels. This kind of decomposition is well-suited for the DELTA and the GAMMA architecture because the only non-local operations are wave function orthogonalization and the boundary conditions of the Poisson equation for the Coulomb field.
Our first application of the HF code on parallel computers has been the study of identical superdeformed (SD) rotational bands in the Hg region. In the last ten years, many SD rotational bands have been found experimentally. One very surprising feature found in these SD rotational bands is that many pairs of bands in nuclei that differ by one or two mass units have nearly identical deexcitation gamma-ray energies. Our calculations of the five rotational bands in ^(192)Hg and ^(194)Pb show that the filling of specific orbitals can lead to bands with deexcitation gamma-ray energies differing by at most 2 keV in nuclei differing by two mass units and over a range of angular momenta comparable to that observed experimentally. Our calculations of SD rotational bands in the Dy region also show that twinning can be achieved by filling or emptying some specific orbitals.
The interpretation of future precise experiments on atomic parity nonconservation (PNC) in terms of parameters of the Standard Model could be hampered by uncertainties in the atomic and nuclear structure. As a further application of the massively parallel HF calculations, we calculated the proton and neutron densities of the Cesium isotopes from A = 125 to A = 139. Based on our good agreement with experimental charge radii, binding energies, and ground state spins, we conclude that the uncertainties in the ratios of weak charges are less than 10^(-3), comfortably smaller than the anticipated experimental error.
Resumo:
We investigate the 2d O(3) model with the standard action by Monte Carlo simulation at couplings β up to 2.05. We measure the energy density, mass gap and susceptibility of the model, and gather high statistics on lattices of size L ≤ 1024 using the Floating Point Systems T-series vector hypercube and the Thinking Machines Corp.'s Connection Machine 2. Asymptotic scaling does not appear to set in for this action, even at β = 2.10, where the correlation length is 420. We observe a 20% difference between our estimate m/Λ^─_(Ms) = 3.52(6) at this β and the recent exact analytical result . We use the overrelaxation algorithm interleaved with Metropolis updates and show that decorrelation time scales with the correlation length and the number of overrelaxation steps per sweep. We determine its effective dynamical critical exponent to be z' = 1.079(10); thus critical slowing down is reduced significantly for this local algorithm that is vectorizable and parallelizable.
We also use the cluster Monte Carlo algorithms, which are non-local Monte Carlo update schemes which can greatly increase the efficiency of computer simulations of spin models. The major computational task in these algorithms is connected component labeling, to identify clusters of connected sites on a lattice. We have devised some new SIMD component labeling algorithms, and implemented them on the Connection Machine. We investigate their performance when applied to the cluster update of the two dimensional Ising spin model.
Finally we use a Monte Carlo Renormalization Group method to directly measure the couplings of block Hamiltonians at different blocking levels. For the usual averaging block transformation we confirm the renormalized trajectory (RT) observed by Okawa. For another improved probabilistic block transformation we find the RT, showing that it is much closer to the Standard Action. We then use this block transformation to obtain the discrete β-function of the model which we compare to the perturbative result. We do not see convergence, except when using a rescaled coupling β_E to effectively resum the series. For the latter case we see agreement for m/ Λ^─_(Ms) at , β = 2.14, 2.26, 2.38 and 2.50. To three loops m/Λ^─_(Ms) = 3.047(35) at β = 2.50, which is very close to the exact value m/ Λ^─_(Ms) = 2.943. Our last point at β = 2.62 disagrees with this estimate however.
Resumo:
167 p.
MODIFIED DIRECT TWOS-COMPLEMENT PARALLEL ARRAY MULTIPLICATION ALGORITHM FOR COMPLEX MATRIX OPERATION
Resumo:
A direct twos-complement parallel array multiplication algorithm is introduced and modified for digital optical numerical computation. The modified version overcomes the problems encountered in the conventional optical twos-complement algorithm. In the array, all the summands are generated in parallel, and the relevant summands having the same weights are added simultaneously without carries, resulting in the product expressed in a mixed twos-complement system. In a two-stage array, complex multiplication is possible with using four real subarrays. Furthermore, with a three-stage array architecture, complex matrix operation is straightforwardly accomplished. In the experiment, parallel two-stage array complex multiplication with liquid-crystal panels is demonstrated.
Resumo:
On the basis of signed-digit negabinary representation, parallel two-step addition and one-step subtraction can be performed for arbitrary-length negabinary operands.; The arithmetic is realized by signed logic operations and optically implemented by spatial encoding and decoding techniques. The proposed algorithm and optical system are simple, reliable, and practicable, and they have the property of parallel processing of two-dimensional data. This leads to an efficient design for the optical arithmetic and logic unit. (C) 1997 Optical Society of America.