6 resultados para structured parallel computations

em CaltechTHESIS


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The dispersion of an isolated, spherical, Brownian particle immersed in a Newtonian fluid between infinite parallel plates is investigated. Expressions are developed for both a 'molecular' contribution to dispersion, which arises from random thermal fluctuations, and a 'convective' contribution, arising when a shear flow is applied between the plates. These expressions are evaluated numerically for all sizes of the particle relative to the bounding plates, and the method of matched asymptotic expansions is used to develop analytical expressions for the dispersion coefficients as a function of particle size to plate spacing ratio for small values of this parameter.

It is shown that both the molecular and convective dispersion coefficients decrease as the size of the particle relative to the bounding plates increase. When the particle is small compared to the plate spacing, the coefficients decrease roughly proportional to the particle size to plate spacing ratio. When the particle closely fills the space between the plates, the molecular dispersion coefficient approaches zero slowly as an inverse logarithmic function of the particle size to plate spacing ratio, and the convective dispersion coefficent approaches zero approximately proportional to the width of the gap between the edges of the sphere and the bounding plates.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hartree-Fock (HF) calculations have had remarkable success in describing large nuclei at high spin, temperature and deformation. To allow full range of possible deformations, the Skyrme HF equations can be discretized on a three-dimensional mesh. However, such calculations are currently limited by the computational resources provided by traditional supercomputers. To take advantage of recent developments in massively parallel computing technology, we have implemented the LLNL Skyrme-force static and rotational HF codes on Intel's DELTA and GAMMA systems at Caltech.

We decomposed the HF code by assigning a portion of the mesh to each node, with nearest neighbor meshes assigned to nodes connected by communication· channels. This kind of decomposition is well-suited for the DELTA and the GAMMA architecture because the only non-local operations are wave function orthogonalization and the boundary conditions of the Poisson equation for the Coulomb field.

Our first application of the HF code on parallel computers has been the study of identical superdeformed (SD) rotational bands in the Hg region. In the last ten years, many SD rotational bands have been found experimentally. One very surprising feature found in these SD rotational bands is that many pairs of bands in nuclei that differ by one or two mass units have nearly identical deexcitation gamma-ray energies. Our calculations of the five rotational bands in ^(192)Hg and ^(194)Pb show that the filling of specific orbitals can lead to bands with deexcitation gamma-ray energies differing by at most 2 keV in nuclei differing by two mass units and over a range of angular momenta comparable to that observed experimentally. Our calculations of SD rotational bands in the Dy region also show that twinning can be achieved by filling or emptying some specific orbitals.

The interpretation of future precise experiments on atomic parity nonconservation (PNC) in terms of parameters of the Standard Model could be hampered by uncertainties in the atomic and nuclear structure. As a further application of the massively parallel HF calculations, we calculated the proton and neutron densities of the Cesium isotopes from A = 125 to A = 139. Based on our good agreement with experimental charge radii, binding energies, and ground state spins, we conclude that the uncertainties in the ratios of weak charges are less than 10^(-3), comfortably smaller than the anticipated experimental error.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We investigate the 2d O(3) model with the standard action by Monte Carlo simulation at couplings β up to 2.05. We measure the energy density, mass gap and susceptibility of the model, and gather high statistics on lattices of size L ≤ 1024 using the Floating Point Systems T-series vector hypercube and the Thinking Machines Corp.'s Connection Machine 2. Asymptotic scaling does not appear to set in for this action, even at β = 2.10, where the correlation length is 420. We observe a 20% difference between our estimate m/Λ^─_(Ms) = 3.52(6) at this β and the recent exact analytical result . We use the overrelaxation algorithm interleaved with Metropolis updates and show that decorrelation time scales with the correlation length and the number of overrelaxation steps per sweep. We determine its effective dynamical critical exponent to be z' = 1.079(10); thus critical slowing down is reduced significantly for this local algorithm that is vectorizable and parallelizable.

We also use the cluster Monte Carlo algorithms, which are non-local Monte Carlo update schemes which can greatly increase the efficiency of computer simulations of spin models. The major computational task in these algorithms is connected component labeling, to identify clusters of connected sites on a lattice. We have devised some new SIMD component labeling algorithms, and implemented them on the Connection Machine. We investigate their performance when applied to the cluster update of the two dimensional Ising spin model.

Finally we use a Monte Carlo Renormalization Group method to directly measure the couplings of block Hamiltonians at different blocking levels. For the usual averaging block transformation we confirm the renormalized trajectory (RT) observed by Okawa. For another improved probabilistic block transformation we find the RT, showing that it is much closer to the Standard Action. We then use this block transformation to obtain the discrete β-function of the model which we compare to the perturbative result. We do not see convergence, except when using a rescaled coupling β_E to effectively resum the series. For the latter case we see agreement for m/ Λ^─_(Ms) at , β = 2.14, 2.26, 2.38 and 2.50. To three loops m/Λ^─_(Ms) = 3.047(35) at β = 2.50, which is very close to the exact value m/ Λ^─_(Ms) = 2.943. Our last point at β = 2.62 disagrees with this estimate however.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Signal processing techniques play important roles in the design of digital communication systems. These include information manipulation, transmitter signal processing, channel estimation, channel equalization and receiver signal processing. By interacting with communication theory and system implementing technologies, signal processing specialists develop efficient schemes for various communication problems by wisely exploiting various mathematical tools such as analysis, probability theory, matrix theory, optimization theory, and many others. In recent years, researchers realized that multiple-input multiple-output (MIMO) channel models are applicable to a wide range of different physical communications channels. Using the elegant matrix-vector notations, many MIMO transceiver (including the precoder and equalizer) design problems can be solved by matrix and optimization theory. Furthermore, the researchers showed that the majorization theory and matrix decompositions, such as singular value decomposition (SVD), geometric mean decomposition (GMD) and generalized triangular decomposition (GTD), provide unified frameworks for solving many of the point-to-point MIMO transceiver design problems.

In this thesis, we consider the transceiver design problems for linear time invariant (LTI) flat MIMO channels, linear time-varying narrowband MIMO channels, flat MIMO broadcast channels, and doubly selective scalar channels. Additionally, the channel estimation problem is also considered. The main contributions of this dissertation are the development of new matrix decompositions, and the uses of the matrix decompositions and majorization theory toward the practical transmit-receive scheme designs for transceiver optimization problems. Elegant solutions are obtained, novel transceiver structures are developed, ingenious algorithms are proposed, and performance analyses are derived.

The first part of the thesis focuses on transceiver design with LTI flat MIMO channels. We propose a novel matrix decomposition which decomposes a complex matrix as a product of several sets of semi-unitary matrices and upper triangular matrices in an iterative manner. The complexity of the new decomposition, generalized geometric mean decomposition (GGMD), is always less than or equal to that of geometric mean decomposition (GMD). The optimal GGMD parameters which yield the minimal complexity are derived. Based on the channel state information (CSI) at both the transmitter (CSIT) and receiver (CSIR), GGMD is used to design a butterfly structured decision feedback equalizer (DFE) MIMO transceiver which achieves the minimum average mean square error (MSE) under the total transmit power constraint. A novel iterative receiving detection algorithm for the specific receiver is also proposed. For the application to cyclic prefix (CP) systems in which the SVD of the equivalent channel matrix can be easily computed, the proposed GGMD transceiver has K/log_2(K) times complexity advantage over the GMD transceiver, where K is the number of data symbols per data block and is a power of 2. The performance analysis shows that the GGMD DFE transceiver can convert a MIMO channel into a set of parallel subchannels with the same bias and signal to interference plus noise ratios (SINRs). Hence, the average bit rate error (BER) is automatically minimized without the need for bit allocation. Moreover, the proposed transceiver can achieve the channel capacity simply by applying independent scalar Gaussian codes of the same rate at subchannels.

In the second part of the thesis, we focus on MIMO transceiver design for slowly time-varying MIMO channels with zero-forcing or MMSE criterion. Even though the GGMD/GMD DFE transceivers work for slowly time-varying MIMO channels by exploiting the instantaneous CSI at both ends, their performance is by no means optimal since the temporal diversity of the time-varying channels is not exploited. Based on the GTD, we develop space-time GTD (ST-GTD) for the decomposition of linear time-varying flat MIMO channels. Under the assumption that CSIT, CSIR and channel prediction are available, by using the proposed ST-GTD, we develop space-time geometric mean decomposition (ST-GMD) DFE transceivers under the zero-forcing or MMSE criterion. Under perfect channel prediction, the new system minimizes both the average MSE at the detector in each space-time (ST) block (which consists of several coherence blocks), and the average per ST-block BER in the moderate high SNR region. Moreover, the ST-GMD DFE transceiver designed under an MMSE criterion maximizes Gaussian mutual information over the equivalent channel seen by each ST-block. In general, the newly proposed transceivers perform better than the GGMD-based systems since the super-imposed temporal precoder is able to exploit the temporal diversity of time-varying channels. For practical applications, a novel ST-GTD based system which does not require channel prediction but shares the same asymptotic BER performance with the ST-GMD DFE transceiver is also proposed.

The third part of the thesis considers two quality of service (QoS) transceiver design problems for flat MIMO broadcast channels. The first one is the power minimization problem (min-power) with a total bitrate constraint and per-stream BER constraints. The second problem is the rate maximization problem (max-rate) with a total transmit power constraint and per-stream BER constraints. Exploiting a particular class of joint triangularization (JT), we are able to jointly optimize the bit allocation and the broadcast DFE transceiver for the min-power and max-rate problems. The resulting optimal designs are called the minimum power JT broadcast DFE transceiver (MPJT) and maximum rate JT broadcast DFE transceiver (MRJT), respectively. In addition to the optimal designs, two suboptimal designs based on QR decomposition are proposed. They are realizable for arbitrary number of users.

Finally, we investigate the design of a discrete Fourier transform (DFT) modulated filterbank transceiver (DFT-FBT) with LTV scalar channels. For both cases with known LTV channels and unknown wide sense stationary uncorrelated scattering (WSSUS) statistical channels, we show how to optimize the transmitting and receiving prototypes of a DFT-FBT such that the SINR at the receiver is maximized. Also, a novel pilot-aided subspace channel estimation algorithm is proposed for the orthogonal frequency division multiplexing (OFDM) systems with quasi-stationary multi-path Rayleigh fading channels. Using the concept of a difference co-array, the new technique can construct M^2 co-pilots from M physical pilot tones with alternating pilot placement. Subspace methods, such as MUSIC and ESPRIT, can be used to estimate the multipath delays and the number of identifiable paths is up to O(M^2), theoretically. With the delay information, a MMSE estimator for frequency response is derived. It is shown through simulations that the proposed method outperforms the conventional subspace channel estimator when the number of multipaths is greater than or equal to the number of physical pilots minus one.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis studies three classes of randomized numerical linear algebra algorithms, namely: (i) randomized matrix sparsification algorithms, (ii) low-rank approximation algorithms that use randomized unitary transformations, and (iii) low-rank approximation algorithms for positive-semidefinite (PSD) matrices.

Randomized matrix sparsification algorithms set randomly chosen entries of the input matrix to zero. When the approximant is substituted for the original matrix in computations, its sparsity allows one to employ faster sparsity-exploiting algorithms. This thesis contributes bounds on the approximation error of nonuniform randomized sparsification schemes, measured in the spectral norm and two NP-hard norms that are of interest in computational graph theory and subset selection applications.

Low-rank approximations based on randomized unitary transformations have several desirable properties: they have low communication costs, are amenable to parallel implementation, and exploit the existence of fast transform algorithms. This thesis investigates the tradeoff between the accuracy and cost of generating such approximations. State-of-the-art spectral and Frobenius-norm error bounds are provided.

The last class of algorithms considered are SPSD "sketching" algorithms. Such sketches can be computed faster than approximations based on projecting onto mixtures of the columns of the matrix. The performance of several such sketching schemes is empirically evaluated using a suite of canonical matrices drawn from machine learning and data analysis applications, and a framework is developed for establishing theoretical error bounds.

In addition to studying these algorithms, this thesis extends the Matrix Laplace Transform framework to derive Chernoff and Bernstein inequalities that apply to all the eigenvalues of certain classes of random matrices. These inequalities are used to investigate the behavior of the singular values of a matrix under random sampling, and to derive convergence rates for each individual eigenvalue of a sample covariance matrix.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

FRAME3D, a program for the nonlinear seismic analysis of steel structures, has previously been used to study the collapse mechanisms of steel buildings up to 20 stories tall. The present thesis is inspired by the need to conduct similar analysis for much taller structures. It improves FRAME3D in two primary ways.

First, FRAME3D is revised to address specific nonlinear situations involving large displacement/rotation increments, the backup-subdivide algorithm, element failure, and extremely narrow joint hysteresis. The revisions result in superior convergence capabilities when modeling earthquake-induced collapse. The material model of a steel fiber is also modified to allow for post-rupture compressive strength.

Second, a parallel FRAME3D (PFRAME3D) is developed. The serial code is optimized and then parallelized. A distributed-memory divide-and-conquer approach is used for both the global direct solver and element-state updates. The result is an implicit finite-element hybrid-parallel program that takes advantage of the narrow-band nature of very tall buildings and uses nearest-neighbor-only communication patterns.

Using three structures of varied sized, PFRAME3D is shown to compute reproducible results that agree with that of the optimized 1-core version (displacement time-history response root-mean-squared errors are ~〖10〗^(-5) m) with much less wall time (e.g., a dynamic time-history collapse simulation of a 60-story building is computed in 5.69 hrs with 128 cores—a speedup of 14.7 vs. the optimized 1-core version). The maximum speedups attained are shown to increase with building height (as the total number of cores used also increases), and the parallel framework can be expected to be suitable for buildings taller than the ones presented here.

PFRAME3D is used to analyze a hypothetical 60-story steel moment-frame tube building (fundamental period of 6.16 sec) designed according to the 1994 Uniform Building Code. Dynamic pushover and time-history analyses are conducted. Multi-story shear-band collapse mechanisms are observed around mid-height of the building. The use of closely-spaced columns and deep beams is found to contribute to the building's “somewhat brittle” behavior (ductility ratio ~2.0). Overall building strength is observed to be sensitive to whether a model is fracture-capable.