9 resultados para parallel algorithm
em CaltechTHESIS
Resumo:
We investigate the 2d O(3) model with the standard action by Monte Carlo simulation at couplings β up to 2.05. We measure the energy density, mass gap and susceptibility of the model, and gather high statistics on lattices of size L ≤ 1024 using the Floating Point Systems T-series vector hypercube and the Thinking Machines Corp.'s Connection Machine 2. Asymptotic scaling does not appear to set in for this action, even at β = 2.10, where the correlation length is 420. We observe a 20% difference between our estimate m/Λ^─_(Ms) = 3.52(6) at this β and the recent exact analytical result . We use the overrelaxation algorithm interleaved with Metropolis updates and show that decorrelation time scales with the correlation length and the number of overrelaxation steps per sweep. We determine its effective dynamical critical exponent to be z' = 1.079(10); thus critical slowing down is reduced significantly for this local algorithm that is vectorizable and parallelizable.
We also use the cluster Monte Carlo algorithms, which are non-local Monte Carlo update schemes which can greatly increase the efficiency of computer simulations of spin models. The major computational task in these algorithms is connected component labeling, to identify clusters of connected sites on a lattice. We have devised some new SIMD component labeling algorithms, and implemented them on the Connection Machine. We investigate their performance when applied to the cluster update of the two dimensional Ising spin model.
Finally we use a Monte Carlo Renormalization Group method to directly measure the couplings of block Hamiltonians at different blocking levels. For the usual averaging block transformation we confirm the renormalized trajectory (RT) observed by Okawa. For another improved probabilistic block transformation we find the RT, showing that it is much closer to the Standard Action. We then use this block transformation to obtain the discrete β-function of the model which we compare to the perturbative result. We do not see convergence, except when using a rescaled coupling β_E to effectively resum the series. For the latter case we see agreement for m/ Λ^─_(Ms) at , β = 2.14, 2.26, 2.38 and 2.50. To three loops m/Λ^─_(Ms) = 3.047(35) at β = 2.50, which is very close to the exact value m/ Λ^─_(Ms) = 2.943. Our last point at β = 2.62 disagrees with this estimate however.
Resumo:
FRAME3D, a program for the nonlinear seismic analysis of steel structures, has previously been used to study the collapse mechanisms of steel buildings up to 20 stories tall. The present thesis is inspired by the need to conduct similar analysis for much taller structures. It improves FRAME3D in two primary ways.
First, FRAME3D is revised to address specific nonlinear situations involving large displacement/rotation increments, the backup-subdivide algorithm, element failure, and extremely narrow joint hysteresis. The revisions result in superior convergence capabilities when modeling earthquake-induced collapse. The material model of a steel fiber is also modified to allow for post-rupture compressive strength.
Second, a parallel FRAME3D (PFRAME3D) is developed. The serial code is optimized and then parallelized. A distributed-memory divide-and-conquer approach is used for both the global direct solver and element-state updates. The result is an implicit finite-element hybrid-parallel program that takes advantage of the narrow-band nature of very tall buildings and uses nearest-neighbor-only communication patterns.
Using three structures of varied sized, PFRAME3D is shown to compute reproducible results that agree with that of the optimized 1-core version (displacement time-history response root-mean-squared errors are ~〖10〗^(-5) m) with much less wall time (e.g., a dynamic time-history collapse simulation of a 60-story building is computed in 5.69 hrs with 128 cores—a speedup of 14.7 vs. the optimized 1-core version). The maximum speedups attained are shown to increase with building height (as the total number of cores used also increases), and the parallel framework can be expected to be suitable for buildings taller than the ones presented here.
PFRAME3D is used to analyze a hypothetical 60-story steel moment-frame tube building (fundamental period of 6.16 sec) designed according to the 1994 Uniform Building Code. Dynamic pushover and time-history analyses are conducted. Multi-story shear-band collapse mechanisms are observed around mid-height of the building. The use of closely-spaced columns and deep beams is found to contribute to the building's “somewhat brittle” behavior (ductility ratio ~2.0). Overall building strength is observed to be sensitive to whether a model is fracture-capable.
Resumo:
The dispersion of an isolated, spherical, Brownian particle immersed in a Newtonian fluid between infinite parallel plates is investigated. Expressions are developed for both a 'molecular' contribution to dispersion, which arises from random thermal fluctuations, and a 'convective' contribution, arising when a shear flow is applied between the plates. These expressions are evaluated numerically for all sizes of the particle relative to the bounding plates, and the method of matched asymptotic expansions is used to develop analytical expressions for the dispersion coefficients as a function of particle size to plate spacing ratio for small values of this parameter.
It is shown that both the molecular and convective dispersion coefficients decrease as the size of the particle relative to the bounding plates increase. When the particle is small compared to the plate spacing, the coefficients decrease roughly proportional to the particle size to plate spacing ratio. When the particle closely fills the space between the plates, the molecular dispersion coefficient approaches zero slowly as an inverse logarithmic function of the particle size to plate spacing ratio, and the convective dispersion coefficent approaches zero approximately proportional to the width of the gap between the edges of the sphere and the bounding plates.
Resumo:
This thesis presents a novel framework for state estimation in the context of robotic grasping and manipulation. The overall estimation approach is based on fusing various visual cues for manipulator tracking, namely appearance and feature-based, shape-based, and silhouette-based visual cues. Similarly, a framework is developed to fuse the above visual cues, but also kinesthetic cues such as force-torque and tactile measurements, for in-hand object pose estimation. The cues are extracted from multiple sensor modalities and are fused in a variety of Kalman filters.
A hybrid estimator is developed to estimate both a continuous state (robot and object states) and discrete states, called contact modes, which specify how each finger contacts a particular object surface. A static multiple model estimator is used to compute and maintain this mode probability. The thesis also develops an estimation framework for estimating model parameters associated with object grasping. Dual and joint state-parameter estimation is explored for parameter estimation of a grasped object's mass and center of mass. Experimental results demonstrate simultaneous object localization and center of mass estimation.
Dual-arm estimation is developed for two arm robotic manipulation tasks. Two types of filters are explored; the first is an augmented filter that contains both arms in the state vector while the second runs two filters in parallel, one for each arm. These two frameworks and their performance is compared in a dual-arm task of removing a wheel from a hub.
This thesis also presents a new method for action selection involving touch. This next best touch method selects an available action for interacting with an object that will gain the most information. The algorithm employs information theory to compute an information gain metric that is based on a probabilistic belief suitable for the task. An estimation framework is used to maintain this belief over time. Kinesthetic measurements such as contact and tactile measurements are used to update the state belief after every interactive action. Simulation and experimental results are demonstrated using next best touch for object localization, specifically a door handle on a door. The next best touch theory is extended for model parameter determination. Since many objects within a particular object category share the same rough shape, principle component analysis may be used to parametrize the object mesh models. These parameters can be estimated using the action selection technique that selects the touching action which best both localizes and estimates these parameters. Simulation results are then presented involving localizing and determining a parameter of a screwdriver.
Lastly, the next best touch theory is further extended to model classes. Instead of estimating parameters, object class determination is incorporated into the information gain metric calculation. The best touching action is selected in order to best discern between the possible model classes. Simulation results are presented to validate the theory.
Resumo:
Hartree-Fock (HF) calculations have had remarkable success in describing large nuclei at high spin, temperature and deformation. To allow full range of possible deformations, the Skyrme HF equations can be discretized on a three-dimensional mesh. However, such calculations are currently limited by the computational resources provided by traditional supercomputers. To take advantage of recent developments in massively parallel computing technology, we have implemented the LLNL Skyrme-force static and rotational HF codes on Intel's DELTA and GAMMA systems at Caltech.
We decomposed the HF code by assigning a portion of the mesh to each node, with nearest neighbor meshes assigned to nodes connected by communication· channels. This kind of decomposition is well-suited for the DELTA and the GAMMA architecture because the only non-local operations are wave function orthogonalization and the boundary conditions of the Poisson equation for the Coulomb field.
Our first application of the HF code on parallel computers has been the study of identical superdeformed (SD) rotational bands in the Hg region. In the last ten years, many SD rotational bands have been found experimentally. One very surprising feature found in these SD rotational bands is that many pairs of bands in nuclei that differ by one or two mass units have nearly identical deexcitation gamma-ray energies. Our calculations of the five rotational bands in ^(192)Hg and ^(194)Pb show that the filling of specific orbitals can lead to bands with deexcitation gamma-ray energies differing by at most 2 keV in nuclei differing by two mass units and over a range of angular momenta comparable to that observed experimentally. Our calculations of SD rotational bands in the Dy region also show that twinning can be achieved by filling or emptying some specific orbitals.
The interpretation of future precise experiments on atomic parity nonconservation (PNC) in terms of parameters of the Standard Model could be hampered by uncertainties in the atomic and nuclear structure. As a further application of the massively parallel HF calculations, we calculated the proton and neutron densities of the Cesium isotopes from A = 125 to A = 139. Based on our good agreement with experimental charge radii, binding energies, and ground state spins, we conclude that the uncertainties in the ratios of weak charges are less than 10^(-3), comfortably smaller than the anticipated experimental error.
Resumo:
Signal processing techniques play important roles in the design of digital communication systems. These include information manipulation, transmitter signal processing, channel estimation, channel equalization and receiver signal processing. By interacting with communication theory and system implementing technologies, signal processing specialists develop efficient schemes for various communication problems by wisely exploiting various mathematical tools such as analysis, probability theory, matrix theory, optimization theory, and many others. In recent years, researchers realized that multiple-input multiple-output (MIMO) channel models are applicable to a wide range of different physical communications channels. Using the elegant matrix-vector notations, many MIMO transceiver (including the precoder and equalizer) design problems can be solved by matrix and optimization theory. Furthermore, the researchers showed that the majorization theory and matrix decompositions, such as singular value decomposition (SVD), geometric mean decomposition (GMD) and generalized triangular decomposition (GTD), provide unified frameworks for solving many of the point-to-point MIMO transceiver design problems.
In this thesis, we consider the transceiver design problems for linear time invariant (LTI) flat MIMO channels, linear time-varying narrowband MIMO channels, flat MIMO broadcast channels, and doubly selective scalar channels. Additionally, the channel estimation problem is also considered. The main contributions of this dissertation are the development of new matrix decompositions, and the uses of the matrix decompositions and majorization theory toward the practical transmit-receive scheme designs for transceiver optimization problems. Elegant solutions are obtained, novel transceiver structures are developed, ingenious algorithms are proposed, and performance analyses are derived.
The first part of the thesis focuses on transceiver design with LTI flat MIMO channels. We propose a novel matrix decomposition which decomposes a complex matrix as a product of several sets of semi-unitary matrices and upper triangular matrices in an iterative manner. The complexity of the new decomposition, generalized geometric mean decomposition (GGMD), is always less than or equal to that of geometric mean decomposition (GMD). The optimal GGMD parameters which yield the minimal complexity are derived. Based on the channel state information (CSI) at both the transmitter (CSIT) and receiver (CSIR), GGMD is used to design a butterfly structured decision feedback equalizer (DFE) MIMO transceiver which achieves the minimum average mean square error (MSE) under the total transmit power constraint. A novel iterative receiving detection algorithm for the specific receiver is also proposed. For the application to cyclic prefix (CP) systems in which the SVD of the equivalent channel matrix can be easily computed, the proposed GGMD transceiver has K/log_2(K) times complexity advantage over the GMD transceiver, where K is the number of data symbols per data block and is a power of 2. The performance analysis shows that the GGMD DFE transceiver can convert a MIMO channel into a set of parallel subchannels with the same bias and signal to interference plus noise ratios (SINRs). Hence, the average bit rate error (BER) is automatically minimized without the need for bit allocation. Moreover, the proposed transceiver can achieve the channel capacity simply by applying independent scalar Gaussian codes of the same rate at subchannels.
In the second part of the thesis, we focus on MIMO transceiver design for slowly time-varying MIMO channels with zero-forcing or MMSE criterion. Even though the GGMD/GMD DFE transceivers work for slowly time-varying MIMO channels by exploiting the instantaneous CSI at both ends, their performance is by no means optimal since the temporal diversity of the time-varying channels is not exploited. Based on the GTD, we develop space-time GTD (ST-GTD) for the decomposition of linear time-varying flat MIMO channels. Under the assumption that CSIT, CSIR and channel prediction are available, by using the proposed ST-GTD, we develop space-time geometric mean decomposition (ST-GMD) DFE transceivers under the zero-forcing or MMSE criterion. Under perfect channel prediction, the new system minimizes both the average MSE at the detector in each space-time (ST) block (which consists of several coherence blocks), and the average per ST-block BER in the moderate high SNR region. Moreover, the ST-GMD DFE transceiver designed under an MMSE criterion maximizes Gaussian mutual information over the equivalent channel seen by each ST-block. In general, the newly proposed transceivers perform better than the GGMD-based systems since the super-imposed temporal precoder is able to exploit the temporal diversity of time-varying channels. For practical applications, a novel ST-GTD based system which does not require channel prediction but shares the same asymptotic BER performance with the ST-GMD DFE transceiver is also proposed.
The third part of the thesis considers two quality of service (QoS) transceiver design problems for flat MIMO broadcast channels. The first one is the power minimization problem (min-power) with a total bitrate constraint and per-stream BER constraints. The second problem is the rate maximization problem (max-rate) with a total transmit power constraint and per-stream BER constraints. Exploiting a particular class of joint triangularization (JT), we are able to jointly optimize the bit allocation and the broadcast DFE transceiver for the min-power and max-rate problems. The resulting optimal designs are called the minimum power JT broadcast DFE transceiver (MPJT) and maximum rate JT broadcast DFE transceiver (MRJT), respectively. In addition to the optimal designs, two suboptimal designs based on QR decomposition are proposed. They are realizable for arbitrary number of users.
Finally, we investigate the design of a discrete Fourier transform (DFT) modulated filterbank transceiver (DFT-FBT) with LTV scalar channels. For both cases with known LTV channels and unknown wide sense stationary uncorrelated scattering (WSSUS) statistical channels, we show how to optimize the transmitting and receiving prototypes of a DFT-FBT such that the SINR at the receiver is maximized. Also, a novel pilot-aided subspace channel estimation algorithm is proposed for the orthogonal frequency division multiplexing (OFDM) systems with quasi-stationary multi-path Rayleigh fading channels. Using the concept of a difference co-array, the new technique can construct M^2 co-pilots from M physical pilot tones with alternating pilot placement. Subspace methods, such as MUSIC and ESPRIT, can be used to estimate the multipath delays and the number of identifiable paths is up to O(M^2), theoretically. With the delay information, a MMSE estimator for frequency response is derived. It is shown through simulations that the proposed method outperforms the conventional subspace channel estimator when the number of multipaths is greater than or equal to the number of physical pilots minus one.
Resumo:
This thesis presents a new approach for the numerical solution of three-dimensional problems in elastodynamics. The new methodology, which is based on a recently introduced Fourier continuation (FC) algorithm for the solution of Partial Differential Equations on the basis of accurate Fourier expansions of possibly non-periodic functions, enables fast, high-order solutions of the time-dependent elastic wave equation in a nearly dispersionless manner, and it requires use of CFL constraints that scale only linearly with spatial discretizations. A new FC operator is introduced to treat Neumann and traction boundary conditions, and a block-decomposed (sub-patch) overset strategy is presented for implementation of general, complex geometries in distributed-memory parallel computing environments. Our treatment of the elastic wave equation, which is formulated as a complex system of variable-coefficient PDEs that includes possibly heterogeneous and spatially varying material constants, represents the first fully-realized three-dimensional extension of FC-based solvers to date. Challenges for three-dimensional elastodynamics simulations such as treatment of corners and edges in three-dimensional geometries, the existence of variable coefficients arising from physical configurations and/or use of curvilinear coordinate systems and treatment of boundary conditions, are all addressed. The broad applicability of our new FC elasticity solver is demonstrated through application to realistic problems concerning seismic wave motion on three-dimensional topographies as well as applications to non-destructive evaluation where, for the first time, we present three-dimensional simulations for comparison to experimental studies of guided-wave scattering by through-thickness holes in thin plates.
Resumo:
A neural network is a highly interconnected set of simple processors. The many connections allow information to travel rapidly through the network, and due to their simplicity, many processors in one network are feasible. Together these properties imply that we can build efficient massively parallel machines using neural networks. The primary problem is how do we specify the interconnections in a neural network. The various approaches developed so far such as outer product, learning algorithm, or energy function suffer from the following deficiencies: long training/ specification times; not guaranteed to work on all inputs; requires full connectivity.
Alternatively we discuss methods of using the topology and constraints of the problems themselves to design the topology and connections of the neural solution. We define several useful circuits-generalizations of the Winner-Take-All circuitthat allows us to incorporate constraints using feedback in a controlled manner. These circuits are proven to be stable, and to only converge on valid states. We use the Hopfield electronic model since this is close to an actual implementation. We also discuss methods for incorporating these circuits into larger systems, neural and nonneural. By exploiting regularities in our definition, we can construct efficient networks. To demonstrate the methods, we look to three problems from communications. We first discuss two applications to problems from circuit switching; finding routes in large multistage switches, and the call rearrangement problem. These show both, how we can use many neurons to build massively parallel machines, and how the Winner-Take-All circuits can simplify our designs.
Next we develop a solution to the contention arbitration problem of high-speed packet switches. We define a useful class of switching networks and then design a neural network to solve the contention arbitration problem for this class. Various aspects of the neural network/switch system are analyzed to measure the queueing performance of this method. Using the basic design, a feasible architecture for a large (1024-input) ATM packet switch is presented. Using the massive parallelism of neural networks, we can consider algorithms that were previously computationally unattainable. These now viable algorithms lead us to new perspectives on switch design.
Resumo:
Fast radio bursts (FRBs), a novel type of radio pulse, whose physics is not yet understood at all. Only a handful of FRBs had been detected when we started this project. Taking account of the scant observations, we put physical constraints on FRBs. We excluded proposals of a galactic origin for their extraordinarily high dispersion measures (DM), in particular stellar coronas and HII regions. Therefore our work supports an extragalactic origin for FRBs. We show that the resolved scattering tail of FRB 110220 is unlikely to be due to propagation through the intergalactic plasma. Instead the scattering is probably caused by the interstellar medium in the FRB's host galaxy, and indicates that this burst sits in the central region of that galaxy. Pulse durations of order $\ms$ constrain source sizes of FRBs implying enormous brightness temperatures and thus coherent emission. Electric fields near FRBs at cosmological distances would be so strong that they could accelerate free electrons from rest to relativistic energies in a single wave period. When we worked on FRBs, it was unclear whether they were genuine astronomical signals as distinct from `perytons', clearly terrestrial radio bursts, sharing some common properties with FRBs. Recently, in April 2015, astronomers discovered that perytons were emitted by microwave ovens. Radio chirps similar to FRBs were emitted when their doors opened while they were still heating. Evidence for the astronomical nature of FRBs has strengthened since our paper was published. Some bursts have been found to show linear and circular polarizations and Faraday rotation of the linear polarization has also been detected. I hope to resume working on FRBs in the near future. But after we completed our FRB paper, I decided to pause this project because of the lack of observational constraints.
The pulsar triple system, J0733+1715, has its orbital parameters fitted to high accuracy owing to the precise timing of the central $\ms$ pulsar. The two orbits are highly hierarchical, namely $P_{\mathrm{orb,1}}\ll P_{\mathrm{orb,2}}$, where 1 and 2 label the inner and outer white dwarf (WD) companions respectively. Moreover, their orbital planes almost coincide, providing a unique opportunity to study secular interaction associated purely with eccentricity beyond the solar system. Secular interaction only involves effect averaged over many orbits. Thus each companion can be represented by an elliptical wire with its mass distributed inversely proportional to its local orbital speed. Generally there exists a mutual torque, which vanishes only when their apsidal lines are parallel or anti-parallel. To maintain either mode, the eccentricity ratio, $e_1/e_2$, must be of the proper value, so that both apsidal lines precess together. For J0733+1715, $e_1\ll e_2$ for the parallel mode, while $e_1\gg e_2$ for the anti-parallel one. We show that the former precesses $\sim 10$ times slower than the latter. Currently the system is dominated by the parallel mode. Although only a little anti-parallel mode survives, both eccentricities especially $e_1$ oscillate on $\sim 10^3\yr$ timescale. Detectable changes would occur within $\sim 1\yr$. We demonstrate that the anti-parallel mode gets damped $\sim 10^4$ times faster than its parallel brother by any dissipative process diminishing $e_1$. If it is the tidal damping in the inner WD, we proceed to estimate its tidal quantity parameter ($Q$) to be $\sim 10^6$, which was poorly constrained by observations. However, tidal damping may also happen during the preceding low-mass X-ray binary (LMXB) phase or hydrogen thermal nuclear flashes. But, in both cases, the inner companion fills its Roche lobe and probably suffers mass/angular momentum loss, which might cause $e_1$ to grow rather than decay.
Several pairs of solar system satellites occupy mean motion resonances (MMRs). We divide these into two groups according to their proximity to exact resonance. Proximity is measured by the existence of a separatrix in phase space. MMRs between Io-Europa, Europa-Ganymede and Enceladus-Dione are too distant from exact resonance for a separatrix to appear. A separatrix is present only in the phase spaces of the Mimas-Tethys and Titan-Hyperion MMRs and their resonant arguments are the only ones to exhibit substantial librations. When a separatrix is present, tidal damping of eccentricity or inclination excites overstable librations that can lead to passage through resonance on the damping timescale. However, after investigation, we conclude that the librations in the Mimas-Tethys and Titan-Hyperion MMRs are fossils and do not result from overstability.
Rubble piles are common in the solar system. Monolithic elements touch their neighbors in small localized areas. Voids occupy a significant fraction of the volume. In a fluid-free environment, heat cannot conduct through voids; only radiation can transfer energy across them. We model the effective thermal conductivity of a rubble pile and show that it is proportional the square root of the pressure, $P$, for $P\leq \epsy^3\mu$ where $\epsy$ is the material's yield strain and $\mu$ its shear modulus. Our model provides an excellent fit to the depth dependence of the thermal conductivity in the top $140\,\mathrm{cm}$ of the lunar regolith. It also offers an explanation for the low thermal inertias of rocky asteroids and icy satellites. Lastly, we discuss how rubble piles slow down the cooling of small bodies such as asteroids.
Electromagnetic (EM) follow-up observations of gravitational wave (GW) events will help shed light on the nature of the sources, and more can be learned if the EM follow-ups can start as soon as the GW event becomes observable. In this paper, we propose a computationally efficient time-domain algorithm capable of detecting gravitational waves (GWs) from coalescing binaries of compact objects with nearly zero time delay. In case when the signal is strong enough, our algorithm also has the flexibility to trigger EM observation {\it before} the merger. The key to the efficiency of our algorithm arises from the use of chains of so-called Infinite Impulse Response (IIR) filters, which filter time-series data recursively. Computational cost is further reduced by a template interpolation technique that requires filtering to be done only for a much coarser template bank than otherwise required to sufficiently recover optimal signal-to-noise ratio. Towards future detectors with sensitivity extending to lower frequencies, our algorithm's computational cost is shown to increase rather insignificantly compared to the conventional time-domain correlation method. Moreover, at latencies of less than hundreds to thousands of seconds, this method is expected to be computationally more efficient than the straightforward frequency-domain method.