2 results for Complete K-ary Tree
at Massachusetts Institute of Technology
Abstract:
As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, reduces hot spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson [ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high-locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium-to-low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that applications exhibit good performance when the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol.
If separate controllers for processing protocol requests are included, the protocol scales to 32k-processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be satisfiable locally, and at most 35% of the global references may reach the top level of the hierarchy.
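The locality bounds quoted above can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not the thesis's model: it assumes the directory hierarchy is a complete k-ary tree with processors at the leaves, takes "32k" to mean 32768, picks a hypothetical arity of 8, and invents the share of references satisfied at each intermediate level. The function names `tree_height`, `mean_height`, and `meets_locality_bounds` are illustrative only.

```python
def tree_height(num_processors: int, arity: int) -> int:
    """Smallest h with arity**h >= num_processors (integer arithmetic only)."""
    height, capacity = 0, 1
    while capacity < num_processors:
        capacity *= arity
        height += 1
    return height


def mean_height(level_fractions: list[float]) -> float:
    """Expected level reached; level_fractions[i] is the share of global
    references satisfied at level i (assumption: level 0 means local)."""
    if abs(sum(level_fractions) - 1.0) > 1e-9:
        raise ValueError("level fractions must sum to 1")
    return sum(level * share for level, share in enumerate(level_fractions))


def meets_locality_bounds(level_fractions: list[float],
                          min_local: float = 0.22,
                          max_top: float = 0.35) -> bool:
    """Check the two bounds from the abstract: enough references satisfied
    locally, and not too many reaching the top level."""
    return level_fractions[0] >= min_local and level_fractions[-1] <= max_top


# Hypothetical machine: 32768 processors, arity 8 (both assumptions).
height = tree_height(32768, 8)

# Hypothetical locality profile for levels 0..height; only the first (22%)
# and last (35%) values come from the abstract, the rest are invented.
profile = [0.22, 0.15, 0.12, 0.09, 0.07, 0.35]

print(height, mean_height(profile), meets_locality_bounds(profile))
```

With these assumed numbers the tree has five levels above the leaves, and the profile sits exactly on both bounds. Changing the arity or the intermediate shares changes the mean height but not the two checks.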
Abstract:
In this paper, a precorrected FFT-Fast Multipole Tree (pFFT-FMT) method for solving the potential flow around arbitrary three-dimensional bodies is presented. The method takes advantage of the efficiency of the pFFT and FMT algorithms to facilitate more demanding computations such as automatic wake generation and hands-off steady and unsteady aerodynamic simulations. The velocity potential on the body surfaces and in the domain is determined using a pFFT Boundary Element Method (BEM) approach based on the Green's theorem boundary integral equation. The vorticity trailing all lifting surfaces in the domain is represented using a Fast Multipole Tree, time-advected vortex particle method. Some simple steady-state flow solutions are computed to demonstrate the basic capabilities of the solver. Although this paper focuses primarily on steady-state solutions, the approach is designed to be a robust and efficient unsteady potential flow simulation tool, useful for rapid computational prototyping.