3 results for Parallelism
in Digital Commons - Michigan Tech
Abstract:
As the performance gap between microprocessors and memory continues to increase, main memory accesses result in long latencies, which become a factor limiting system performance. Previous studies show that main memory access streams contain significant locality and that SDRAM devices provide parallelism through multiple banks and channels. This locality and parallelism have not been exploited thoroughly by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency. The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly in the SDRAM address space to enable bank parallelism. As memory accesses to unique banks are interleaved, the access latencies are partially hidden and therefore reduced. By taking cache conflict misses into consideration, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving performance. The proposed burst scheduling is a novel access reordering mechanism, which creates bursts by clustering accesses directed to the same rows of the same banks. Subject to a threshold, reads are allowed to preempt writes, and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves accesses to maximize SDRAM data bus utilization. Consequently, burst scheduling reduces the row conflict rate, increasing and exploiting the available row locality. Using revised SimpleScalar and M5 simulators, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces the execution time by 14% on average over traditional page interleaving address mapping. 
Burst scheduling also achieves a 15% reduction in execution time over conventional bank in-order scheduling. Working constructively together, bit-reversal and burst scheduling achieve a 19% speedup across the simulated benchmarks.
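The idea of steering cache-conflict addresses to different banks can be illustrated with a small sketch. It is not the mapping defined in the thesis: the field widths (`COL_BITS`, `BANK_BITS`, `ROW_BITS`), the row | bank | column layout, and the choice to reverse all bits above the column field are assumptions made only for illustration. Two addresses that differ only in a high-order bit (as conflicting cache lines typically do) land in the same bank but different rows under page interleaving, and in different banks once the upper bits are reversed.

```python
# Hypothetical SDRAM geometry, chosen only for this illustration.
COL_BITS = 10   # assumed column index width
BANK_BITS = 2   # assumed bank index width (4 banks)
ROW_BITS = 12   # assumed row index width


def reverse_bits(value, width):
    """Return `value` with its lowest `width` bits in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def page_interleave(addr):
    """Baseline mapping: address is split as row | bank | column."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col


def bit_reversal(addr):
    """Assumed variant: reverse the bits above the column field, then split.

    After the reversal, the highest address bits select the bank.
    """
    col = addr & ((1 << COL_BITS) - 1)
    upper = (addr >> COL_BITS) & ((1 << (BANK_BITS + ROW_BITS)) - 1)
    upper = reverse_bits(upper, BANK_BITS + ROW_BITS)
    bank = upper & ((1 << BANK_BITS) - 1)
    row = upper >> BANK_BITS
    return bank, row, col


if __name__ == "__main__":
    a, b = 0, 1 << 23  # differ only in the top address bit
    print(page_interleave(a), page_interleave(b))  # same bank, different rows
    print(bit_reversal(a), bit_reversal(b))        # different banks
```

Because reversing a fixed-width bit field is a bijection, the sketch still maps every address to a distinct (bank, row, column) triple; only the assignment of address bits to fields changes.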
Abstract:
Self-stabilization is a property of a distributed system such that, regardless of the legitimacy of its current state, the system behavior shall eventually reach a legitimate state and shall remain legitimate thereafter. The elegance of self-stabilization stems from the fact that it distinguishes distributed systems by a strong fault tolerance property against arbitrary state perturbations. The difficulty of designing and reasoning about self-stabilization has been witnessed by many researchers; most of the existing techniques for the verification and design of self-stabilization are either brute-force or rely on manual approaches that are not amenable to automation. In this dissertation, we first investigate the possibility of automatically designing self-stabilization through global state space exploration. In particular, we develop a set of heuristics for automating the addition of recovery actions to distributed protocols on various network topologies. Our heuristics equally exploit the computational power of a single workstation and the available parallelism on computer clusters. We obtain existing and new stabilizing solutions for classical protocols like maximal matching, ring coloring, mutual exclusion, leader election and agreement. Second, we consider a foundation for local reasoning about self-stabilization; i.e., studying the global behavior of the distributed system by exploring the state space of just one of its components. It turns out that local reasoning about deadlocks and livelocks is possible for an interesting class of protocols whose proof of stabilization is otherwise complex. In particular, we provide necessary and sufficient conditions, verifiable in the local state space of every process, for global deadlock- and livelock-freedom of protocols on ring topologies. 
Local reasoning potentially circumvents two fundamental problems that complicate the automated design and verification of distributed protocols: (1) state explosion and (2) partial state information. Moreover, local proofs of convergence are independent of the number of processes in the network, so our assertions about deadlocks and livelocks apply to rings of arbitrary size without state explosion becoming a concern.
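To make "global state space exploration" concrete, the following sketch brute-force checks self-stabilization for a small ring protocol. The protocol used is Dijkstra's K-state token ring, a well-known mutual exclusion example chosen here for familiarity; it is not taken from the dissertation, and the checker is neither the dissertation's synthesis heuristics nor its local reasoning method. It assumes a scheduler that lets one privileged process move at a time, and all function names are illustrative. The check has three parts: legitimate states are closed under moves, no illegitimate state is a deadlock, and no cycle of illegitimate states (a livelock) exists.

```python
from itertools import product


def privileged(state):
    """Indices of processes holding a privilege in Dijkstra's K-state ring."""
    n = len(state)
    procs = [0] if state[0] == state[n - 1] else []
    procs += [i for i in range(1, n) if state[i] != state[i - 1]]
    return procs


def successors(state, k):
    """All states reachable by letting exactly one privileged process move."""
    result = []
    for i in privileged(state):
        nxt = list(state)
        nxt[i] = (state[0] + 1) % k if i == 0 else state[i - 1]
        result.append(tuple(nxt))
    return result


def legitimate(state):
    """Legitimate means exactly one process is privileged."""
    return len(privileged(state)) == 1


def is_self_stabilizing(n, k):
    """Exhaustively check closure and convergence over all k**n states."""
    states = list(product(range(k), repeat=n))
    # Closure: every move from a legitimate state stays legitimate.
    for s in states:
        if legitimate(s) and not all(legitimate(t) for t in successors(s, k)):
            return False
    # Convergence: no deadlock and no cycle among illegitimate states.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {s: WHITE for s in states if not legitimate(s)}
    for root in colour:
        if not successors(root, k):
            return False  # deadlock outside the legitimate states
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(successors(root, k)))]
        while stack:  # iterative depth-first search
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                colour[node] = BLACK
                stack.pop()
            elif nxt in colour:
                if colour[nxt] == GREY:
                    return False  # livelock: cycle of illegitimate states
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(successors(nxt, k))))
    return True


if __name__ == "__main__":
    print(is_self_stabilizing(4, 4))
    print(is_self_stabilizing(4, 2))
```

The number of states explored is k**n, which is the state explosion the abstract refers to: the check is practical only for very small rings, which is what motivates reasoning in the local state space of a single process instead.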
Abstract:
This thesis develops high-performance real-time signal processing modules for direction of arrival (DOA) estimation in localization systems. It proposes highly parallel algorithms for subspace decomposition and polynomial rooting, which are traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization in a wide range of applications. As the antenna array size increases, the complexity of the signal processing algorithms increases, making it increasingly difficult to satisfy real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms that maintain a considerable improvement over traditional algorithms, especially for systems with a larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps that act as bottlenecks to achieving real-time performance. The proposed algorithms are suitable for implementation on field-programmable gate arrays (FPGAs), single instruction multiple data (SIMD) hardware, or application-specific integrated circuits (ASICs), which offer a large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable, and easy to implement. First, this thesis proposes a fast-converging SVD algorithm. The proposed method reduces the number of iterations needed to converge to the correct singular values, thus coming closer to real-time performance. A general algorithm and a modular system design are provided, making it easy for designers to replicate the design and extend it to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited on the hardware platforms mentioned earlier. A fixed-point implementation of the proposed SVD algorithm is presented. 
The FPGA design is pipelined to the maximum extent to increase the maximum achievable operating frequency. The system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize performance, and these modules are described in detail. Finally, a parallel polynomial rooting technique based on Newton’s method, applicable exclusively to root-MUSIC polynomials, is proposed. Unique characteristics of the complex dynamics of root-MUSIC polynomials were exploited to derive this rooting method. The technique exhibits parallelism and converges to the desired root within a fixed number of iterations, making it suitable for rooting polynomials of large degree. We believe this is the first time that the complex dynamics of root-MUSIC polynomials have been analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system by providing simple, high-throughput, parallel algorithms.
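As a rough illustration of why Newton-based rooting with a fixed iteration count parallelizes well, the sketch below runs plain Newton iterations on a generic polynomial from several independent starting points. It is not the root-MUSIC-specific technique of the thesis: the choice of starting points on the unit circle, the iteration count, and all function names are assumptions, and generic Newton iterations are not guaranteed to converge from every start. Each starting point is processed independently of the others, so on parallel hardware each could be assigned to its own processing element; here they simply run one after another.

```python
import cmath


def horner(coeffs, z):
    """Evaluate p(z) and p'(z) together; coeffs are in descending order."""
    p, dp = 0j, 0j
    for c in coeffs:
        dp = dp * z + p
        p = p * z + c
    return p, dp


def newton_fixed(coeffs, z0, iterations=30):
    """Run a fixed number of Newton steps z <- z - p(z)/p'(z) from z0."""
    z = z0
    for _ in range(iterations):
        p, dp = horner(coeffs, z)
        if dp == 0:
            break  # derivative vanished; give up on this start
        z = z - p / dp
    return z


def roots_from_unit_circle(coeffs, starts=16, iterations=30):
    """Iterate independently from `starts` equally spaced unit-circle points."""
    guesses = [cmath.exp(2j * cmath.pi * k / starts) for k in range(starts)]
    # Each call is independent, so this loop is the parallelizable part.
    return [newton_fixed(coeffs, g, iterations) for g in guesses]


if __name__ == "__main__":
    # p(z) = z^2 - 1, a toy polynomial rather than a root-MUSIC polynomial.
    for z in roots_from_unit_circle([1, 0, -1], starts=8):
        print(z)
```

A fixed iteration count gives a fixed latency per root, which is the property that matters for a pipelined hardware design; results from starts that did not converge would need to be filtered out, for example by checking the residual |p(z)|.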