3 resultados para Pipeline Design
em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Resumo:
Quantum-dot Cellular Automata (QCA) technology is a promising potential alternative to CMOS technology. To explore the characteristics of QCA and suitable design methodologies, digital circuit design approaches have been investigated. Due to the inherent wire delay in QCA, pipelined architectures appear to be a particularly suitable design technique. Also, because of the pipeline nature of QCA technology, it is not suitable for complicated control system design. Systolic arrays take advantage of pipelining, parallelism and simple local control. Therefore, an investigation into these architectures in QCA technology is provided in this paper. Two case studies, (a matrix multiplier and a Galois Field multiplier) are designed and analyzed based on both multilayer and coplanar crossings. The performance of these two types of interconnections are compared and it is found that even though coplanar crossings are currently more practical, they tend to occupy a larger design area and incur slightly more delay. A general semi-conductor QCA systolic array design methodology is also proposed. It is found that by applying a systolic array structure in QCA design, significant benefits can be achieved particularly with large systolic arrays, even more so than when applied in CMOS-based technology.
Resumo:
Traditional static analysis fails to auto-parallelize programs with a complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard to discover by a programmer. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates which are then applied by automatic code transformation techniques.
This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data structure level rather than at the element level or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program’s outer loops. This observation validates our approach to whole-data structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach allows to discover higher degrees of parallelism, allowing a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.
Resumo:
In this paper we describe the design of a parallel solution of the inhomogeneous Schrodinger equation, which arises in the construction of continuum orbitals in the R-matrix theory of atomic continuum processes. A prototype system is described which has been programmed in occam2 and implemented on a bi-directional pipeline of transputers. Some timing results for the prototype system are presented, and the development of a full production system is discussed.