967 results for Design Automation
Abstract:
The physical design of a VLSI circuit involves circuit partitioning as a subtask. Typically, it is necessary to partition a large electrical circuit into several smaller circuits such that the total cross-wiring is minimized. This problem is a variant of the more general graph partitioning problem, which is NP-hard, so no polynomial-time algorithm is known for obtaining an optimal partition. The heuristic procedure proposed by Kernighan and Lin [1, 2] requires O(n² log₂ n) time to obtain a near-optimal two-way partition of a circuit with n modules. In the VLSI context, due to the large problem sizes involved, this computational requirement is unacceptably high. This paper is concerned with the hardware acceleration of the Kernighan-Lin procedure on an SIMD architecture. The proposed parallel partitioning algorithm requires O(n) processors and has a time complexity of O(n log₂ n). In the proposed scheme, the reduced array architecture is employed with due consideration of the cost effectiveness and VLSI realizability of the architecture. The authors are not aware of any earlier attempts to parallelize a circuit partitioning algorithm in general or the Kernighan-Lin algorithm in particular. The use of the reduced array architecture is novel and opens up the possibility of using this computing structure for several other applications in electronic design automation.
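For readers unfamiliar with the procedure being accelerated, the sketch below illustrates one pass of the classic Kernighan-Lin two-way partitioning heuristic: tentative swaps are ranked by gain and the best prefix of swaps is committed. This is a plain sequential illustration with an illustrative adjacency-dictionary graph representation, not the paper's O(n log₂ n) SIMD parallelization.

```python
# A minimal sequential sketch of one Kernighan-Lin improvement pass for
# two-way partitioning. adj[u] maps each neighbour of u to an edge weight
# (every vertex needs an entry, possibly an empty dict).
from itertools import product

def kl_pass(adj, part_a, part_b):
    a, b = set(part_a), set(part_b)

    def d(v, own, other):
        # D(v) = external cost - internal cost
        return (sum(adj[v].get(u, 0) for u in other)
                - sum(adj[v].get(u, 0) for u in own if u != v))

    gains, swaps = [], []
    unlocked_a, unlocked_b = set(part_a), set(part_b)
    while unlocked_a and unlocked_b:
        # pick the unlocked pair with the largest gain
        g, x, y = max(((d(x, a, b) + d(y, b, a) - 2 * adj[x].get(y, 0), x, y)
                       for x, y in product(unlocked_a, unlocked_b)),
                      key=lambda t: t[0])
        gains.append(g)
        swaps.append((x, y))
        unlocked_a.discard(x)
        unlocked_b.discard(y)
        a.discard(x); a.add(y); b.discard(y); b.add(x)  # tentative swap

    # commit the prefix of swaps with the largest cumulative gain
    best_gain, best_k, run = 0, 0, 0
    for k, g in enumerate(gains, 1):
        run += g
        if run > best_gain:
            best_gain, best_k = run, k
    a, b = set(part_a), set(part_b)
    for x, y in swaps[:best_k]:
        a.remove(x); a.add(y)
        b.remove(y); b.add(x)
    return a, b, best_gain

# toy example: a 4-cycle plus an extra edge, starting from an unbalanced cut
adj = {1: {2: 1, 3: 1}, 2: {1: 1, 4: 1}, 3: {1: 1, 4: 1}, 4: {2: 1, 3: 1},
       5: {6: 1}, 6: {5: 1}}
print(kl_pass(adj, {1, 2, 5}, {3, 4, 6}))
```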
Abstract:
Massively parallel SIMD computing is applied to obtain an order-of-magnitude improvement in the execution speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well-known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown how a parallel SIMD machine enables efficient execution of the placement procedure.
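As a rough illustration of how min-cut placement works, independently of the parallel machine described above, the sketch below recursively bisects a cell list and assigns each half to one half of the layout region. The `bipartition` argument stands in for a cut-minimizing partitioner; the naive list-halving function shown is only a placeholder, and all names are illustrative.

```python
# A minimal sketch of recursive min-cut placement: the netlist is bisected
# repeatedly and each half is assigned to one half of the layout region.
def mincut_place(cells, region, bipartition, min_cells=1):
    """Assign each cell an (x, y) slot by recursive bisection of `region`,
    given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    if len(cells) <= min_cells:
        return {c: ((x0 + x1) / 2, (y0 + y1) / 2) for c in cells}
    left, right = bipartition(cells)
    # cut perpendicular to the longer side of the region
    if (x1 - x0) >= (y1 - y0):
        xm = (x0 + x1) / 2
        r_left, r_right = (x0, y0, xm, y1), (xm, y0, x1, y1)
    else:
        ym = (y0 + y1) / 2
        r_left, r_right = (x0, y0, x1, ym), (x0, ym, x1, y1)
    placement = mincut_place(left, r_left, bipartition, min_cells)
    placement.update(mincut_place(right, r_right, bipartition, min_cells))
    return placement

# naive placeholder partitioner: a real flow would minimize the cut instead
halve = lambda cells: (cells[:len(cells) // 2], cells[len(cells) // 2:])
print(mincut_place(list("ABCDEFGH"), (0, 0, 8, 8), halve))
```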
Abstract:
Due to developments in semiconductor technology, fault tolerance has become important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. However, instead of guaranteeing that deadlines are always met, for general-purpose systems it is important to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a given soft (transient) error probability, we define mathematical formulas for the AET that include bus communication overhead, for both voting (active replication) and rollback recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize the AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and the job-to-processor assignment when using voting, and (3) selecting the fault-tolerance scheme (voting or RRC) and its usage for each job. Experiments demonstrate significant savings in AET.
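As background for the checkpoint-count trade-off the ILP models optimize, a common first-order AET approximation for RRC is sketched below. It assumes equally spaced checkpoints and Poisson-distributed soft errors, and, unlike the paper's formulas, it ignores bus communication overhead; the symbols are illustrative.

```latex
% Illustrative first-order AET model for rollback recovery with
% checkpointing (not the paper's exact formulas).
% T: fault-free execution time, n_c: number of checkpoints,
% \tau: per-checkpoint overhead, \lambda: soft-error rate.
\begin{align*}
  p_{\mathrm{seg}} &= e^{-\lambda T/n_c}
      && \text{probability a segment completes error-free} \\
  \mathrm{AET}(n_c) &\approx \bigl(T + n_c\,\tau\bigr)\, e^{\lambda T/n_c}
      && \text{since expected attempts per segment are } 1/p_{\mathrm{seg}} \\
  n_c^{\ast} &= \arg\min_{n_c \ge 1} \mathrm{AET}(n_c)
\end{align*}
```

More checkpoints shrink the re-execution penalty but add overhead, which is why the checkpoint count is worth optimizing.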
Abstract:
The memory subsystem is a major contributor to the performance, power, and area of the complex SoCs used in feature-rich multimedia products. Hence, the memory architecture of an embedded DSP is complex and usually custom designed, with multiple banks of single-ported or dual-ported on-chip scratch-pad memory and multiple banks of off-chip memory. Building software for such large, complex memories, with many of the software components delivered as individually optimized software IPs, is a big challenge. In order to obtain good performance and a reduction in memory stalls, the data buffers of the application need to be placed carefully in the different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address complex DSP memory architectures. Our method models the data layout problem as a multi-objective genetic algorithm (GA) with performance and power as the objectives, and presents a set of solution points that is attractive from a platform-design viewpoint. While most of the work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that a significant trade-off (up to 70%) is possible between power and performance.
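The sketch below illustrates the general shape of such a multi-objective GA formulation for data layout: a chromosome assigns each buffer to a memory bank, and candidate layouts are scored on cycle count and energy, with the non-dominated layouts returned as a Pareto set. The bank list, cost tables, and GA parameters are invented placeholders, and capacity constraints are omitted; this is not MODLEX itself.

```python
# Illustrative multi-objective GA for data layout: gene i = bank of buffer i.
import random

BANKS = ["sp0", "sp1", "dual", "offchip"]                 # hypothetical memories
CYCLE = {"sp0": 2, "sp1": 2, "dual": 1, "offchip": 10}    # toy per-access latency
ENERGY = {"sp0": 1.0, "sp1": 1.0, "dual": 2.0, "offchip": 4.0}  # toy per-access energy

def evaluate(layout, accesses):
    """Return (cycles, energy) for a bank assignment; accesses[i] is the
    access count of buffer i (toy model, no capacity constraints)."""
    cycles = sum(accesses[i] * CYCLE[b] for i, b in enumerate(layout))
    energy = sum(accesses[i] * ENERGY[b] for i, b in enumerate(layout))
    return cycles, energy

def pareto(points):
    """Keep points not dominated in both objectives."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

def ga(accesses, pop=40, gens=50):
    n = len(accesses)
    popn = [[random.choice(BANKS) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        scored = [(evaluate(ind, accesses), ind) for ind in popn]
        # crude scalarization with a random weight each generation
        scored.sort(key=lambda s: random.random() * s[0][0] + s[0][1])
        parents = [ind for _, ind in scored[: pop // 2]]
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]                     # one-point crossover
            if random.random() < 0.2:                     # mutation
                child[random.randrange(n)] = random.choice(BANKS)
            children.append(child)
        popn = parents + children
    return pareto([evaluate(ind, accesses) for ind in popn])

print(ga(accesses=[100, 400, 50, 800]))
```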
Abstract:
Instruction reuse is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run time. Although this is the job of an optimizing compiler, compilers often do not succeed because of their limited knowledge of run-time data. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications. Specifically, this paper attempts to answer the following questions: (1) How much instruction reuse is inherent in network processing applications? (2) Can reuse be improved by reducing interference in the reuse buffer? (3) What characteristics of network applications can be exploited to improve reuse? (4) What is the effect of reuse on resource contention and memory accesses? We propose an aggregation scheme that combines a high-level concept of network traffic, i.e. "flows", with a low-level microarchitectural feature of programs, i.e. repetition of instructions and data, along with an architecture that exploits temporal locality in incoming packet data to improve reuse. We find that for the benchmarks considered, 1% to 50% of instructions are reused, while the speedup achieved varies between 1% and 24%. As a side effect, instruction reuse reduces memory traffic and can therefore also be considered a scheme for low power.
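To make the mechanism concrete, the sketch below models a direct-mapped instruction reuse buffer keyed on the program counter, operand values, and a flow identifier (mirroring the flow-based aggregation idea). The structure sizes and names are illustrative rather than the paper's design.

```python
# An illustrative instruction reuse buffer: results of ALU operations are
# cached by (PC, flow id, operand values); a hit skips re-execution.
class ReuseBuffer:
    def __init__(self, entries=1024):
        self.entries = entries
        self.table = {}                       # index -> (tag, result)

    def _index_tag(self, pc, flow_id, ops):
        key = (pc, flow_id, ops)
        return hash(key) % self.entries, key

    def lookup(self, pc, flow_id, ops):
        idx, tag = self._index_tag(pc, flow_id, ops)
        entry = self.table.get(idx)
        if entry and entry[0] == tag:
            return entry[1]                   # reuse hit: skip execution
        return None

    def update(self, pc, flow_id, ops, result):
        idx, tag = self._index_tag(pc, flow_id, ops)
        self.table[idx] = (tag, result)       # direct-mapped replacement

rb = ReuseBuffer()

def execute_add(pc, flow_id, a, b):
    cached = rb.lookup(pc, flow_id, (a, b))
    if cached is not None:
        return cached
    result = a + b                            # the "real" ALU work
    rb.update(pc, flow_id, (a, b), result)
    return result

print(execute_add(0x40, flow_id=7, a=3, b=4))  # miss: computes 7
print(execute_add(0x40, flow_id=7, a=3, b=4))  # hit: reuses 7
```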
Abstract:
Moore's Law has driven the semiconductor revolution, enabling over four decades of scaling in frequency, size, complexity, and power. However, the limits of physics are preventing further scaling of speed, forcing a paradigm shift towards multicore computing and parallelization. In effect, the system is taking over the role that the single CPU was playing: high-speed signals run not only through chips but also through packages and boards, connecting ever more complex systems. High-speed signals making their way through the entire system cause new challenges in the design of computing hardware. Inductance, phase shifts and velocity-of-light effects, material resonances, and wave behavior not only become prevalent but also need to be calculated accurately and rapidly to enable short design cycle times. In essence, continuing to scale with Moore's Law requires the incorporation of Maxwell's equations into the design process. Incorporating Maxwell's equations into the design flow is only possible through the combined power of new algorithms, parallelization, and high-speed computing. At the same time, incorporation of Maxwell-based models into circuit- and system-level simulation presents a massive accuracy, passivity, and scalability challenge. In this tutorial, we navigate through the often confusing terminology and concepts behind field solvers, show how advances in field solvers enable integration into EDA flows, present novel methods for model generation and passivity assurance in large systems, and demonstrate the power of cloud computing in enabling the next generation of scalable Maxwell solvers and the next generation of Moore's Law scaling of systems. We intend to show the truly symbiotic, growing relationship between Maxwell and Moore!
Abstract:
With the proliferation of mobile wireless communication and embedded systems, energy efficiency has become a major design constraint. The dissipated energy is often expressed as the product of power dissipation and input-output delay. Most electronic design automation techniques focus on optimising only one of these parameters, either power or delay. Industry-standard design flows integrate systematic methods for optimising either area or timing, while power optimisation often relies on heuristics that are specific to a particular design. In this work we answer three questions in our quest to provide a systematic approach to joint power and delay optimisation.

The first question of our research is: how can a design flow be built that incorporates both academic and industry-standard design flows for power optimisation? To address this question, we use a reference design flow provided by Synopsys and integrate academic tools and methodologies into this flow. The proposed design flow is used as a platform for analysing novel algorithms and methodologies for optimisation in the context of digital circuits.

The second question we answer is: is it possible to apply a systematic approach to power optimisation in the context of combinational digital circuits? The starting point is the selection of a suitable data structure which can easily incorporate information about delay, power and area, and which allows optimisation algorithms to be applied. In particular, we address the implications of systematic power optimisation methodologies and the potential degradation of other (often conflicting) parameters such as area or delay.

Finally, the third question which this thesis attempts to answer is: is there a systematic approach to multi-objective optimisation of delay and power? Delay-driven power optimisation and power-driven delay optimisation are proposed in order to obtain balanced delay and power values. This implies that each power optimisation step is constrained not only by the decrease in power but also by the increase in delay; similarly, each delay optimisation step is governed not only by the decrease in delay but also by the increase in power. The goal is multi-objective optimisation of digital circuits where the two conflicting objectives are power and delay.

The logic synthesis and optimisation methodology is based on AND-Inverter Graphs (AIGs), which represent the functionality of the circuit. The switching activities and arrival times of circuit nodes are annotated onto the AND-Inverter Graph under zero-delay and non-zero-delay models. We then introduce several reordering rules which are applied to the AIG nodes to minimise the switching power or the longest-path delay of the circuit at the pre-technology-mapping level. The academic Electronic Design Automation (EDA) tool ABC is used for the manipulation of AND-Inverter Graphs. We have implemented various combinatorial optimisation algorithms often used in Electronic Design Automation, such as Simulated Annealing and Uniform Cost Search. Simulated Annealing (SA) is a probabilistic metaheuristic for locating a good approximation to the global optimum of a given function in a large search space. We use SA to probabilistically decide between moving from one optimised solution to another, such that dynamic power is optimised under given delay constraints and delay is optimised under given power constraints; a good approximation to the global optimum under the energy constraint is obtained. Uniform Cost Search (UCS) is a search algorithm for traversing a weighted tree or graph. We use UCS to search the AIG network for a specific node order in which to apply the reordering rules. After the reordering rules have been applied, the AIG network is mapped to a netlist using specific library cells. Our approach combines network restructuring, AIG node reordering, dynamic power and longest-path delay estimation and optimisation, and finally technology mapping to a netlist. A set of MCNC benchmark circuits and large combinational circuits of up to 100,000 gates has been used to validate our methodology. Comparisons for power and delay optimisation are made with the best synthesis scripts in ABC: reductions of 23% in power and 15% in delay are achieved with minimal overhead, compared to the best known ABC results. Our approach is also applied to a number of processors with combinational and sequential components, and significant savings are achieved.
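The sketch below shows the skeleton of a delay-constrained, power-driven simulated-annealing loop of the kind described above. The move, power, and delay functions are placeholders for the AIG reordering moves and the zero/non-zero-delay estimators; the toy usage at the end uses made-up cost functions and is purely illustrative.

```python
# A minimal simulated-annealing skeleton for power optimisation under a
# delay constraint (placeholders stand in for AIG moves and estimators).
import math, random

def anneal(initial, move, power, delay, delay_budget,
           t0=1.0, t_min=1e-3, alpha=0.95, iters_per_t=50):
    current, best = initial, initial
    t = t0
    while t > t_min:
        for _ in range(iters_per_t):
            candidate = move(current)
            if delay(candidate) > delay_budget:   # reject moves that break the delay constraint
                continue
            dp = power(candidate) - power(current)
            if dp < 0 or random.random() < math.exp(-dp / t):
                current = candidate               # accept improving or occasional worsening moves
                if power(current) < power(best):
                    best = current
        t *= alpha                                # geometric cooling schedule
    return best

def swap_move(order):
    # placeholder move: swap two positions in a node ordering
    i, j = random.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new

# toy usage with made-up cost functions standing in for power/delay estimators
fake_power = lambda o: sum(i * v for i, v in enumerate(o))
fake_delay = lambda o: max(i + v for i, v in enumerate(o))
print(anneal(list(range(8)), swap_move, fake_power, fake_delay, delay_budget=14))
```

Swapping the roles of the power and delay functions gives the dual, power-constrained delay optimisation described in the abstract.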
Abstract:
A new method of dielectric-constant measurement is developed. The dielectric constant ε_r of an RF/microwave substrate is extracted by combining microstrip ring-resonator measurements with Ansoft HFSS electromagnetic simulation software. The developed method has two advantages: (i) characterization of the dielectric constant at multiple frequency points, and (ii) compatibility with electronic design automation (EDA) software tools. This characterization method can reduce the design cycle of microwave circuits and devices. © 2004 Wiley Periodicals, Inc.
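This kind of extraction rests on the standard ring-resonator relation between the resonant frequencies and the effective permittivity; the expression below is the textbook form, not the paper's specific fitting procedure.

```latex
% Ring-resonator resonance condition: the mean circumference equals an
% integer number of guided wavelengths (r: mean ring radius, f_n: n-th
% resonant frequency, c: speed of light).
\begin{align*}
  2\pi r &= n \lambda_g = \frac{n\,c}{f_n \sqrt{\varepsilon_{\mathrm{eff}}}},
  \qquad n = 1, 2, 3, \dots \\
  \varepsilon_{\mathrm{eff}} &= \left(\frac{n\,c}{2\pi r\, f_n}\right)^{2}
\end{align*}
% The substrate permittivity eps_r is then recovered from eps_eff via a
% microstrip dispersion model (here, by matching HFSS simulations).
```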
Abstract:
An overview of research on reconfigurable architectures for network processing applications within the Institute of Electronics, Communications and Information Technology (ECIT) is presented. Three key network processing topics, namely node throughput, Quality of Service (QoS) and security, are examined, where custom reconfigurability allows network nodes to adapt to fluctuating network traffic and customer demands. Various architectural possibilities have been investigated in order to explore the options and trade-offs available when using reconfigurability for packet/frame processing, packet scheduling and data encryption/decryption. This research has shown that there is no single approach that can be applied across the board; rather, the methodologies used and the cost-benefits of incorporating reconfigurability depend on the function considered, reconfigurability being well suited, for example, to encryption/decryption but not to packet/frame processing. © 2005 IEEE.
Abstract:
The 2-D Discrete Cosine Transform (DCT) is widely used as the core of digital image and video compression. In this paper, we present a novel DCT architecture that allows aggressive voltage scaling by exploiting the fact that not all intermediate computations are equally important in a DCT system for obtaining "good" image quality, i.e. a Peak Signal-to-Noise Ratio (PSNR) > 30 dB. This observation has led us to propose a DCT architecture in which the signal paths that contribute less to PSNR improvement are designed to be longer than the paths that contribute more. It should also be noted that robustness with respect to parameter variations and low-power operation typically impose contradictory requirements on architecture design. However, the proposed architecture lends itself to aggressive voltage scaling for low-power dissipation even under process parameter variations. Under a scaled supply voltage and/or variations in process parameters, any possible delay errors would appear only on the long paths that contribute less to PSNR improvement, providing a large reduction in power dissipation with small PSNR degradation. Results show that even under large process variation and supply voltage scaling (0.8 V), there is only a gradual degradation of image quality, with considerable power savings (62.8%) for the proposed architecture compared to existing implementations in 70-nm process technology.
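For reference, the PSNR figure of merit quoted above follows the standard definition (general, not specific to this paper):

```latex
% Standard PSNR definition for an M x N image: I is the original, K the
% reconstruction, and I_max the peak pixel value (255 for 8-bit images).
\begin{align*}
  \mathrm{MSE}  &= \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I(i,j)-K(i,j)\bigr)^{2} \\
  \mathrm{PSNR} &= 10\,\log_{10}\!\frac{I_{\max}^{2}}{\mathrm{MSE}}
\end{align*}
```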