61 resultados para reconfigurable computing

em Indian Institute of Science - Bangalore - Índia


Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper we develop compilation techniques for the realization of applications described in a High Level Language (HLL) onto a Runtime Reconfigurable Architecture. The compiler determines Hyper Operations (HyperOps) that are subgraphs of a data flow graph (of an application) and comprise elementary operations that have strong producer-consumer relationship. These HyperOps are hosted on computation structures that are provisioned on demand at runtime. We also report compiler optimizations that collectively reduce the overheads of data-driven computations in runtime reconfigurable architectures. On an average, HyperOps offer a 44% reduction in total execution time and a 18% reduction in management overheads as compared to using basic blocks as coarse grained operations. We show that HyperOps formed using our compiler are suitable to support data flow software pipelining.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Conventional hardware implementation techniques for FIR filters require the computation of filter coefficients in software and have them stored in memory. This approach is static in the sense that any further fine tuning of the filter requires computation of new coefficients in software. In this paper, we propose an alternate technique for implementing FIR filters in hardware. We store a considerably large number of impulse response coefficients of the ideal filter (having box type frequency response) in memory. We then do the windowing process, on these coefficients, in hardware using integer sequences as window functions. The integer sequences are also generated in hardware. This approach offers the flexibility in fine tuning the filter, like varying the transition bandwidth around a particular cutoff frequency.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. It is a ``future-proof'' custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements on the hardware are interconnected during runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bounded CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantage of an ASIC and the hardware reconfigurability and programmability of that of an FPGA/instruction set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors determine a sequence of instructions that repeatedly occur within the application to create custom instructions at design time to speed up the execution of this sequence. We extend this scheme further, where a kernel is compiled into custom instructions that bear strong producer-consumer relationship (and not limited to frequently occurring sequences of instructions). Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same to be active in parallel. A key distinguishing factor in majority of the emerging embedded applications is stream processing. To reduce the overheads of data transfer between custom instructions, direct communication paths are employed among custom instructions. In this article, we present the overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by log(n) for n-point FFT when compared to sequential implementation. Overall, REDEFINE offers flexibility and a runtime reconfigurability at the expense of 1.16x in power and 8x in area when compared to an ASIC. REDEFINE implementation consumes 0.1x the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000x more than that of REDEFINE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the world of high performance computing huge efforts have been put to accelerate Numerical Linear Algebra (NLA) kernels like QR Decomposition (QRD) with the added advantage of reconfigurability and scalability. While popular custom hardware solution in form of systolic arrays can deliver high performance, they are not scalable, and hence not commercially viable. In this paper, we show how systolic solutions of QRD can be realized efficiently on REDEFINE, a scalable runtime reconfigurable hardware platform. We propose various enhancements to REDEFINE to meet the custom need of accelerating NLA kernels. We further do the design space exploration of the proposed solution for any arbitrary application of size n × n. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A multi phase, delay-locked loop (DLL) based frequency synthesizer is designed for harmonic rejection mixing in reconfigurable radios. This frequency synthesizer uses a 1 GHz input reference frequency, and achieves <= 20ns settling time by utilizing a wide loop bandwidth. The circuit has been designed in 0.13-mu m CMOS technology. It is designed for a frequency range of 500 MHz to 3 GHz with stuck/harmonic lock removal assist. Index Terms-stuck lock, harmonic lock, delay-locked loops, multi phase, phase detector, frequency synthesis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present a decentralized dynamic load scheduling/balancing algorithm called ELISA (Estimated Load Information Scheduling Algorithm) for general purpose distributed computing systems. ELISA uses estimated state information based upon periodic exchange of exact state information between neighbouring nodes to perform load scheduling. The primary objective of the algorithm is to cut down on the communication and load transfer overheads by minimizing the frequency of status exchange and by restricting the load transfer and status exchange within the buddy set of a processor. It is shown that the resulting algorithm performs almost as well as a perfect information algorithm and is superior to other load balancing schemes based on the random sharing and Ni-Hwang algorithms. A sensitivity analysis to study the effect of various design parameters on the effectiveness of load balancing is also carried out. Finally, the algorithm's performance is tested on large dimensional hypercubes in the presence of time-varying load arrival process and is shown to perform well in comparison to other algorithms. This makes ELISA a viable and implementable load balancing algorithm for use in general purpose distributed computing systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have implemented a Firewall with this architecture in reconflgurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results in both speed and area improvement when it is implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for the worst case packet size. The Firewall rule update involves only memory re-initialization in software without any hardware change.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have Implemented a Firewall with this architecture in reconfigurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using, our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results In both speed and area Improvement when It is Implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields.High throughput classification Invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly In terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware Implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for file worst case packet size. The Firewall rule update Involves only memory re-initialiization in software without any hardware change.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is aimed at reviewing the notion of Byzantine-resilient distributed computing systems, the relevant protocols and their possible applications as reported in the literature. The three agreement problems, namely, the consensus problem, the interactive consistency problem, and the generals problem have been discussed. Various agreement protocols for the Byzantine generals problem have been summarized in terms of their performance and level of fault-tolerance. The three classes of Byzantine agreement protocols discussed are the deterministic, randomized, and approximate agreement protocols. Finally, application of the Byzantine agreement protocols to clock synchronization is highlighted.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We report the design and characterization of a circuit technique to measure the on-chip delay of an individual logic gate (both inverting and noninverting) in its unmodified form. The test circuit comprises of digitally reconfigurable ring oscillator (RO). The gate under test is embedded in each stage of the ring oscillator. A system of linear equations is then formed with different configuration settings of the RO, relating the individual gate delay to the measured period of the RO, whose solution gives the delay of the individual gates. Experimental results from a test chip in 65-nm process node show the feasibility of measuring the delay of an individual inverter to within 1 ps accuracy. Delay measurements of different nominally identicall inverters in close physical proximity show variations of up to 28% indicating the large impact of local variations. As a demonstration of this technique, we have studied delay variation with poly-pitch, length of diffusion (LOD) and different orientations of layout in silicon. The proposed technique is quite suitable for early process characterization, monitoring mature process in manufacturing and correlating model-to-hardware.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Reeb graph tracks topology changes in level sets of a scalar function and finds applications in scientific visualization and geometric modeling. We describe an algorithm that constructs the Reeb graph of a Morse function defined on a 3-manifold. Our algorithm maintains connected components of the two dimensional levels sets as a dynamic graph and constructs the Reeb graph in O(nlogn+nlogg(loglogg)3) time, where n is the number of triangles in the tetrahedral mesh representing the 3-manifold and g is the maximum genus over all level sets of the function. We extend this algorithm to construct Reeb graphs of d-manifolds in O(nlogn(loglogn)3) time, where n is the number of triangles in the simplicial complex that represents the d-manifold. Our result is a significant improvement over the previously known O(n2) algorithm. Finally, we present experimental results of our implementation and demonstrate that our algorithm for 3-manifolds performs efficiently in practice.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Onboard spacecraft computing system is a case of a functionally distributed system that requires continuous interaction among the nodes to control the operations at different nodes. A simple and reliable protocol is desired for such an application. This paper discusses a formal approach to specify the computing system with respect to some important issues encountered in the design and development of a protocol for the onboard distributed system. The issues considered in this paper are concurrency, exclusiveness and sequencing relationships among the various processes at different nodes. A 6-tuple model is developed for the precise specification of the system. The model also enables us to check the consistency of specification and deadlock caused due to improper specification. An example is given to illustrate the use of the proposed methodology for a typical spacecraft configuration. Although the theory is motivated by a specific application the same may be applied to other distributed computing system such as those encountered in process control industries, power plant control and other similar environments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The existing internet computing resource, Biomolecules Segment Display Device (BSDD), has been updated with several additional useful features. An advanced option is provided to superpose the structural motifs obtained from a search on the Protein Data Bank (PDB) in order to see if the three-dimensional structures adopted by identical or similar sequence motifs are the same. Furthermore, the options to display structural aspects like inter- and intra-molecular interactions, ion-pairs, disulphide bonds, etc. have been provided.The updated resource is interfaced with an up-to-date copy of the public domain PDB as well as 25 and 90% non-redundant protein structures. Further, users can upload the three-dimensional atomic coordinates (PDB format) from the client machine. A free molecular graphics program, JMol, is interfaced with it to display the three-dimensional structures.