58 resultados para Field programmable gate arrays (FPGA)
em Indian Institute of Science - Bangalore - Índia
Resumo:
In this article, a Field Programmable Gate Array (FPGA)-based hardware accelerator for 3D electromagnetic extraction, using Method of Moments (MoM) is presented. As the number of nets or ports in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck in a linear-complexity fast solver framework. In this work, an FPGA-based hardware implementation is proposed toward a two-level parallelization scheme: (i) matrix level parallelization for single RHS and (ii) pipelining for multiple-RHS. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple nets in a Ball Grid Array (BGA) package. The acceleration is shown to be linearly scalable with FPGA resources and speed-ups over 10x against equivalent software implementation on a 2.4GHz Intel Core i5 processor is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board with the implemented design operating at 200MHz clock frequency. (c) 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:776-783, 2016
Resumo:
This paper presents a comparative evaluation of the average and switching models of a dc-dc boost converter from the point of view of real-time simulation. Both the models are used to simulate the converter in real-time on a Field Programmable Gate Array (FPGA) platform. The converter is considered to function over a wide range of operating conditions, and could do transition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM). While the average model is known to be computationally efficient from the perspective of off-line simulation, the same is shown here to consume more logical resources than the switching model for real-time simulation of the dc-dc converter. Further, evaluation of the boundary condition between CCM and DCM is found to be the main reason for the increased consumption of resources by the average model.
Resumo:
A Field Programmable Gate Array (FPGA) based hardware accelerator for multi-conductor parasitic capacitance extraction, using Method of Moments (MoM), is presented in this paper. Due to the prohibitive cost of solving a dense algebraic system formed by MoM, linear complexity fast solver algorithms have been developed in the past to expedite the matrix-vector product computation in a Krylov sub-space based iterative solver framework. However, as the number of conductors in a system increases leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products present a time bottleneck, especially for ill-conditioned system matrices. In this work, an FPGA based hardware implementation is proposed to parallelize the iterative matrix solution for multiple RHS vectors in a low-rank compression based fast solver scheme. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple conductors in a Ball Grid Array (BGA) package. Speed-ups up to 13x over equivalent software implementation on an Intel Core i5 processor for dense matrix-vector products and 12x for QR compressed matrix-vector products is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board.
Resumo:
This paper presents the programming an FPGA (Field Programmable Gate Array) to emulate the dynamics of DC machines. FPGA allows high speed real time simulation with high precision. The described design includes block diagram representation of DC machine, which contain all arithmetic and logical operations. The real time simulation of the machine in FPGA is controlled by user interfaces they are Keypad interface, LCD display on-line and digital to analog converter. This approach provides emulation of electrical machine by changing the parameters. Separately Exited DC machine implemented and experimental results are presented.
Resumo:
This paper presents the new trend of FPGA (Field programmable Gate Array) based digital platform for the control of power electronic systems. There is a rising interest in using digital controllers in power electronic applications as they provide many advantages over their analog counterparts. A board comprising of Cyclone device EP1C12Q240C8 of Altera is used for developing this platform. The details of this board are presented. This developed platform can be used for the controller applications such as UPS, Induction Motor drives and front end converters. A real time simulation of a system can also be done. An open-loop induction motor drive has been implemented using this board and experimental results are presented.
Resumo:
Support vector machines (SVM) are a popular class of supervised models in machine learning. The associated compute intensive learning algorithm limits their use in real-time applications. This paper presents a fully scalable architecture of a coprocessor, which can compute multiple rows of the kernel matrix in parallel. Further, we propose an extended variant of the popular decomposition technique, sequential minimal optimization, which we call hybrid working set (HWS) algorithm, to effectively utilize the benefits of cached kernel columns and the parallel computational power of the coprocessor. The coprocessor is implemented on Xilinx Virtex 7 field-programmable gate array-based VC707 board and achieves a speedup of upto 25x for kernel computation over single threaded computation on Intel Core i5. An application speedup of upto 15x over software implementation of LIBSVM and speedup of upto 23x over SVMLight is achieved using the HWS algorithm in unison with the coprocessor. The reduction in the number of iterations and sensitivity of the optimization time to variation in cache size using the HWS algorithm are also shown.
Resumo:
In this paper, the validity of'single fault assumption in deriving diagnostic test sets is examined with respect to crosspoint faults in programmable logic arrays (PLA's). The control input procedure developed here can be used to convert PLA's having undetectable crosspoint faults to crosspoint-irredundant PLA's for testing purposes. All crosspoints will be testable in crosspoint-irredundant PLA's. The control inputs are used as extra variables during testing. They are maintained at logic I during normal operation. A useful heuristic for obtaining a near-minimal number of control inputs is suggested. Expressions for calculating bounds on the number of control inputs have also been obtained.
Resumo:
An on-line algorithm is developed for the location of single cross point faults in a PLA (FPLA). The main feature of the algorithm is the determination of a fault set corresponding to the response obtained for a failed test. For the apparently small number of faults in this set, all other tests are generated and a fault table is formed. Subsequently, an adaptive procedure is used to diagnose the fault. Functional equivalence test is carried out to determine the actual fault class if the adaptive testing results in a set of faults with identical tests. The large amount of computation time and storage required in the determination, a priori, of all the fault equivalence classes or in the construction of a fault dictionary are not needed here. A brief study of functional equivalence among the cross point faults is also made.
Resumo:
An on-line algorithm is developed for the location of single cross point faults in a PLA (FPLA). The main feature of the valgorithm is the determination of a fault set corresponding to the response obtained for a failed test. For the apparently small number of faults in this set, all other tests are generated and a fault table is formed. Subsequently, an adaptive procedure is used to diagnose the fault. Functional equivalence test is carried out to determine the actual fault class if the adaptive testing results in a set of faults with identical tests. The large amount of computation time and storage required in the determination, a priori, of all the fault equivalence classes or in the construction of a fault dictionary are not needed here. A brief study of functional equivalence among the cross point faults is also made.
Resumo:
This paper describes a switching theoretic algorithm for the folding of programmable logic arrays (PLA). The algorithm is valid for both column and row folding, although it has been presented considering only the simple column folding. The pairwise compatibility relations among all the pairs of the columns of the PLA are mapped into a square matrix, called the compatibility matrix of the PLA. A foldable compatibility matrix (FCM), a new concept introduced by the author, is then derived from the compatibility matrix. A new theorem called the folding theorem is then proved. The theorem states that the existence of an m by 2m FCM is both necessary and sufficient to fold 2m columns of the n column PLA (2m ≤ n). Once an FCM is obtained, the ordered pairs of foldable columns and the re-ordering of the rows are readily determined.
Resumo:
Instability in conventional haptic rendering destroys the perception of rigid objects in virtual environments. Inherent limitations in the conventional haptic loop restrict the maximum stiffness that can be rendered. In this paper we present a method to render virtual walls that are much stiffer than those achieved by conventional techniques. By removing the conventional digital haptic loop and replacing it with a part-continuous and part-discrete time hybrid haptic loop, we were able to render stiffer walls. The control loop is implemented as a combinational logic circuit on an field-programmable gate array. We compared the performance of the conventional haptic loop and our hybrid haptic loop on the same haptic device, and present mathematical analysis to show the limit of stability of our device. Our hybrid method removes the computer-intensive haptic loop from the CPU-this can free a significant amount of resources that can be used for other purposes such as graphical rendering and physics modeling. It is our hope that, in the future, similar designs will lead to a haptics processing unit (HPU).
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have implemented a Firewall with this architecture in reconflgurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results in both speed and area improvement when it is implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for the worst case packet size. The Firewall rule update involves only memory re-initialization in software without any hardware change.
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have Implemented a Firewall with this architecture in reconfigurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using, our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results In both speed and area Improvement when It is Implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields.High throughput classification Invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly In terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware Implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for file worst case packet size. The Firewall rule update Involves only memory re-initialiization in software without any hardware change.
Resumo:
With ever increasing network speed, scalable and reliable detection of network port scans has become a major challenge. In this paper, we present a scalable and flexible architecture and a novel algorithm, to detect and block port scans in real time. The proposed architecture detects fast scanners as well as stealth scanners having large inter-probe periods. FPGA implementation of the proposed system gives an average throughput of 2 Gbps with a system clock frequency of 100 MHz on Xilinx Virtex-II Pro FPGA. Experimental results on real network trace show the effectiveness of the proposed system in detecting and blocking network scans with very low false positives and false negatives.
Resumo:
The letter reports an algorithm for the folding of programmable logic arrays. The algorithm is valid for both column and row folding, although it has been presented considering only the simple column folding. The pairwise compatibility relations among all the pairs of the columns of the PLA are plotted in a matrix called the compatibility matrix of the PLA. A foldable compatibility matrix (FCM), a new concept defined in the letter, is then derived from the compatibility matrix. Once an FCM is obtained, the ordered pairs of fold-able columns and the reordering of the rows are readily determined