53 resultados para Field-Programmable Gate Array (FPGA)
Resumo:
In this article, a Field Programmable Gate Array (FPGA)-based hardware accelerator for 3D electromagnetic extraction, using Method of Moments (MoM) is presented. As the number of nets or ports in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck in a linear-complexity fast solver framework. In this work, an FPGA-based hardware implementation is proposed toward a two-level parallelization scheme: (i) matrix level parallelization for single RHS and (ii) pipelining for multiple-RHS. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple nets in a Ball Grid Array (BGA) package. The acceleration is shown to be linearly scalable with FPGA resources and speed-ups over 10x against equivalent software implementation on a 2.4GHz Intel Core i5 processor is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board with the implemented design operating at 200MHz clock frequency. (c) 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:776-783, 2016
Resumo:
This paper presents a comparative evaluation of the average and switching models of a dc-dc boost converter from the point of view of real-time simulation. Both the models are used to simulate the converter in real-time on a Field Programmable Gate Array (FPGA) platform. The converter is considered to function over a wide range of operating conditions, and could do transition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM). While the average model is known to be computationally efficient from the perspective of off-line simulation, the same is shown here to consume more logical resources than the switching model for real-time simulation of the dc-dc converter. Further, evaluation of the boundary condition between CCM and DCM is found to be the main reason for the increased consumption of resources by the average model.
Resumo:
A Field Programmable Gate Array (FPGA) based hardware accelerator for multi-conductor parasitic capacitance extraction, using Method of Moments (MoM), is presented in this paper. Due to the prohibitive cost of solving a dense algebraic system formed by MoM, linear complexity fast solver algorithms have been developed in the past to expedite the matrix-vector product computation in a Krylov sub-space based iterative solver framework. However, as the number of conductors in a system increases leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products present a time bottleneck, especially for ill-conditioned system matrices. In this work, an FPGA based hardware implementation is proposed to parallelize the iterative matrix solution for multiple RHS vectors in a low-rank compression based fast solver scheme. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple conductors in a Ball Grid Array (BGA) package. Speed-ups up to 13x over equivalent software implementation on an Intel Core i5 processor for dense matrix-vector products and 12x for QR compressed matrix-vector products is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board.
Resumo:
This paper presents the programming an FPGA (Field Programmable Gate Array) to emulate the dynamics of DC machines. FPGA allows high speed real time simulation with high precision. The described design includes block diagram representation of DC machine, which contain all arithmetic and logical operations. The real time simulation of the machine in FPGA is controlled by user interfaces they are Keypad interface, LCD display on-line and digital to analog converter. This approach provides emulation of electrical machine by changing the parameters. Separately Exited DC machine implemented and experimental results are presented.
Resumo:
This paper presents the new trend of FPGA (Field programmable Gate Array) based digital platform for the control of power electronic systems. There is a rising interest in using digital controllers in power electronic applications as they provide many advantages over their analog counterparts. A board comprising of Cyclone device EP1C12Q240C8 of Altera is used for developing this platform. The details of this board are presented. This developed platform can be used for the controller applications such as UPS, Induction Motor drives and front end converters. A real time simulation of a system can also be done. An open-loop induction motor drive has been implemented using this board and experimental results are presented.
Resumo:
Support vector machines (SVM) are a popular class of supervised models in machine learning. The associated compute intensive learning algorithm limits their use in real-time applications. This paper presents a fully scalable architecture of a coprocessor, which can compute multiple rows of the kernel matrix in parallel. Further, we propose an extended variant of the popular decomposition technique, sequential minimal optimization, which we call hybrid working set (HWS) algorithm, to effectively utilize the benefits of cached kernel columns and the parallel computational power of the coprocessor. The coprocessor is implemented on Xilinx Virtex 7 field-programmable gate array-based VC707 board and achieves a speedup of upto 25x for kernel computation over single threaded computation on Intel Core i5. An application speedup of upto 15x over software implementation of LIBSVM and speedup of upto 23x over SVMLight is achieved using the HWS algorithm in unison with the coprocessor. The reduction in the number of iterations and sensitivity of the optimization time to variation in cache size using the HWS algorithm are also shown.
Resumo:
Instability in conventional haptic rendering destroys the perception of rigid objects in virtual environments. Inherent limitations in the conventional haptic loop restrict the maximum stiffness that can be rendered. In this paper we present a method to render virtual walls that are much stiffer than those achieved by conventional techniques. By removing the conventional digital haptic loop and replacing it with a part-continuous and part-discrete time hybrid haptic loop, we were able to render stiffer walls. The control loop is implemented as a combinational logic circuit on an field-programmable gate array. We compared the performance of the conventional haptic loop and our hybrid haptic loop on the same haptic device, and present mathematical analysis to show the limit of stability of our device. Our hybrid method removes the computer-intensive haptic loop from the CPU-this can free a significant amount of resources that can be used for other purposes such as graphical rendering and physics modeling. It is our hope that, in the future, similar designs will lead to a haptics processing unit (HPU).
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have implemented a Firewall with this architecture in reconflgurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results in both speed and area improvement when it is implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for the worst case packet size. The Firewall rule update involves only memory re-initialization in software without any hardware change.
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have Implemented a Firewall with this architecture in reconfigurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using, our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results In both speed and area Improvement when It is Implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields.High throughput classification Invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly In terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware Implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for file worst case packet size. The Firewall rule update Involves only memory re-initialiization in software without any hardware change.
Resumo:
With ever increasing network speed, scalable and reliable detection of network port scans has become a major challenge. In this paper, we present a scalable and flexible architecture and a novel algorithm, to detect and block port scans in real time. The proposed architecture detects fast scanners as well as stealth scanners having large inter-probe periods. FPGA implementation of the proposed system gives an average throughput of 2 Gbps with a system clock frequency of 100 MHz on Xilinx Virtex-II Pro FPGA. Experimental results on real network trace show the effectiveness of the proposed system in detecting and blocking network scans with very low false positives and false negatives.
Resumo:
The problem of determining a minimal number of control inputs for converting a programmable logic array (PLA) with undetectable faults to crosspoint-irredundant PLA for testing has been formulated as a nonstandard set covering problem. By representing subsets of sets as cubes, this problem has been reformulated as familiar problems. It is noted that this result has significance because a crosspoint-irredundant PLA can be converted to a completely testable PLA in a straightforward fashion, thus achieving very good fault coverage and easy testability.
Resumo:
In this paper we propose the architecture of a SoC fabric onto which applications described in a HLL are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application specific computational structures are defined on the fabric at runtime to realize different modules of the applications in hardware. Applications synthesized on this fabric offers performance comparable to ASICs while retaining the programmability of processing cores. We outline the application synthesis methodology through examples, and compare our results with software implementations on traditional platforms with unbounded resources.
Resumo:
Fluctuation of field emission current from carbon nanotubes (CNTs) poses certain difficulties for their use in nanobiomedical X-ray devices and imaging probes. This problem arises due to deformation of the CNTs due to electrodynamic force field and electron-phonon interaction. It is of great importance to have precise control of emitted electron beams very near the CNT tips. In this paper, a new array configuration with stacked array of CNTs is analysed and it is shown that the current density distribution is greatly localised at the middle of the array, that the scatter due to electrodynamic force field is minimised and that the temperature transients are much smaller compared to those in an array with random height distribution.
Resumo:
Fluctuation of field emission in carbon nanotubes (CNTs) is riot desirable in many applications and the design of biomedical x-ray devices is one of them. In these applications, it is of great importance to have precise control of electron beams over multiple spatio-temporal scales. In this paper, a new design is proposed in order to optimize the field emission performance of CNT arrays. A diode configuration is used for analysis, where arrays of CNTs act as cathode. The results indicate that the linear height distribution of CNTs, as proposed in this study, shows more stable performance than the conventionally used unifrom distribution.
Resumo:
Context sensitive pointer analyses based on Whaley and Lam’s bddbddb system have been shown to scale to large Java programs. We provide a technique to incorporate flow sensitivity for Java fields into one such analysis and obtain an escape analysis based on it. First, we express an intraprocedural field flow sensitive analysis, using Fink et al.’s Heap Array SSA form in Datalog. We then extend this analysis interprocedurally by introducing two new φ functions for Heap Array SSA Form and adding deduction rules corresponding to them. Adding a few more rules gives us an escape analysis. We describe two types of field flow sensitivity: partial (PFFS) and full (FFFS), the former without strong updates to fields and the latter with strong updates. We compare these analyses with two different (field flow insensitive) versions of Whaley-Lam analysis: one of which is flow sensitive for locals (FS) and the other, flow insensitive for locals (FIS). We have implemented this analysis on the bddbddb system while using the SOOT open source framework as a front end. We have run our analysis on a set of 15 Java programs. Our experimental results show that the time taken by our field flow sensitive analyses is comparable to that of the field flow insensitive versions while doing much better in some cases. Our PFFS analysis achieves average reductions of about 23% and 30% in the size of the points-to sets at load and store statements respectively and discovers 71% more “caller-captured” objects than FIS.