972 resultados para hardware implementation
Resumo:
An algorithm for computing dense correspondences between images of a stereo pair or image sequence is presented. The algorithm can make use of both standard matching metrics and the rank and census filters, two filters based on order statistics which have been applied to the image matching problem. Their advantages include robustness to radiometric distortion and amenability to hardware implementation. Results obtained using both real stereo pairs and a synthetic stereo pair with ground truth were compared. The rank and census filters were shown to significantly improve performance in the case of radiometric distortion. In all cases, the results obtained were comparable to, if not better than, those obtained using standard matching metrics. Furthermore, the rank and census have the additional advantage that their computational overhead is less than these metrics. For all techniques tested, the difference between the results obtained for the synthetic stereo pair, and the ground truth results was small.
Resumo:
The rank and census are two filters based on order statistics which have been applied to the image matching problem for stereo pairs. Advantages of these filters include their robustness to radiometric distortion and small amounts of random noise, and their amenability to hardware implementation. In this paper, a new matching algorithm is presented, which provides an overall framework for matching, and is used to compare the rank and census techniques with standard matching metrics. The algorithm was tested using both real stereo pairs and a synthetic pair with ground truth. The rank and census filters were shown to significantly improve performance in the case of radiometric distortion. In all cases, the results obtained were comparable to, if not better than, those obtained using standard matching metrics. Furthermore, the rank and census have the additional advantage that their computational overhead is less than these metrics. For all techniques tested, the difference between the results obtained for the synthetic stereo pair, and the ground truth results was small.
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have implemented a Firewall with this architecture in reconflgurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results in both speed and area improvement when it is implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for the worst case packet size. The Firewall rule update involves only memory re-initialization in software without any hardware change.
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have Implemented a Firewall with this architecture in reconfigurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using, our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results In both speed and area Improvement when It is Implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields.High throughput classification Invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly In terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware Implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for file worst case packet size. The Firewall rule update Involves only memory re-initialiization in software without any hardware change.
Resumo:
This paper describes a hardware implementation of a two-way converter logic by which conversion between numbers from positive to negative binary representation is possible. Index terms: (i) Negative radix, (ii) Positive radix, (iii) Two-way conversion.
Resumo:
Conventional hardware implementation techniques for FIR filters require the computation of filter coefficients in software and have them stored in memory. This approach is static in the sense that any further fine tuning of the filter requires computation of new coefficients in software. In this paper, we propose an alternate technique for implementing FIR filters in hardware. We store a considerably large number of impulse response coefficients of the ideal filter (having box type frequency response) in memory. We then do the windowing process, on these coefficients, in hardware using integer sequences as window functions. The integer sequences are also generated in hardware. This approach offers the flexibility in fine tuning the filter, like varying the transition bandwidth around a particular cutoff frequency.
Resumo:
We address the problem of sampling and reconstruction of two-dimensional (2-D) finite-rate-of-innovation (FRI) signals. We propose a three-channel sampling method for efficiently solving the problem. We consider the sampling of a stream of 2-D Dirac impulses and a sum of 2-D unit-step functions. We propose a 2-D causal exponential function as the sampling kernel. By causality in 2-D, we mean that the function has its support restricted to the first quadrant. The advantage of using a multichannel sampling method with causal exponential sampling kernel is that standard annihilating filter or root-finding algorithms are not required. Further, the proposed method has inexpensive hardware implementation and is numerically stable as the number of Dirac impulses increases.
Resumo:
Using the spatial modulation approach, where only one transmit antenna is active at a time, we propose two transmission schemes for two-way relay channel using physical layer network coding with space time coding using coordinate interleaved orthogonal designs (CIODs). It is shown that using two uncorrelated transmit antennas at the nodes, but using only one RF transmit chain and space-time coding across these antennas can give a better performance without using any extra resources and without increasing the hardware implementation cost and complexity. In the first transmission scheme, two antennas are used only at the relay, adaptive network coding (ANC) is employed at the relay and the relay transmits a CIOD space time block code (STBC). This gives a better performance compared to an existing ANC scheme for two-way relay channel which uses one antenna each at all the three nodes. It is shown that for this scheme at high SNR the average end-to-end symbol error probability (SEP) is upper bounded by twice the SEP of a point-to-point fading channel. In the second transmission scheme, two transmit antennas are used at all the three nodes, CIOD STBCs are transmitted in multiple access and broadcast phases. This scheme provides a diversity order of two for the average end-to-end SEP with an increased decoding complexity of O(M-3) for an arbitrary signal set and O(M-2 root M) for square QAM signal set. Simulation results show that the proposed schemes performs better than the existing ANC schemes under perfect and imperfect channel state information.
Resumo:
In this article, a Field Programmable Gate Array (FPGA)-based hardware accelerator for 3D electromagnetic extraction, using Method of Moments (MoM) is presented. As the number of nets or ports in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck in a linear-complexity fast solver framework. In this work, an FPGA-based hardware implementation is proposed toward a two-level parallelization scheme: (i) matrix level parallelization for single RHS and (ii) pipelining for multiple-RHS. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple nets in a Ball Grid Array (BGA) package. The acceleration is shown to be linearly scalable with FPGA resources and speed-ups over 10x against equivalent software implementation on a 2.4GHz Intel Core i5 processor is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board with the implemented design operating at 200MHz clock frequency. (c) 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:776-783, 2016
Resumo:
Real time monitoring allows the determination of the line state and the calculation of the actual rating value. The real time monitoring systems measure sag, conductor tension, conductor temperature or weather related magnitudes. In this paper, a new ampacity monitoring system for overhead lines, based on the conductor tension, the ambient temperature, the solar radiation and the current intensity, is presented. The measurements are transmitted via GPRS to a control center where a software program calculates the ampacity value. The system takes into account the creep deformation experienced by the conductors during their lifetime and calibrates the tension-temperature reference and the maximum allowable temperature in order to obtain the ampacity. The system includes both hardware implementation and remote control software.
Resumo:
Copyright © 2014 John Wiley & Sons, Ltd. Copyright © 2014 John Wiley & Sons, Ltd. Summary A field programmable gate array (FPGA) based model predictive controller for two phases of spacecraft rendezvous is presented. Linear time-varying prediction models are used to accommodate elliptical orbits, and a variable prediction horizon is used to facilitate finite time completion of the longer range manoeuvres, whilst a fixed and receding prediction horizon is used for fine-grained tracking at close range. The resulting constrained optimisation problems are solved using a primal-dual interior point algorithm. The majority of the computational demand is in solving a system of simultaneous linear equations at each iteration of this algorithm. To accelerate these operations, a custom circuit is implemented, using a combination of Mathworks HDL Coder and Xilinx System Generator for DSP, and used as a peripheral to a MicroBlaze soft-core processor on the FPGA, on which the remainder of the system is implemented. Certain logic that can be hard-coded for fixed sized problems is implemented to be configurable online, in order to accommodate the varying problem sizes associated with the variable prediction horizon. The system is demonstrated in closed-loop by linking the FPGA with a simulation of the spacecraft dynamics running in Simulink on a PC, using Ethernet. Timing comparisons indicate that the custom implementation is substantially faster than pure embedded software-based interior point methods running on the same MicroBlaze and could be competitive with a pure custom hardware implementation.
Resumo:
采用单神经元自适应PID控制器对永磁同步电机进行了调速控制。详细介绍了PMSM(PermanentMagnetSynchronousMotor)的矢量控制原理。最后给出PMSM单神经元自适应PID控制的仿真结果和硬件实现方法。仿真结果表明,该系统具有良好的动态性能。
Resumo:
In the field of embedded systems design, coprocessors play an important role as a component to increase performance. Many embedded systems are built around a small General Purpose Processor (GPP). If the GPP cannot meet the performance requirements for a certain operation, a coprocessor can be included in the design. The GPP can then offload the computationally intensive operation to the coprocessor; thus increasing the performance of the overall system. A common application of coprocessors is the acceleration of cryptographic algorithms. The work presented in this thesis discusses coprocessor architectures for various cryptographic algorithms that are found in many cryptographic protocols. Their performance is then analysed on a Field Programmable Gate Array (FPGA) platform. Firstly, the acceleration of Elliptic Curve Cryptography (ECC) algorithms is investigated through the use of instruction set extension of a GPP. The performance of these algorithms in a full hardware implementation is then investigated, and an architecture for the acceleration the ECC based digital signature algorithm is developed. Hash functions are also an important component of a cryptographic system. The FPGA implementation of recent hash function designs from the SHA-3 competition are discussed and a fair comparison methodology for hash functions presented. Many cryptographic protocols involve the generation of random data, for keys or nonces. This requires a True Random Number Generator (TRNG) to be present in the system. Various TRNG designs are discussed and a secure implementation, including post-processing and failure detection, is introduced. Finally, a coprocessor for the acceleration of operations at the protocol level will be discussed, where, a novel aspect of the design is the secure method in which private-key data is handled
Resumo:
Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as focus shifts to high level design methodologies. For data intensive systems, dataflow based synthesis can lead to an inefficient usage of memory due to the restrictive nature of synchronous dataflow and its inability to easily model data reuse. This paper explores how dataflow graph changes can be used to drive both the on-chip and off-chip memory organisation and how these memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent to many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand from the original dataflow graph level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns down to 1.878 ns through the refinement of the memory architecture. Care must be taken when modeling these algorithms however, as inefficiencies in these models can be easily translated into overuse of hardware resources.
Resumo:
A fully homomorphic encryption (FHE) scheme is envisioned as a key cryptographic tool in building a secure and reliable cloud computing environment, as it allows arbitrary evaluation of a ciphertext without revealing the plaintext. However, existing FHE implementations remain impractical due to very high time and resource costs. To the authors’ knowledge, this paper presents the first hardware implementation of a full encryption primitive for FHE over the integers using FPGA technology. A large-integer multiplier architecture utilising Integer-FFT multiplication is proposed, and a large-integer Barrett modular reduction module is designed incorporating the proposed multiplier. The encryption primitive used in the integer-based FHE scheme is designed employing the proposed multiplier and modular reduction modules. The designs are verified using the Xilinx Virtex-7 FPGA platform. Experimental results show that a speed improvement factor of up to 44 is achievable for the hardware implementation of the FHE encryption scheme when compared to its corresponding software implementation. Moreover, performance analysis shows further speed improvements of the integer-based FHE encryption primitives may still be possible, for example through further optimisations or by targeting an ASIC platform.