947 resultados para FPGA, Elettronica digitale, Sintesi logica


Relevância:

10.00% 10.00%

Publicador:

Resumo:

A full hardware implementation of a Weighted Fair Queuing (WFQ) packet scheduler is proposed. The circuit architecture presented has been implemented using Altera Stratix II FPGA technology, utilizing RLDII and QDRII memory components. The circuit can provide fine granularity Quality of Service (QoS) support at a line throughput rate of 12.8Gb/s in its current implementation. The authors suggest that, due to the flexible and scalable modular circuit design approach used, the current circuit architecture can be targeted for a full ASIC implementation to deliver 50 Gb/s throughput. The circuit itself comprises three main components; a WFQ algorithm computation circuit, a tag/time-stamp sort and retrieval circuit, and a high throughput shared buffer. The circuit targets the support of emerging wireline and wireless network nodes that focus on Service Level Agreements (SLA's) and Quality of Experience.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, as a technique that enables software configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software, for multiword transfer initiation, completion notifications for software selected sets of arbitrary size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measure the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions with simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks, and efficient exploitation of the hardware bandwidth.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This implementation of a two-dimensional discrete cosine transform demonstrates the development of a suitable architectural style for a specific technology-in this case, the Xilinx XC6200 FPGA series. The design exploits distributed arithmetic, parallelism, and pipelining to achieve a high-performance custom-computing implementation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, a hardware solution for packet classification based on multi-fields is presented. The proposed scheme focuses on a new architecture based on the decomposition method. A hash circuit is used in order to reduce the memory space required for the Recursive Flow Classification (RFC) algorithm. The implementation results show that the proposed architecture achieves significant performance advantage that is comparable to that of some well-known algorithms. The solution is based on Altera Stratix III FPGA technology.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we present a methodology for implementing a complete Digital Signal Processing (DSP) system onto a heterogeneous network including Field Programmable Gate Arrays (FPGAs) automatically. The methodology aims to allow design refinement and real time verification at the system level. The DSP application is constructed in the form of a Data Flow Graph (DFG) which provides an entry point to the methodology. The netlist for parts that are mapped onto the FPGA(s) together with the corresponding software and hardware Application Protocol Interface (API) are also generated. Using a set of case studies, we demonstrate that the design and development time can be significantly reduced using the methodology developed.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and
cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures are highly susceptible to pipeline bubbles resulting from data and control hazards; the only way to mitigate against these is manual interleaving of
application tasks on each datapath, since no suitable automated interleaving approach exists. In this paper we describe a new automated integrated mapping/scheduling approach to map algorithm tasks to processors and a new low-complexity list scheduling technique to generate the interleaved schedules. When applied to a spatial Fixed-Complexity Sphere Decoding (FSD) detector
for next-generation Multiple-Input Multiple-Output (MIMO) systems, the resulting schedules achieve real-time performance for IEEE 802.11n systems on a network of 16-way SIMD processors on FPGA, enable better performance/complexity balance than current approaches and produce results comparable to handcrafted implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper investigates the implementation of a number of circuits used to perform a high speed closest value match lookup. The design is targeted particularly for use in a search trie, as used in various networking lookup applications, but can be applied to many other areas where such a match is required. A range of different designs have been considered and implemented on FPGA. A detailed description of the architectures investigated is followed by an analysis of the synthesis results. © 2006 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A methodology which allows a non-specialist to rapidly design silicon wavelet transform cores has been developed. This methodology is based on a generic architecture utilizing time-interleaved coefficients for the wavelet transform filters. The architecture is scaleable and it has been parameterized in terms of wavelet family, wavelet type, data word length and coefficient word length. The control circuit is designed in such a way that the cores can also be cascaded without any interface glue logic for any desired level of decomposition. This parameterization allows the use of any orthonormal wavelet family thereby extending the design space for improved transformation from algorithm to silicon. Case studies for stand alone and cascaded silicon cores for single and multi-stage analysis respectively are reported. The typical design time to produce silicon layout of a wavelet based system has been reduced by an order of magnitude. The cores are comparable in area and performance to hand-crafted designs. The designs have been captured in VHDL so they are portable across a range of foundries and are also applicable to FPGA and PLD implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The design of a System-on-a-Chip (SoC) demonstrator for a baseline JPEG encoder core is presented. This combines a highly optimized Discrete Cosine Transform (DCT) and quantization unit with an entropy coder which has been realized using off-the-shelf synthesizable IP cores (Run-length coder, Huffman coder and data packer). When synthesized in a 0.35 µm CMOS process, the core can operate at speeds up to 100 MHz and contains 50 k gates plus 11.5 kbits of RAM. This is approximately 20% less than similar JPEG encoder designs reported in literature. When targeted at FPGA the core can operate up to 30 MHz and is capable of compressing 9-bit full-frame color input data at NTSC or PAL rates.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A novel hardware architecture for elliptic curve cryptography (ECC) over GF(p) is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems including modular inversion and multiplication. This is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires much fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications. © 2006 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The inclusion of the Discrete Wavelet Transform in the JPEG-2000 standard has added impetus to the research of hardware architectures for the two-dimensional wavelet transform. In this paper, a VLSI architecture for performing the symmetrically extended two-dimensional transform is presented. This architecture conforms to the JPEG-2000 standard and is capable of near-optimal performance when dealing with the image boundaries. The architecture also achieves efficient processor utilization. Implementation results based on a Xilinx Virtex-2 FPGA device are included.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Security devices are vulnerable to Differential Power Analysis (DPA) that reveals the key by monitoring the power consumption of the circuits. In this paper, we present the first DPA attack against an FPGA implementation of the Camellia encryption algorithm with all key sizes and evaluate the DPA resistance of the algorithm. The Camellia cryptographic algorithm involves several different key-dependent intermediate operations including S-Box operations. In previous research, it was believed that the Camellia is stronger than AES due to the additional Whitening phase protecting the S-Box operation. However, we propose an attack that bypasses the Whitening phase and targets the S-Box. In this paper, we also discuss a lowcost countermeasure strategy to protect the Pre-whitening / Post-whitening and FL function of Camellia using Dual-rail Precharged Logic and to protect against attacks of the S-Box using Random Delay Insertion. © 2009 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A methodology has been developed which allows a non-specialist to rapidly design silicon wavelet transform cores for a variety of specifications. The cores include both forward and inverse orthonormal wavelet transforms. This methodology is based on efficient, modular and scaleable architectures utilising time-interleaved coefficients for the wavelet transform filters. The cores are parameterized in terms of wavelet type and data and coefficient word lengths. The designs have been captured in VHDL and are hence portable across a range of silicon foundries as well as FPGA and PLD implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new, single and unified Montgomery modular inverse algorithm, which performs both classical and Montgomery modular inversion, is proposed. This reduces the number of Montgomery multiplication operations required by 33% when compared with previous algorithms reported in the literature. The use of this in practice has been investigated by implementation of the improved unified algorithm and the previous algorithms on FPGA devices. The unified algorithm implementation shows a significant speed-up and a reduction in silicon area usage.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A rapid design methodology for biorthogonal wavelet transform cores has been developed. This methodology is based on a generic, scaleable architecture for the wavelet filters. The architecture offers efficient hardware utilization by combining the linear phase property of biorthogonal filters with decimation in a MAC based implementation. The design has been captured in VHDL and parameterized in terms of wavelet type, data word length and coefficient word length. The control circuit is embedded within the cores and allows them to be cascaded without any interface glue logic for any desired level of decomposition. The design time to produce silicon layout of a biorthogonal wavelet based system is typically less than a day. The resulting silicon cores produced are comparable in area and performance to hand-crafted designs. The designs are portable across a range of foundries and are also applicable to FPGA and PLD implementations.