139 resultados para FPGA, Spartan-3E


Relevância:

10.00% 10.00%

Publicador:

Resumo:

In DSP applications such as fixed transforms and filtering, the full flexibility of a general-purpose multiplier is not required and only a limited range of values is needed on one of the multiplier inputs. A new design technique has been developed for deriving multipliers that operate on a limited range of multiplicands. This can be used to produce FPGA implementations of DSP systems where area is dramatically improved. The paper describes the technique and its application to the design of a poly-phase filter on a Virtex FPGA. A 62% area reduction and 7% speed increase is gained when compared to an equivalent design using general purpose multipliers. It is also compared favourably to other known fixed coefficient approaches.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a matrix inversion architecture based on the novel Modified Squared Givens Rotations (MSGR) algorithm, which extends the original SGR method to complex valued data, and also corrects erroneous results in the original SGR method when zeros occur on the diagonal of the matrix either initially or during processing. The MSGR algorithm also avoids complex dividers in the matrix inversion, thus minimising the complexity of potential real-time implementations. A systolic array architecture is implemented and FPGA synthesis results indicate a high-throughput low-latency complex matrix inversion solution. © 2008 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

SoC systems are now being increasingly constructed using a hierarchy of subsystems or silicon Intellectual Property (IP) cores. The key challenge is to use these cores in a highly efficient manner which can be difficult as the internal core structure may not be known. A design methodology based on synthesizing hierarchical circuit descriptions is presented. The paper employs the MARS synthesis scheduling algorithm within the existing IRIS synthesis flow and details how it can be enhanced to allow for design exploration of IP cores. It is shown that by accessing parameterised expressions for the datapath latencies in the cores, highly efficient FPGA solutions can be achieved. Hardware sharing at both the hierarchical and flattened levels is explored for a normalized lattice filter and results are presented.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A rapid design methodology for biorthogonal wavelet transform cores has been developed based on a generic, scaleable architecture for wavelet filters. The architecture offers efficient hardware utilisation by combining the linear phase property of biorthogonal filters with decimation in a MAC-based implementation. The design has been captured in VHDL and parameterised in terms of wavelet type, data word length and coefficient word length. The control circuit is embedded within the cores and allows them to be cascaded without any interface glue logic for any desired level of decomposition. The design time to produce silicon layout of a biorthogonal wavelet system is typically less than a day. The silicon cores produced are comparable in area and performance to hand-crafted designs, The designs are portable across a range of foundries and are also applicable to FPGA and PLD implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new configurable architecture is presented that offers multiple levels of video playback by accommodating variable levels of network utilization and bandwidth. By utilizing scalable MPEG-4 encoding at the network edge and using specific video delivery protocols, media streaming components are merged to fully optimize video playback for IPv6 networks, thus improving QoS. This is achieved by introducing “programmable network functionality” (PNF) which splits layered video transmission and distributes it evenly over available bandwidth, reducing packet loss and delay caused by out-of-profile DiffServ classes. An FPGA design is given which gives improved performance, e.g. link utilization, end-to-end delay, and that during congestion, improves on-time delivery of video frames by up to 80% when compared to current “static” DiffServ.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A key issue in the design of next generation Internet routers and switches will be provision of traffic manager (TM) functionality in the datapaths of their high speed switching fabrics. A new architecture that allows dynamic deployment of different TM functions is presented. By considering the processing requirements of operations such as policing and congestion, queuing, shaping and scheduling, a solution has been derived that is scalable with a consistent programmable interface. Programmability is achieved using a function computation unit which determines the action (e.g. drop, queue, remark, forward) based on the packet attribute information and a memory storage part. Results of a Xilinx Virtex-5 FPGA reference design are presented.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new domain-specific, reconfigurable system-on-a-chip (SoC) architecture is proposed for video motion estimation. This has been designed to cover most of the common block-based video coding standards, including MPEG-2, MPEG-4, H.264, WMV-9 and AVS. The architecture exhibits simple control, high throughput and relatively low hardware cost when compared with existing circuits. It can also easily handle flexible search ranges without any increase in silicon area and can be configured prior to the start of the motion estimation process for a specific standard. The computational rates achieved make the circuit suitable for high-end video processing applications, such as HDTV. Silicon design studies indicate that circuits based on this approach incur only a relatively small penalty in terms of power dissipation and silicon area when compared with implementations for specific standards. Indeed, the cost/performance achieved exceeds that of existing but specific solutions and greatly exceeds that of general purpose field programmable gate array (FPGA) designs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as focus shifts to high level design methodologies. For data intensive systems, dataflow based synthesis can lead to an inefficient usage of memory due to the restrictive nature of synchronous dataflow and its inability to easily model data reuse. This paper explores how dataflow graph changes can be used to drive both the on-chip and off-chip memory organisation and how these memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent to many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand from the original dataflow graph level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns down to 1.878 ns through the refinement of the memory architecture. Care must be taken when modeling these algorithms however, as inefficiencies in these models can be easily translated into overuse of hardware resources.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Dynamic power consumption is very dependent on interconnect, so clever mapping of digital signal processing algorithms to parallelised realisations with data locality is vital. This is a particular problem for fast algorithm implementations where typically, designers will have sacrificed circuit structure for efficiency in software implementation. This study outlines an approach for reducing the dynamic power consumption of a class of fast algorithms by minimising the index space separation; this allows the generation of field programmable gate array (FPGA) implementations with reduced power consumption. It is shown how a 50% reduction in relative index space separation results in a measured power gain of 36 and 37% over a Cooley-Tukey Fast Fourier Transform (FFT)-based solution for both actual power measurements for a Xilinx Virtex-II FPGA implementation and circuit measurements for a Xilinx Virtex-5 implementation. The authors show the generality of the approach by applying it to a number of other fast algorithms namely the discrete cosine, the discrete Hartley and the Walsh-Hadamard transforms.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Side-channel attacks (SCA) threaten electronic cryptographic devices and can be carried out by monitoring the physical characteristics of security circuits. Differential Power Analysis (DPA) is one the most widely studied side-channel attacks. Numerous countermeasure techniques, such as Random Delay Insertion (RDI), have been proposed to reduce the risk of DPA attacks against cryptographic devices. The RDI technique was first proposed for microprocessors but it was shown to be unsuccessful when implemented on smartcards as it was vulnerable to a variant of the DPA attack known as the Sliding-Window DPA attack.Previous research by the authors investigated the use of the RDI countermeasure for Field Programmable Gate Array (FPGA) based cryptographic devices. A split-RDI technique wasproposed to improve the security of the RDI countermeasure. A set of critical parameters wasalso proposed that could be utilized in the design stage to optimize a security algorithm designwith RDI in terms of area, speed and power. The authors also showed that RDI is an efficientcountermeasure technique on FPGA in comparison to other countermeasures.In this article, a new RDI logic design is proposed that can be used to cost-efficiently implementRDI on FPGA devices. Sliding-Window DPA and realignment attacks, which were shown to beeffective against RDI implemented on smartcard devices, are performed on the improved RDIFPGA implementation. We demonstrate that these attacks are unsuccessful and we also proposea realignment technique that can be used to demonstrate the weakness of RDI implementations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A queue manager (QM) is a core traffic management (TM) function used to provide per-flow queuing in access andmetro networks; however current designs have limited scalability. An on-demand QM (OD-QM) which is part of a new modular field-programmable gate-array (FPGA)-based TM is presented that dynamically maps active flows to the available physical resources; its scalability is derived from exploiting the observation that there are only a few hundred active flows in a high speed network. Simulations with real traffic show that it is a scalable, cost-effective approach that enhances per-flow queuing performance, thereby allowing per-flow QM without the need for extra external memory at speeds up to 10 Gbps. It utilizes 2.3%–16.3% of a Xilinx XC5VSX50t FPGA and works at 111 MHz.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A full hardware implementation of a Weighted Fair Queuing (WFQ) packet scheduler is proposed. The circuit architecture presented has been implemented using Altera Stratix II FPGA technology, utilizing RLDII and QDRII memory components. The circuit can provide fine granularity Quality of Service (QoS) support at a line throughput rate of 12.8Gb/s in its current implementation. The authors suggest that, due to the flexible and scalable modular circuit design approach used, the current circuit architecture can be targeted for a full ASIC implementation to deliver 50 Gb/s throughput. The circuit itself comprises three main components; a WFQ algorithm computation circuit, a tag/time-stamp sort and retrieval circuit, and a high throughput shared buffer. The circuit targets the support of emerging wireline and wireless network nodes that focus on Service Level Agreements (SLA's) and Quality of Experience.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access, to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, as a technique that enables software configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software, for multiword transfer initiation, completion notifications for software selected sets of arbitrary size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measure the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions with simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks, and efficient exploitation of the hardware bandwidth.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This implementation of a two-dimensional discrete cosine transform demonstrates the development of a suitable architectural style for a specific technology-in this case, the Xilinx XC6200 FPGA series. The design exploits distributed arithmetic, parallelism, and pipelining to achieve a high-performance custom-computing implementation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, a hardware solution for packet classification based on multi-fields is presented. The proposed scheme focuses on a new architecture based on the decomposition method. A hash circuit is used in order to reduce the memory space required for the Recursive Flow Classification (RFC) algorithm. The implementation results show that the proposed architecture achieves significant performance advantage that is comparable to that of some well-known algorithms. The solution is based on Altera Stratix III FPGA technology.