189 resultados para FFT


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact by using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) FFT is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed with that of an ASIC implementation. In addition we compare the performance gain with respect to a GPP.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a Radix-4(3) based FFT architecture suitable for OFDM based WLAN applications. The radix-4(3) parallel unrolled architecture presented here, uses a radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. A 64 point FFT processor based on the proposed architecture has been implemented in UMC 130nm 1P8M CMOS process with a maximum clock frequency of 100 MHz and area of 0.83mm(2). The proposed processor provides a throughput of four times the clock rate and can finish one 64 point FFT computation in 16 clock cycles. For IEEE 802.11a/g WLAN, the processor needs to be operated at a clock rate of 5 MHz with a power consumption of 2.27 mW which is 27% less than the previously reported low power implementations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose a fully parallel 64K point radix-4(4) FFT processor. The radix-4(4) parallel unrolled architecture uses a novel radix-4 butterfly unit which takes all four inputs in parallel and can selectively produce one out of the four outputs. The radix-4(4) block can take all 256 inputs in parallel and can use the select control signals to generate one out of the 256 outputs. The resultant 64K point FFT processor shows significant reduction in intermediate memory but with increased hardware complexity. Compared to the state-of-art implementation 5], our architecture shows reduced latency with comparable throughput and area. The 64K point FFT architecture was synthesized using a 130nm CMOS technology which resulted in a throughput of 1.4 GSPS and latency of 47.7 mu s with a maximum clock frequency of 350MHz. When compared to 5], the latency is reduced by 303 mu s with 50.8% reduction in area.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An FFT-based two-step phase-shifting (TPS) algorithm is described in detail and implemented by use of experimental interferograms. This algorithm has been proposed to solve the TPS problem with random phase shift except pi. By comparison with the visibility-function-based TPS algorithm, it proves that the FFT-based algorithm has obvious advantages in phase extracting. Meanwhile, we present a pi-phase-shift supplement to the TPS algorithm, which combines the two interferograms and demodulates the phase map by locating the extrema of the combined fringes after removing the respective backgrounds. So combining this method and FFT-based one, one could really implement the TPS with random phase shift. Whereafter, we systematically compare the TPS with single-interferogram analysis algorithm and conventional three-step phase-shifting one. The results demonstrate that the FFT-based TPS algorithm has a satisfactory accuracy. At last, based on the polarizing interferometry, a schematic setup of two-channel TPS interferometer with random phase shift is suggested to implement the simultaneous collection of interferograms. (c) 2007 Elsevier GrnbH. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

在研究快速傅里叶变换(FFT)算法的基础上,根据FPGA性能高、灵活性强、速度快的特点,提出了高效的基4-FFT处理器的实现方法。数据存储采用分块存储的方法,大大提高了存取速度。数据寻址采用新型的地址产生方法,可并行产生所需数据地址。同时,在蝶形单元的设计中很好的将并行运算技术和流水线技术相结合了起来,又进一步提高了运算速度。测试结果表明,时钟在50MHz时完成1024点FFT的时间为25.6μs,满足了应用实时性的要求。

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Methods are presented for developing synthesizable FFT cores. These are based on a modular approach in which parameterized commutator and processor blocks are cascaded to implement the computations required in many important FFT signal flow graphs. In addition, it is shown how the use of a digital serial data organization can be used to produce systems that offer 100% processor utilization along with reductions in storage requirements. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. Resulting chip designs are competitive with ones created using manual methods but with significant reductions in design times.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Methods are presented for developing synthesizable FFT cores. These are based on a modular approach in which parameterizable blocks are cascaded to implement the computations required across a range of typical FFT signal flow graphs. The underlying architectural approach combines the use of a digital serial data organization with generic commutator blocks to produce systems that offer 100% processor utilization with storage requirements less than previous designs. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. Resulting chip designs are competitive with manual methods but with significant reductions in design times.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes how worst-case error analysis can be applied to solve some of the practical issues in the development and implementation of a low power, high performance radix-4 FFT chip for digital video applications. The chip has been fabricated using a 0.6 µm CMOS technology and can perform a 64 point complex forward or inverse FFT on real-time video at up to 18 Megasamples per second. It comprises 0.5 million transistors in a die area of 7.8×8 mm and dissipates 1 W, leading to a cost-effective silicon solution for high quality video processing applications. The analysis focuses on the effect that different radix-4 architectural configurations and finite wordlengths has on the FFT output dynamic range. These issues are addressed using both mathematical error models and through extensive simulation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Details of a new low power FFT processor for use in digital television applications are presented. This has been fabricated using a 0.6 µm CMOS technology and can perform a 64 point complex forward or inverse FFT on real-rime video at up to 18 Megasamples per second. It comprises 0.5 million transistors in a die area of 7.8×8 mm and dissipates 1 W. Its performance, in terms of computational rate per area per watt, is significantly higher than previously reported devices, leading to a cost-effective silicon solution for high quality video processing applications. This is the result of using a novel VLSI architecture which has been derived from a first principles factorisation of the DFT matrix and tailored to a direct silicon implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, a low complexity system for spectral analysis of heart rate variability (HRV) is presented. The main idea of the proposed approach is the implementation of the Fast-Lomb periodogram that is a ubiquitous tool in spectral analysis, using a wavelet based Fast Fourier transform. Interestingly we show that the proposed approach enables the classification of processed data into more and less significant based on their contribution to output quality. Based on such a classification a percentage of less-significant data is being pruned leading to a significant reduction of algorithmic complexity with minimal quality degradation. Indeed, our results indicate that the proposed system can achieve up-to 45% reduction in number of computations with only 4.9% average error in the output quality compared to a conventional FFT based HRV system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Methods are presented for developing synthesisable FFT cores. These are based on a modular approach in which parameterisable blocks are cascaded to implement the computations required across a range of typical FFT signal flow graphs. The underlying architectural approach combines the use of a digital serial data organisation with generic commutator blocks to produce systems that offer 100% processor utilisation with storage requirements less than previous designs. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. Resulting chip designs are competitive with manual methods but with significant reductions in design times.