782 resultados para FPGA, VHDL, Picoblaze, SERDES
Resumo:
This paper deals with the key issues encountered in testing during the development of high-speed networking hardware systems by documenting a practical method for "real-life like" testing. The proposed method is empowered by modern and commonly available Field Programmable Gate Array (FPGA) technology. Innovative application of standard FPGA blocks in combination with reconfigurability are used as a back-bone of the method. A detailed elaboration of the method is given so as to serve as a general reference. The method is fully characterised and compared to alternatives through a case study proving it to be the most efficient and effective one at a reasonable cost.
Resumo:
Space applications demand the need for building reliable systems. Autonomic computing defines such reliable systems as self-managing systems. The work reported in this paper combines agent based and swarm robotic approaches leading to swarm-array computing, a novel technique to achieve autonomy for distributed parallel computing systems. Two swarm-array computing approaches based on swarms of computational resources and swarms of tasks are explored. FPGA is considered as the computing system. The feasibility of the two proposed approaches that binds the computing system and the task together is simulated on the SeSAm multi-agent simulator.
A low clock frequency FFT core implementation for multiband full-rate ultra-wideband (UWB) receivers
Resumo:
This paper discusses the design, implementation and synthesis of an FFT module that has been specifically optimized for use in the OFDM based Multiband UWB system, although the work is generally applicable to many other OFDM based receiver systems. Previous work has detailed the requirements for the receiver FFT module within the Multiband UWB ODFM based system and this paper draws on those requirements coupled with modern digital architecture principles and low power design criteria to converge on our optimized solution particularly aimed at a low-clock rate implementation. The FFT design obtained in this paper is also applicable for implementation of the transmitter IFFT module therefore only needing one FFT module in the device for half-duplex operation. The results from this paper enable the baseband designers of the 200Mbit/sec variant of Multiband UWB systems (and indeed other OFDM based receivers) using System-on-Chip (SoC), FPGA and ASIC technology to create cost effective and low power consumer electronics product solutions biased toward the very competitive market.
Resumo:
This paper discusses the requirements on the numerical precision for a practical Multiband Ultra-Wideband (UWB) consumer electronic solution. To this end we first present the possibilities that UWB has to offer to the consumer electronics market and the possible range of devices. We then show the performance of a model of the UWB baseband system implemented using floating point precision. Then, by simulation we find the minimal numerical precision required to maintain floating-point performance for each of the specific data types and signals present in the UWB baseband. Finally, we present a full description of the numerical requirements for both the transmit and receive components of the UWB baseband. The numerical precision results obtained in this paper can then be used by baseband designers to implement cost effective UWB systems using System-on-Chip (SoC), FPGA and ASIC technology solutions biased toward the competitive consumer electronics market(1).
Resumo:
The general packet radio service (GPRS) has been developed to allow packet data to be transported efficiently over an existing circuit-switched radio network, such as GSM. The main application of GPRS are in transporting Internet protocol (IP) datagrams from web servers (for telemetry or for mobile Internet browsers). Four GPRS baseband coding schemes are defined to offer a trade-off in requested data rates versus propagation channel conditions. However, data rates in the order of > 100 kbits/s are only achievable if the simplest coding scheme is used (CS-4) which offers little error detection and correction (EDC) (requiring excellent SNR) and the receiver hardware is capable of full duplex which is not currently available in the consumer market. A simple EDC scheme to improve the GPRS block error rate (BLER) performance is presented, particularly for CS-4, however gains in other coding schemes are seen. For every GPRS radio block that is corrected by the EDC scheme, the block does not need to be retransmitted releasing bandwidth in the channel and improving the user's application data rate. As GPRS requires intensive processing in the baseband, a viable field programmable gate array (FPGA) solution is presented in this paper.
Resumo:
The General Packet Radio Service (GPRS) was developed to allow packet data to be transported efficiently over an existing circuit switched radio network. The main applications for GPRS are in transporting IP datagram’s from the user’s mobile Internet browser to and from the Internet, or in telemetry equipment. A simple Error Detection and Correction (EDC) scheme to improve the GPRS Block Error Rate (BLER) performance is presented, particularly for coding scheme 4 (CS-4), however gains in other coding schemes are seen. For every GPRS radio block that is corrected by the EDC scheme, the block does not need to be retransmitted releasing bandwidth in the channel, improving throughput and the user’s application data rate. As GPRS requires intensive processing in the baseband, a viable hardware solution for a GPRS BLER co-processor is discussed that has been currently implemented in a Field Programmable Gate Array (FPGA) and presented in this paper.
Resumo:
How can a bridge be built between autonomic computing approaches and parallel computing systems? The work reported in this paper is motivated towards bridging this gap by proposing a swarm-array computing approach based on ‘Intelligent Agents’ to achieve autonomy for distributed parallel computing systems. In the proposed approach, a task to be executed on parallel computing cores is carried onto a computing core by carrier agents that can seamlessly transfer between processing cores in the event of a predicted failure. The cognitive capabilities of the carrier agents on a parallel processing core serves in achieving the self-ware objectives of autonomic computing, hence applying autonomic computing concepts for the benefit of parallel computing systems. The feasibility of the proposed approach is validated by simulation studies using a multi-agent simulator on an FPGA (Field-Programmable Gate Array) and experimental studies using MPI (Message Passing Interface) on a computer cluster. Preliminary results confirm that applying autonomic computing principles to parallel computing systems is beneficial.
Resumo:
Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
Resumo:
Recent research in multi-agent systems incorporate fault tolerance concepts. However, the research does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely ‘Intelligent Agents’. In the approach considered a task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
Resumo:
A reconfigurable scalar quantiser capable of accepting n-bit input data is presented. The data length n can be varied in the range 1... N-1 under partial-run time reconfiguration, p-RTR. Issues as improvement in throughput using this reconfigurable quantiser of p-RTR against RTR for data of variable length are considered. The quantiser design referred to as the priority quantiser PQ is then compared against a direct design of the quantiser DIQ. It is then evaluated that for practical quantiser sizes, PQ shows better area usage when both are targeted onto the same FPGA. Other benefits are also identified.
Resumo:
This paper presents a simple clocking technique to migrate classical synchronous pipelined designs to a synchronous functional-equivalent alternative system in the context of FPGAs. When the new pipelined design runs at the same throughput of the original design, around 30% better mW/MHz ratio was observed in Virtex-based FPGA circuits. The evaluation is done using a simple but representative and practical systolic design as an example. The technique in essence is a simple replacement of the clocking mechanism for the pipe-storage elements; however no extra design effort is needed. The results show that the proposed technique allows immediate power and area-time savings of existing designs rather than exploring potential benefits by a new logic design to the problem using the classic pipeline clocking mechanism.
Resumo:
The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine as present in many imaging consumer devices. Synthesis of the architectures presented in this paper in FPGA are shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing in parallel 1, 2 or 4 pixels per clock cycle.
Resumo:
The core processing step of the noise reduction median filter technique is to find the median within a window of integers. A four-step procedure method to compute the running median of the last N W-bit stream of integers showing area and time benefits is proposed. The method slices integers into groups of B-bit using a pipeline of W/B blocks. From the method, an architecture is developed giving a designer the flexibility to exchange area gains for faster frequency of operation, or vice versa, by adjusting N, W and B parameter values. Gains in area of around 40%, or in frequency of operation of around 20%, are clearly observed by FPGA circuit implementations compared to latest methods in the literature.
Resumo:
Localization and Mapping are two of the most important capabilities for autonomous mobile robots and have been receiving considerable attention from the scientific computing community over the last 10 years. One of the most efficient methods to address these problems is based on the use of the Extended Kalman Filter (EKF). The EKF simultaneously estimates a model of the environment (map) and the position of the robot based on odometric and exteroceptive sensor information. As this algorithm demands a considerable amount of computation, it is usually executed on high end PCs coupled to the robot. In this work we present an FPGA-based architecture for the EKF algorithm that is capable of processing two-dimensional maps containing up to 1.8 k features at real time (14 Hz), a three-fold improvement over a Pentium M 1.6 GHz, and a 13-fold improvement over an ARM920T 200 MHz. The proposed architecture also consumes only 1.3% of the Pentium and 12.3% of the ARM energy per feature.
Resumo:
This paper proposes a parallel hardware architecture for image feature detection based on the Scale Invariant Feature Transform algorithm and applied to the Simultaneous Localization And Mapping problem. The work also proposes specific hardware optimizations considered fundamental to embed such a robotic control system on-a-chip. The proposed architecture is completely stand-alone; it reads the input data directly from a CMOS image sensor and provides the results via a field-programmable gate array coupled to an embedded processor. The results may either be used directly in an on-chip application or accessed through an Ethernet connection. The system is able to detect features up to 30 frames per second (320 x 240 pixels) and has accuracy similar to a PC-based implementation. The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-orientated optimizations oil performance, area and accuracy.