990 results for Array processing
Abstract:
Two major UK systolic array projects are described. The first concerns the development of a wavefront array processor for adaptive beamforming; the second concerns the design of bit-level systolic arrays for high-performance signal processing.
Abstract:
This paper describes the design and architecture of a bit-level systolic array processor. The bit-level systolic array described is directly applicable to a wide range of image processing operations where high performance and throughput are essential. The architecture is illustrated by describing the operation of the correlator and convolver chips which are being developed. The advantages of the system are also discussed.
Abstract:
A bit-level systolic array system is proposed for the Winograd Fourier transform algorithm. The design uses bit-serial arithmetic and, in common with other systolic arrays, features nearest-neighbor interconnections, regularity and high throughput. The short interconnections in this method contrast favorably with the long interconnections between butterflies required in the FFT. The structure is well suited to VLSI implementation. It is demonstrated how long transforms can be implemented with components designed to perform a short-length transform; these components can be combined into longer transforms while preserving the regularity and structure of the short-length transform design.
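The claim that long transforms can be built from short-length components can be illustrated with the prime-factor (Good-Thomas) index mapping that the Winograd approach relies on. The sketch below is a hypothetical software model, not the bit-level array described above: it uses naive short DFTs in place of Winograd's small-transform kernels, with illustrative coprime lengths 3 and 5.

```python
import cmath

def dft(x):
    """Naive DFT, standing in for a Winograd short-length kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def pfa_dft(x, n1, n2):
    """Length n1*n2 DFT (gcd(n1, n2) == 1) from short DFTs, no twiddle factors.

    Input uses the Ruritanian map n = (n2*i1 + n1*i2) mod N; output uses the
    CRT map, so the row and column transforms combine into the true length-N
    DFT without intermediate twiddle multiplications.
    """
    n = n1 * n2
    # Gather the input into an n1 x n2 array via the Ruritanian map.
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)] for i1 in range(n1)]
    # Short DFTs along each dimension.
    rows = [dft(row) for row in grid]
    cols = [dft([rows[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]
    # Scatter the results with the CRT output map (Python 3.8+ modular inverse).
    q1 = pow(n2, -1, n1)  # n2^{-1} mod n1
    q2 = pow(n1, -1, n2)  # n1^{-1} mod n2
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * q1 * k1 + n1 * q2 * k2) % n] = cols[k2][k1]
    return out
```

Because the cross terms in the exponent vanish modulo N, the result matches a direct length-15 DFT exactly, which mirrors how short-length building blocks preserve the structure of longer transforms.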
Abstract:
A scheduling method for implementing a generic linear QR array processor architecture is presented. It improves on previous work and considerably simplifies the derivation of schedules for a folded linear system, where detailed account must be taken of processor cell latency. The architecture and scheduling derived provide the basis of a generator for the rapid design of System-on-a-Chip (SoC) cores for QR decomposition.
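Linear and triangular QR arrays conventionally compute the decomposition with Givens rotations, one rotation per boundary or internal cell. As a point of reference, a plain sequential sketch of Givens-rotation QR (not the folded linear schedule discussed above) might look like:

```python
import math

def givens_qr(a):
    """QR decomposition by Givens rotations (row-major list of lists).

    Systolic QR arrays compute the same rotations, one per cell per time
    step; here they are applied sequentially for clarity. Returns (q, r)
    where q accumulates Q^T, so q is orthogonal and q applied to a gives r.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):
            if r[i][j] == 0.0:
                continue
            # Rotation zeroing r[i][j] against the pivot r[i-1][j].
            h = math.hypot(r[i - 1][j], r[i][j])
            c, s = r[i - 1][j] / h, r[i][j] / h
            for k in range(n):
                r[i - 1][k], r[i][k] = (c * r[i - 1][k] + s * r[i][k],
                                        -s * r[i - 1][k] + c * r[i][k])
            for k in range(m):
                q[i - 1][k], q[i][k] = (c * q[i - 1][k] + s * q[i][k],
                                        -s * q[i - 1][k] + c * q[i][k])
    return q, r
```

The scheduling problem the abstract addresses is deciding which of these rotations each physical cell performs on which cycle, once the triangular array is folded onto a linear one.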
Abstract:
A new high-performance, programmable image processing chip targeted at video and HDTV applications is described. It was initially developed for small-object recognition in images but has much broader application, including 1D and 2D FIR filtering as well as neural network computation. The core of the circuit is an array of twenty-one multiply-accumulate cells based on a systolic architecture. Devices can be cascaded to increase the order of the filter both vertically and horizontally. The chip has been fabricated in a 0.6 µm, low-power CMOS technology and operates on 10-bit input data at over 54 Megasamples per second. The introduction gives some background to the chip design and highlights that there are few comparable devices. Section 2 gives a brief introduction to small-object detection; the chip architecture and design are described in detail in the later sections.
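A multiply-accumulate cell array of this kind can be sketched behaviourally. The model below is a generic broadcast-input systolic FIR line, an illustrative assumption rather than the chip's actual cell design; word lengths, the 21-cell organisation and cascading are not modelled.

```python
def systolic_fir(samples, coeffs):
    """Behavioural model of a broadcast-input systolic FIR line.

    One MAC cell per coefficient: each cycle every cell multiplies its
    resident coefficient by the broadcast input sample and adds the partial
    sum arriving from its left neighbour; partial sums advance one cell per
    cycle. Coefficients are loaded in reverse so the last cell emits the
    usual convolution y[t] = sum_j coeffs[j] * x[t - j].
    """
    taps = coeffs[::-1]            # cell i holds coeffs[n-1-i]
    n = len(taps)
    p = [0.0] * n                  # partial-sum registers between cells
    out = []
    for x in samples:
        p = [taps[0] * x] + [p[i - 1] + taps[i] * x for i in range(1, n)]
        out.append(p[-1])
    return out
```

Cascading devices to raise the filter order, as the abstract describes, corresponds to chaining further cells onto the end of this partial-sum path.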
Abstract:
This paper presents IPPro, a high-performance, scalable soft-core processor targeted at image processing applications. It is based on the Xilinx DSP48E1 architecture using the Zynq Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526 MHz, giving 526 MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice registers, making the processor as compact as possible whilst maintaining flexibility and programmability. A key aspect of the approach is reducing application design time and implementation effort by using multiple IPPro processors in a SIMD mode. For different applications, this allows different levels of parallelism and mappings to be exploited for the specified processing architecture with the supported instruction set. In this context, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard, with the colour and morphology operations accelerated using multiple IPPros. Simulation and experimental results demonstrate that the processing platform achieves speedups of 15 and 33 times for colour filtering and morphology operations, respectively, with reduced design effort and time.
Abstract:
We consider a multipair decode-and-forward relay channel, where multiple sources simultaneously transmit their signals to multiple destinations with the help of a full-duplex relay station. We assume that the relay station is equipped with massive arrays, while all sources and destinations have a single antenna. The relay station uses channel estimates obtained from received pilots and zero-forcing (ZF) or maximum-ratio combining/maximum-ratio transmission (MRC/MRT) to process the signals. To significantly reduce the loop interference effect, we propose two techniques: i) using a massive receive antenna array; or ii) using a massive transmit antenna array together with very low transmit power at the relay station. We derive an exact achievable rate in closed form for MRC/MRT processing and an analytical approximation of the achievable rate for ZF processing. This approximation is very tight, especially for a large number of relay station antennas. These closed-form expressions enable us to determine the regions where the full-duplex mode outperforms the half-duplex mode, as well as to design an optimal power allocation scheme. This scheme maximizes the energy efficiency for a given sum spectral efficiency under peak power constraints at the relay station and sources. Numerical results verify the effectiveness of the optimal power allocation scheme. Furthermore, we show that, by doubling the number of transmit/receive antennas at the relay station, the transmit power of each source and of the relay station can be reduced by 1.5 dB if the pilot power is equal to the signal power, and by 3 dB if the pilot power is kept fixed, while maintaining a given quality of service.
Abstract:
The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor targeted at image processing. The paper gives details of the mapping and scheduling factors that influence performance, and the stages undertaken to deploy the algorithm on FPGA hardware whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of state-of-the-art FPGA designs by up to 3.2 times, with reduced design and implementation effort and increased flexibility, all on a low-cost Zynq programmable system.
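The core HOG operation, a magnitude-weighted histogram of gradient orientations per image cell, can be sketched in a few lines. This is a generic software model (unsigned gradients, nearest-bin voting), not the FPGA mapping discussed above; full HOG additionally interpolates votes between bins and block-normalises the histograms.

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Gradient-orientation histogram for one cell of a grayscale image
    (list of lists of intensities), the core step of the HOG descriptor.

    Uses central-difference gradients, unsigned orientation (0..180 deg)
    and magnitude-weighted nearest-bin voting over interior pixels.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang / 180.0 * bins) % bins] += mag
    return hist
```

The per-pixel gradient, magnitude and binning steps are exactly the kind of regular, data-parallel work the abstract maps onto multiple IPPro cores.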
Abstract:
Field programmable gate array devices boast abundant resources with which custom accelerator components for signal, image and data processing may be realised; however, realising high-performance, low-cost accelerators currently demands manual register transfer level design. Software-programmable 'soft' processors have been proposed as a way to reduce this design burden, but they are unable to support performance and cost comparable to custom circuits. This paper proposes a new soft processing approach for FPGA which promises to overcome this barrier. A high-performance, fine-grained streaming processor, known as a Streaming Accelerator Element, is proposed which realises accelerators as large-scale custom multicore networks. By adopting a streaming execution approach with advanced program control and memory addressing capabilities, typical program inefficiencies can be almost completely eliminated, enabling performance and cost unprecedented amongst software-programmable solutions. When used to realise accelerators for fast Fourier transform, motion estimation, matrix multiplication and Sobel edge detection, the proposed architecture is shown to enable real-time operation with performance and cost comparable to hand-crafted custom circuit accelerators, and performance up to two orders of magnitude beyond existing soft processors.
Abstract:
Pre-processing (PP) of the received symbol vector and channel matrices is an essential prerequisite for Sphere Decoder (SD)-based detection in Multiple-Input Multiple-Output (MIMO) wireless systems. PP is a highly complex operation, yet it represents a relatively small fraction of the overall computational cost of detecting an OFDM MIMO frame in standards such as 802.11n. Despite this, real-time PP architectures are highly inefficient, dominating the resource cost of real-time SD architectures. This paper resolves this issue. By reorganising the ordering and QR decomposition sub-operations of PP, we describe a Field Programmable Gate Array (FPGA)-based PP architecture for the Fixed Complexity Sphere Decoder (FSD) applied to 4 × 4 802.11n MIMO which reduces resource cost by 50% compared to state-of-the-art solutions whilst maintaining real-time performance.
Abstract:
The increasing design complexity associated with modern Field Programmable Gate Arrays (FPGAs) has prompted the emergence of 'soft'-programmable processors which attempt to replace at least part of the custom circuit design problem with one of programming parallel processors. Despite substantial advances in this technology, its performance and resource efficiency for computationally complex operations remain in doubt. In this paper we present the first recorded implementation of a soft-core Fast Fourier Transform (FFT) on Xilinx Virtex FPGA technology. By employing a streaming processing architecture, we show how it is possible to achieve architectures which offer 1.1 GSamples/s throughput and up to 19 times speed-up over the Xilinx Radix-2 FFT dedicated circuit at comparable cost.
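The building block that streaming FFT architectures pipeline is the radix-2 butterfly. A recursive decimation-in-time sketch (a plain software model, with no claim about the streaming datapath described above) shows that structure:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2.

    Each combine step below is the butterfly that streaming FFT cores
    pipeline: one complex multiply by a twiddle factor, one add, one subtract.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle * odd branch
        out[k] = even[k] + tw            # butterfly upper output
        out[k + n // 2] = even[k] - tw   # butterfly lower output
    return out
```

A hardware realisation unrolls this recursion into log2(n) pipelined butterfly stages, which is what allows throughputs of the order quoted in the abstract.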
Abstract:
This work studies new photonic devices applied to optical-fibre communication systems and to RF signal processing systems. The devices presented are based on linear and non-linear signal processing. Linear optical devices such as the Mach-Zehnder interferometer allow optical signals to be added with fixed or tunable weights. This device can therefore be used, respectively, as an optical amplitude filter with two complementary outputs, or as an optical filter with a tunable phase response. The first principle of operation serves as the basis for a new photonic system for real-time measurement of the frequency of an RF signal. The second principle of operation is exploited in a new photonic system for steering the electric field radiated by an antenna array, and also in a new tunable chromatic dispersion compensator. Signal processing is non-linear when optical signals are delayed and subsequently mixed with one another, rather than linearly added. This principle of operation underlies the mixing of an electrical signal with an optical signal, which in turn is the basis of a new photonic system for real-time measurement of the frequency of an RF signal. The mixing of optical signals in non-linear media allows efficient operation over a large spectral width; such an operation is used to perform tunable wavelength conversion. A high-bandwidth optical time-division-multiplexed signal is mixed with two unmodulated optical pumps through non-linear parametric processes in a periodically poled lithium niobate waveguide.
In another work, a pulsed pump in which each pulse has a tunable wavelength serves as the basis for a new converter from an optical time-division-multiplexed signal to an optical wavelength-division-multiplexed signal. The pump is mixed with the input optical signal through a non-linear parametric process in a highly non-linear optical fibre. All the proposed linear and non-linear photonic signal processing devices are experimentally validated. They are also modelled theoretically or by simulation, except for those involving the mixing of optical signals, for which a qualitative analysis suffices.
Abstract:
In his introduction, Pinna (2010) quoted one of Wertheimer’s observations: “I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of color. Do I have ‘327’? No. I have sky, house, and trees.” This seems quite remarkable, for Max Wertheimer, together with Kurt Koffka and Wolfgang Koehler, was a pioneer of Gestalt Theory: perceptual organisation was tackled considering grouping rules of line and edge elements in relation to figure-ground segregation, i.e., a meaningful object (the figure) as perceived against a complex background (the ground). At the lowest level – line and edge elements – Wertheimer (1923) himself formulated grouping principles on the basis of proximity, good continuation, convexity, symmetry and, often forgotten, past experience of the observer. Rubin (1921) formulated rules for figure-ground segregation using surroundedness, size and orientation, but also convexity and symmetry. Almost a century of research into Gestalt later, Pinna and Reeves (2006) introduced the notion of figurality, meant to represent the integrated set of properties of visual objects, from the principles of grouping and figure-ground to the colour and volume of objects with shading. Pinna, in 2010, went one important step further and studied perceptual meaning, i.e., the interpretation of complex figures on the basis of past experience of the observer. Re-establishing a link to Wertheimer’s rule about past experience, he formulated five propositions, three definitions and seven properties on the basis of observations made on graphically manipulated patterns. For example, he introduced the illusion of meaning by comics-like elements suggesting wind, therefore inducing a learned interpretation. His last figure shows a regular array of squares but with irregular positions on the right side. This pile of (ir)regular squares can be interpreted as the result of an earthquake which destroyed part of an apartment block. 
This is much more intuitive, direct and economical than describing the complexity of the array of squares.
Abstract:
Hyperspectral instruments have been incorporated in satellite missions, providing large amounts of high-spectral-resolution data of the Earth's surface. These data can be used in remote sensing applications that often require a real-time or near-real-time response. To avoid delays between hyperspectral image acquisition and its interpretation, the latter usually done at a ground station, onboard systems have emerged to process data, reducing the volume of information to transfer from the satellite to the ground station. For this purpose, compact reconfigurable hardware modules, such as field-programmable gate arrays (FPGAs), are widely used. This paper proposes an FPGA-based architecture for hyperspectral unmixing. The method is based on vertex component analysis (VCA) and works without a dimensionality-reduction preprocessing step. The architecture has been designed for a low-cost Xilinx Zynq board with a Zynq-7020 system-on-chip, whose FPGA programmable logic is based on the Artix-7, and tested using real hyperspectral data. Experimental results indicate that the proposed implementation can achieve real-time processing while maintaining the method's accuracy, which indicates the potential of the proposed platform for implementing high-performance, low-cost embedded systems, opening perspectives for onboard hyperspectral image processing.