972 resultados para hardware implementation


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract—There are sometimes occasions when ultrasound beamforming is performed with only a subset of the total data that will eventually be available. The most obvious example is a mechanically-swept (wobbler) probe in which the three-dimensional data block is formed from a set of individual B-scans. In these circumstances, non-blind deconvolution can be used to improve the resolution of the data. Unfortunately, most of these situations involve large blocks of three-dimensional data. Furthermore, the ultrasound blur function varies spatially with distance from the transducer. These two facts make the deconvolution process time-consuming to implement. This paper is about ways to address this problem and produce spatially-varying deconvolution of large blocks of three-dimensional data in a matter of seconds. We present two approaches, one based on hardware and the other based on software. We compare the time they each take to achieve similar results and discuss the computational resources and form of blur model that each requires.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Parallel processing techniques have been used in the past to provide high performance computing resources for activities such as fire-field modelling. This has traditionally been achieved using specialized hardware and software, the expense of which would be difficult to justify for many fire engineering practices. In this article we demonstrate how typical office-based PCs attached to a Local Area Network has the potential to offer the benefits of parallel processing with minimal costs associated with the purchase of additional hardware or software. It was found that good speedups could be achieved on homogeneous networks of PCs, for example a problem composed of ~100,000 cells would run 9.3 times faster on a network of 12 800MHz PCs than on a single 800MHz PC. It was also found that a network of eight 3.2GHz Pentium 4 PCs would run 7.04 times faster than a single 3.2GHz Pentium computer. A dynamic load balancing scheme was also devised to allow the effective use of the software on heterogeneous PC networks. This scheme also ensured that the impact between the parallel processing task and other computer users on the network was minimized.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation are achievable. © 2005 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper, chosen as a best paper from the 2005 SAMOS Workshop on Computer Systems: describes the for the first time the major Abhainn project for automated system level design of embedded signal processing systems. In particular, this describes four key novelties: novel algorithm modelling techniques for DSP systems, automated implementation realisation, algorithm transformation for system optimisation and automated inter-processor communication. This is applied to two complex systems: a radar and sonar system. In both cases technology which allows non-experts to automatically create low-overhead, high performance embedded signal processing systems is exhibited.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A novel tag computation circuit for a credit based Self-Clocked Fair Queuing (SCFQ) Scheduler is presented. The scheduler combines Weighted Fair Queuing (WFQ) with a credit based bandwidth reallocation scheme. The proposed architecture is able to reallocate bandwidth on the fly if particular links suffer from channel quality degradation .The hardware architecture is parallel and pipelined enabling an aggregated throughput rate of 180 million tag computations per second. The throughput performance is ideal for Broadband Wireless Access applications, allowing room for relatively complex computations in QoS aware adaptive scheduling. The high-level system break-down is described and synthesis results for Altera Stratix II FPGA technology are presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

High-speed field-programmable gate array (FPGA) implementations of an adaptive least mean square (LMS) filter with application in an electronic support measures (ESM) digital receiver, are presented. They employ "fine-grained" pipelining, i.e., pipelining within the processor and result in an increased output latency when used in the LMS recursive system. Therefore, the major challenge is to maintain a low latency output whilst increasing the pipeline stage in the filter for higher speeds. Using the delayed LMS (DLMS) algorithm, fine-grained pipelined FPGA implementations using both the direct form (DF) and the transposed form (TF) are considered and compared. It is shown that the direct form LMS filter utilizes the FPGA resources more efficiently thereby allowing a 120 MHz sampling rate.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The design and implementation of a programmable cyclic redundancy check (CRC) computation circuit architecture, suitable for deployment in network related system-on-chips (SoCs) is presented. The architecture has been designed to be field reprogrammable so that it is fully flexible in terms of the polynomial deployed and the input port width. The circuit includes an embedded configuration controller that has a low reconfiguration time and hardware cost. The circuit has been synthesised and mapped to 130-nm UMC standard cell [application-specific integrated circuit (ASIC)] technology and is capable of supporting line speeds of 5 Gb/s. © 2006 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dynamic power consumption is very dependent on interconnect, so clever mapping of digital signal processing algorithms to parallelised realisations with data locality is vital. This is a particular problem for fast algorithm implementations where typically, designers will have sacrificed circuit structure for efficiency in software implementation. This study outlines an approach for reducing the dynamic power consumption of a class of fast algorithms by minimising the index space separation; this allows the generation of field programmable gate array (FPGA) implementations with reduced power consumption. It is shown how a 50% reduction in relative index space separation results in a measured power gain of 36 and 37% over a Cooley-Tukey Fast Fourier Transform (FFT)-based solution for both actual power measurements for a Xilinx Virtex-II FPGA implementation and circuit measurements for a Xilinx Virtex-5 implementation. The authors show the generality of the approach by applying it to a number of other fast algorithms namely the discrete cosine, the discrete Hartley and the Walsh-Hadamard transforms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133?MHz.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A new type of advanced encryption standard (AES) implementation using a normal basis is presented. The method is based on a lookup technique that makes use of inversion and shift registers, which leads to a smaller size of lookup for the S-box than its corresponding implementations. The reduction in the lookup size is based on grouping sets of inverses into conjugate sets which in turn leads to a reduction in the number of lookup values. The above technique is implemented in a regular AES architecture using register files, which requires less interconnect and area and is suitable for security applications. The results of the implementation are competitive in throughput and area compared with the corresponding solutions in a polynomial basis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a lookup circuit with advanced memory techniques and algorithms that examines network packet headers at high throughput rates. Hardware solutions and test scenarios are introduced to evaluate the proposed approach. The experimental results show that the proposed lookup circuit is able to achieve at least 39 million packet header lookups per second, which facilitates the application of next-generation stateful packet classifications at beyond 20Gbps internet traffic throughput rates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, a hardware solution for packet classification based on multi-fields is presented. The proposed scheme focuses on a new architecture based on the decomposition method. A hash circuit is used in order to reduce the memory space required for the Recursive Flow Classification (RFC) algorithm. The implementation results show that the proposed architecture achieves significant performance advantage that is comparable to that of some well-known algorithms. The solution is based on Altera Stratix III FPGA technology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The emergence of programmable logic devices as processing platforms for digital signal processing applications poses challenges concerning rapid implementation and high level optimization of algorithms on these platforms. This paper describes Abhainn, a rapid implementation methodology and toolsuite for translating an algorithmic expression of the system to a working implementation on a heterogeneous multiprocessor/field programmable gate array platform, or a standalone system on programmable chip solution. Two particular focuses for Abhainn are the automated but configurable realisation of inter-processor communuication fabrics, and the establishment of novel dedicated hardware component design methodologies allowing algorithm level transformation for system optimization. This paper outlines the approaches employed in both these particular instances.