129 resultados para intel processor

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Cell Broadband Engine (BE) Architecture is a new heterogeneous multi-core architecture targeted at compute-intensive workloads. The architecture of the Cell BE has several features that are unique in high-performance general-purpose processors, most notably the extensive support for vectorization, scratch pad memories and explicit programming of direct memory accesses (DMAs) and mailbox communication. While these features strongly increase programming complexity, it is generally claimed that significant speedups can be obtained by using Cell BE processors. This paper presents our experiences with using the Cell BE architecture to accelerate Clustal W, a bio-informatics program for multiple sequence alignment. We report on how we apply the unique features of the Cell BE to Clustal W and how important each is in obtaining high performance. By making extensive use of vectorization and by parallelizing the application across all cores, we demonstrate a speedup of 24.4 times when using 16 synergistic processor units on a QS21 Cell Blade compared to single-thread execution on the power processing unit. As the Cell BE exploits a large number of slim cores, our highly optimized implementation is just 3.8 times faster than a 3-thread version running on an Intel Core2 Duo, as the latter processor exploits a small number of fat cores.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A generic architecture for implementing a QR array processor in silicon is presented. This improves on previous research by considerably simplifying the derivation of timing schedules for a QR system implemented as a folded linear array, where account has to be taken of processor cell latency and timing at the detailed circuit level. The architecture and scheduling derived have been used to create a generator for the rapid design of System-on-a-Chip (SoC) cores for QR decomposition. This is demonstrated through the design of a single-chip architecture for implementing an adaptive beamformer for radar applications. Published as IEEE Trans Circuits and Systems Part II, Analog and Digital Signal Processing, April 2003 NOT Express Briefs. Parts 1 and II of Journal reorganised since then into Regular Papers and Express briefs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation are achievable. © 2005 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel wireless local area network (WLAN) security processor is described in this paper. It is designed to offload security encapsulation processing from the host microprocessor in an IEEE 802.11i compliant medium access control layer to a programmable hardware accelerator. The unique design, which comprises dedicated cryptographic instructions and hardware coprocessors, is capable of performing wired equivalent privacy, temporal key integrity protocol, counter mode with cipher block chaining message authentication code protocol, and wireless robust authentication protocol. Existing solutions to wireless security have been implemented on hardware devices and target specific WLAN protocols whereas the programmable security processor proposed in this paper provides support for all WLAN protocols and thus, can offer backwards compatibility as well as future upgrade ability as standards evolve. It provides this additional functionality while still achieving equivalent throughput rates to existing architectures. © 2006 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An application specific programmable processor (ASIP) suitable for the real-time implementation of matrix computations such as Singular Value and QR Decomposition is presented. The processor incorporates facilities for the issue of parallel instructions and a dual-bus architecture that are designed to achieve high performance. Internally, it uses a CORDIC module to perform arithmetic operations, with pipelining of the internal recursive loop exploited to multiplex the two independent micro-rotations onto a single piece of hardware. The net result is a flexible processing element whose functionality can be changed under program control, which combines high performance with efficient silicon implementation. This is illustrated through the results of a detailed silicon design study and the applications of the techniques to a combined SVD/QRD system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An area-efficient high-throughput architecture based on distributed arithmetic is proposed for 3D discrete wavelet transform (DWT). The 3D DWT processor was designed in VHDL and mapped to a Xilinx Virtex-E FPGA. The processor runs up to 85 MHz, which can process the five-level DWT analysis of a 128 x 128 x 128 fMRI volume image in 20 ms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel power-efficient systolic array architecture is proposed for full search block matching (FSBM) motion estimation, where the partial distortion elimination algorithm is used to dynamically switch off the computation of eliminated partial candidate blocks. The RTL-level simulation shows that the proposed architecture can reduce the power consumption of the computation part of the algorithm to about 60% of that of the conventional 2D systolic arrays.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, a new reconfigurable multi-standard Motion Estimation (ME) architecture is proposed and a standard-cell based design study is presented. The architecture exhibits simpler control, high throughput and relative low hardware cost and is highly competitive when compared with existing designs for specific video standards. ©2007 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper a novel scalable public-key processor architecture is presented that supports modular exponentiation and Elliptic Curve Cryptography over both prime GF(p) and binary GF(2) extension fields. This is achieved by a high performance instruction set that provides a comprehensive range of integer and polynomial basis field arithmetic. The instruction set and associated hardware are generic in nature and do not specifically support any cryptographic algorithms or protocols. Firmware within the device is used to efficiently implement complex and data intensive arithmetic. A firmware library has been developed in order to demonstrate support for numerous exponentiation and ECC approaches, such as different coordinate systems and integer recoding methods. The processor has been developed as a high-performance asymmetric cryptography platform in the form of a scalable Verilog RTL core. Various features of the processor may be scaled, such as the pipeline width and local memory subsystem, in order to suit area, speed and power requirements. The processor is evaluated and compares favourably with previous work in terms of performance while offering an unparalleled degree of flexibility. © 2006 IEEE.

Relevância:

20.00% 20.00%

Publicador: