8 results for QRD-M
Abstract:
In this paper, a low-complexity soft-output QRD-M detection algorithm is proposed for high-throughput multiple-input multiple-output (MIMO) systems. By employing a novel expansion-on-demand and distributed sorting scheme, the proposed algorithm reduces the number of foundational operations by 70% for 16-QAM and 85% for 64-QAM compared to the conventional QRD-M algorithm. Furthermore, the proposed algorithm yields soft information to improve the bit error rate (BER) performance. Simulation results show that the proposed algorithm achieves near-ML detection performance with fewer foundational operations.
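For orientation, the conventional QRD-M baseline that this abstract compares against can be sketched as follows. This is a minimal hard-output sketch under simplifying assumptions (real-valued symbols, full expansion of every survivor, a plain sort); it is not the paper's expansion-on-demand or soft-output scheme, and the names `qrd_m_detect`, `r`, `z`, `symbols`, and `m_keep` are illustrative.

```python
def qrd_m_detect(r, z, symbols, m_keep):
    """Conventional (hard-output) QRD-M search on an upper-triangular system.

    r       : n x n upper-triangular matrix from the QR decomposition of H
    z       : received vector after rotation, z = Q^T y
    symbols : real-valued constellation points (assumption: PAM-style alphabet)
    m_keep  : number of survivor paths kept per layer (the "M" in QRD-M)
    """
    n = len(z)
    # Each survivor is (accumulated metric, symbols decided for layers i+1..n-1).
    survivors = [(0.0, [])]
    for i in range(n - 1, -1, -1):
        candidates = []
        for metric, tail in survivors:
            # Interference from the layers already decided on this path.
            interference = sum(r[i][i + 1 + k] * s for k, s in enumerate(tail))
            for s in symbols:  # full expansion of every survivor
                err = z[i] - interference - r[i][i] * s
                candidates.append((metric + err * err, [s] + tail))
        # Keep only the m_keep paths with the smallest accumulated metric.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:m_keep]
    return survivors[0][1]


# Noise-free 2x2 example: z = R x with x = [1, -3]
r = [[2.0, 0.5], [0.0, 1.5]]
z = [0.5, -4.5]
print(qrd_m_detect(r, z, [-3, -1, 1, 3], m_keep=2))  # [1, -3]
```

The full expansion and the sort over all candidates at every layer are the operations that the proposed algorithm is reported to reduce.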
Abstract:
An application specific programmable processor (ASIP) suitable for the real-time implementation of matrix computations such as Singular Value and QR Decomposition is presented. The processor incorporates facilities for the issue of parallel instructions and a dual-bus architecture that are designed to achieve high performance. Internally, it uses a CORDIC module to perform arithmetic operations, with pipelining of the internal recursive loop exploited to multiplex the two independent micro-rotations onto a single piece of hardware. The net result is a flexible processing element whose functionality can be changed under program control, which combines high performance with efficient silicon implementation. This is illustrated through the results of a detailed silicon design study and the applications of the techniques to a combined SVD/QRD system.
Abstract:
A silicon implementation of the Approximate Rotations algorithm capable of carrying the computational load of algorithms such as QRD and SVD, within the real-time realisation of applications such as Adaptive Beamforming, is described. A modification to the original Approximate Rotations algorithm to simplify the method of optimal angle selection is proposed. Analysis shows that fewer iterations of the Approximate Rotations algorithm are required compared with the conventional CORDIC algorithm to achieve similar degrees of accuracy. The silicon design studies undertaken provide direct practical evidence of superior performance with the Approximate Rotations algorithm, requiring approximately 40% of the total computation time of the conventional CORDIC algorithm, for a similar silicon area cost. © 2004 IEEE.
Abstract:
In the world of high-performance computing, considerable effort has gone into accelerating Numerical Linear Algebra (NLA) kernels such as QR Decomposition (QRD), with the added advantages of reconfigurability and scalability. While popular custom hardware solutions in the form of systolic arrays can deliver high performance, they are not scalable, and hence not commercially viable. In this paper, we show how systolic solutions of QRD can be realized efficiently on REDEFINE, a scalable runtime reconfigurable hardware platform. We propose various enhancements to REDEFINE to meet the custom needs of accelerating NLA kernels. We further carry out a design space exploration of the proposed solution for an arbitrary application of size n × n. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array.
Abstract:
QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR QRD) in which we reduce the computational complexity of GR and exploit a higher degree of parallelism. This low-complexity Column-wise GR (CGR) can annihilate multiple elements of a column of a matrix simultaneously. The algorithm is first realized on a Two-Dimensional (2D) systolic array and then implemented on REDEFINE, which is a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations to report better throughput, convergence and scalability.
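As a point of reference, the classical Givens-rotation QRD, which annihilates one sub-diagonal element per rotation, can be sketched as below. This is the textbook method only, not the column-wise CGR variant the abstract proposes; the function name `givens_qr` and the list-of-lists matrix layout are assumptions made for illustration.

```python
import math


def givens_qr(a):
    """Textbook Givens-rotation QR of an m x n matrix (lists of floats).

    Returns (q, r) with a = q * r, q orthogonal (m x m), r upper triangular (m x n).
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j from the bottom up, one element per rotation.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # nothing to annihilate
            c, s = x / h, y / h
            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r
```

Each inner rotation touches only two rows, which is the property that makes the method attractive for systolic arrays; the one-element-at-a-time dependency chain within a column is what a column-wise scheme aims to shorten.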
Abstract:
Submitted by 阎军 (yanj@red.semi.ac.cn) on 2010-04-07T05:24:22Z No. of bitstreams: 1 鉴海防.pdf: 4830504 bytes, checksum: e38f23ff06bf16bda48b7ef96b7511ac (MD5)
Abstract:
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations, and several approaches are adopted to achieve high performance, including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation is achievable. © 2005 IEEE.
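To illustrate what a CORDIC micro-rotation is, here is a minimal floating-point sketch of the textbook vectoring mode, with the accumulated gain divided out once after the last iteration. It assumes x > 0 and is only a software model; it does not reproduce the paper's own scale factor correction method or its pipelined hardware, and `cordic_vectoring` and `iterations` are illustrative names.

```python
import math


def cordic_vectoring(x, y, iterations=24):
    """Textbook CORDIC vectoring mode (assumes x > 0).

    Drives y toward zero with shift-and-add micro-rotations and returns
    (magnitude, angle) of the input vector.
    """
    angle = 0.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0  # rotate toward the x axis
        shift = 2.0 ** -i           # a bit shift in hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)  # table lookup in hardware
    # Every micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # the combined gain is removed once, after the final iteration.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, angle
```

Calling `cordic_vectoring(3.0, 4.0)` should return a magnitude close to 5 and an angle close to `math.atan2(4.0, 3.0)`, with accuracy limited by the iteration count.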
Abstract:
Modern Multiple-Input Multiple-Output (MIMO) communication systems place huge demands on embedded processing resources in terms of throughput, latency and resource utilization. State-of-the-art MIMO detector algorithms, such as Fixed-Complexity Sphere Decoding (FSD), rely on efficient channel preprocessing involving numerous calculations of the pseudo-inverse of the channel matrix by QR Decomposition (QRD) and ordering. These highly complex operations can quickly become the critical prerequisite for real-time MIMO detection, an effect exacerbated as the number of antennas in a MIMO detector increases. This paper describes a sorted QR decomposition (SQRD) algorithm extended for FSD, which significantly reduces the complexity and latency of this preprocessing step and increases the throughput of MIMO detection. It merges the calculations of the QRD and ordering operations to avoid multiple iterations of QRD. Specifically, it shows that SQRD reduces the computational complexity by 60-70% when compared to conventional MIMO preprocessing algorithms. In 4x4 to 7x7 MIMO cases, the approach suffers merely a 0.16-0.2 dB reduction in Bit Error Rate (BER) performance.
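The idea of merging QRD and ordering into a single pass can be sketched with the generic sorted QRD based on modified Gram-Schmidt, which picks the remaining column of smallest norm at each step. This is an assumption-laden illustration (real-valued, full column rank, minimum-norm ordering rule); the FSD-specific ordering described in the abstract is not reproduced, and `sorted_qr` is a hypothetical name.

```python
import math


def sorted_qr(h):
    """Generic sorted QRD via modified Gram-Schmidt (real-valued sketch).

    Returns (q, r, perm) such that column j of q * r equals column perm[j] of h.
    Assumes h has full column rank.
    """
    m, n = len(h), len(h[0])
    q = [row[:] for row in h]  # columns are orthonormalised in place
    r = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for i in range(n):
        # Ordering step: choose the remaining column with the smallest norm.
        norms = [sum(q[t][l] ** 2 for t in range(m)) for l in range(i, n)]
        k = i + norms.index(min(norms))
        if k != i:
            for t in range(m):
                q[t][i], q[t][k] = q[t][k], q[t][i]
            for t in range(n):
                r[t][i], r[t][k] = r[t][k], r[t][i]
            perm[i], perm[k] = perm[k], perm[i]
        # QRD step: normalise column i, then remove it from later columns.
        r[i][i] = math.sqrt(sum(q[t][i] ** 2 for t in range(m)))
        for t in range(m):
            q[t][i] /= r[i][i]
        for l in range(i + 1, n):
            r[i][l] = sum(q[t][i] * q[t][l] for t in range(m))
            for t in range(m):
                q[t][l] -= r[i][l] * q[t][i]
    return q, r, perm
```

Because the column choice happens inside the decomposition loop, the ordering costs no additional QRD passes, which is the kind of saving the abstract attributes to merging the two operations.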