54 resultados para Arithmetic.
Resumo:
We investigate the group valued functor G(D) = D*/F*D' where D is a division algebra with center F and D' the commutator subgroup of D*. We show that G has the most important functorial properties of the reduced Whitehead group SK1. We then establish a fundamental connection between this group, its residue version, and relative value group when D is a Henselian division algebra. The structure of G(D) turns out to carry significant information about the arithmetic of D. Along these lines, we employ G(D) to compute the group SK1(D). As an application, we obtain theorems of reduced K-theory which require heavy machinery, as simple examples of our method.
Resumo:
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation are achievable. © 2005 IEEE.
Resumo:
An application specific programmable processor (ASIP) suitable for the real-time implementation of matrix computations such as Singular Value and QR Decomposition is presented. The processor incorporates facilities for the issue of parallel instructions and a dual-bus architecture that are designed to achieve high performance. Internally, it uses a CORDIC module to perform arithmetic operations, with pipelining of the internal recursive loop exploited to multiplex the two independent micro-rotations onto a single piece of hardware. The net result is a flexible processing element whose functionality can be changed under program control, which combines high performance with efficient silicon implementation. This is illustrated through the results of a detailed silicon design study and the applications of the techniques to a combined SVD/QRD system.
Resumo:
Scientific computation has unavoidable approximations built into its very fabric. One important source of error that is difficult to detect and control is round-off error propagation which originates from the use of finite precision arithmetic. We propose that there is a need to perform regular numerical `health checks' on scientific codes in order to detect the cancerous effect of round-off error propagation. This is particularly important in scientific codes that are built on legacy software. We advocate the use of the CADNA library as a suitable numerical screening tool. We present a case study to illustrate the practical use of CADNA in scientific codes that are of interest to the Computer Physics Communications readership. In doing so we hope to stimulate a greater awareness of round-off error propagation and present a practical means by which it can be analyzed and managed.
Resumo:
An area-efficient high-throughput architecture based on distributed arithmetic is proposed for 3D discrete wavelet transform (DWT). The 3D DWT processor was designed in VHDL and mapped to a Xilinx Virtex-E FPGA. The processor runs up to 85 MHz, which can process the five-level DWT analysis of a 128 x 128 x 128 fMRI volume image in 20 ms.
Resumo:
In this paper a novel scalable public-key processor architecture is presented that supports modular exponentiation and Elliptic Curve Cryptography over both prime GF(p) and binary GF(2) extension fields. This is achieved by a high performance instruction set that provides a comprehensive range of integer and polynomial basis field arithmetic. The instruction set and associated hardware are generic in nature and do not specifically support any cryptographic algorithms or protocols. Firmware within the device is used to efficiently implement complex and data intensive arithmetic. A firmware library has been developed in order to demonstrate support for numerous exponentiation and ECC approaches, such as different coordinate systems and integer recoding methods. The processor has been developed as a high-performance asymmetric cryptography platform in the form of a scalable Verilog RTL core. Various features of the processor may be scaled, such as the pipeline width and local memory subsystem, in order to suit area, speed and power requirements. The processor is evaluated and compares favourably with previous work in terms of performance while offering an unparalleled degree of flexibility. © 2006 IEEE.
Resumo:
The arithmetical performance of typically achieving 5- to 7-year-olds (N = 29) was measured at four 6-month intervals. The same seven tasks were used at each time point: exact calculation, story problems, approximate arithmetic, place value, calculation principles, forced retrieval, and written problems. Although group analysis showed mostly linear growth over the 18-month period, analysis of individual differences revealed a much more complex picture. Some children exhibited marked variation in performance across the seven tasks, including evidence of difficulty in some cases. Individual growth patterns also showed differences in developmental trajectories between children on each task and within children across tasks. The findings support the idea of the componential nature of arithmetical ability and underscore the need for further longitudinal research on typically achieving children and of careful consideration of individual differences. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
The choice of radix is crucial for multi-valued logic synthesis. Practical examples, however, reveal that it is not always possible to find the optimal radix when taking into consideration actual physical parameters of multi-valued operations. In other words, each radix has its advantages and disadvantages. Our proposal is to synthesise logic in different radices, so it may benefit from their combination. The theory presented in this paper is based on Reed-Muller expansions over Galois field arithmetic. The work aims to firstly estimate the potential of the new approach and to secondly analyse its impact on circuit parameters down to the level of physical gates. The presented theory has been applied to real-life examples focusing on cryptographic circuits where Galois Fields find frequent application. The benchmark results show the approach creates a new dimension for the trade-off between circuit parameters and provides information on how the implemented functions are related to different radices.
Resumo:
This implementation of a two-dimensional discrete cosine transform demonstrates the development of a suitable architectural style for a specific technology-in this case, the Xilinx XC6200 FPGA series. The design exploits distributed arithmetic, parallelism, and pipelining to achieve a high-performance custom-computing implementation.
Resumo:
The competition between Photoinduced electron transfer (PET) and other de-excitation pathways such as fluorescence and phosphorescence can be controlled within designed molecular structures. Depending on the particular design, the resulting optical output is thus a function of various inputs such as ion concentration and excitation light dose. Once digitized into binary code, these input-output patterns can be interpreted according to Boolean logic. The single-input logic types of YES and NOT cover simple sensors and the double- (or higher-) input logic types represent other gates such as AND and OR. The logic-based arithmetic processors such as half-adders and half-subtractors are also featured. Naturally, a principal application of the more complex gates is in multi-sensing contexts.
Resumo:
The initial part of this paper reviews the early challenges (c 1980) in achieving real-time silicon implementations of DSP computations. In particular, it discusses research on application specific architectures, including bit level systolic circuits that led to important advances in achieving the DSP performance levels then required. These were many orders of magnitude greater than those achievable using programmable (including early DSP) processors, and were demonstrated through the design of commercial digital correlator and digital filter chips. As is discussed, an important challenge was the application of these concepts to recursive computations as occur, for example, in Infinite Impulse Response (IIR) filters. An important breakthrough was to show how fine grained pipelining can be used if arithmetic is performed most significant bit (msb) first. This can be achieved using redundant number systems, including carry-save arithmetic. This research and its practical benefits were again demonstrated through a number of novel IIR filter chip designs which at the time, exhibited performance much greater than previous solutions. The architectural insights gained coupled with the regular nature of many DSP and video processing computations also provided the foundation for new methods for the rapid design and synthesis of complex DSP System-on-Chip (SoC), Intellectual Property (IP) cores. This included the creation of a wide portfolio of commercial SoC video compression cores (MPEG2, MPEG4, H.264) for very high performance applications ranging from cell phones to High Definition TV (HDTV). The work provided the foundation for systematic methodologies, tools and design flows including high-level design optimizations based on "algorithmic engineering" and also led to the creation of the Abhainn tool environment for the design of complex heterogeneous DSP platforms comprising processors and multiple FPGAs. The paper concludes with a discussion of the problems faced by designers in developing complex DSP systems using current SoC technology. © 2007 Springer Science+Business Media, LLC.
Resumo:
A novel high performance bit parallel architecture to perform square root and division is proposed. Relevant VLSI design issues have been addressed. By employing redundant arithmetic and a semisystolic schedule, the throughput has been made independent of the size of the array.
Resumo:
A bit level systolic array system is proposed for the Winograd Fourier transform algorithm. The design uses bit-serial arithmetic and, in common with other systolic arrays, features nearest-neighbor interconnections, regularity and high throughput. The short interconnections in this method contrast favorably with the long interconnections between butterflies required in the FFT. The structure is well suited to VLSI implementations. It is demonstrated how long transforms can be implemented with components designed to perform a short length transform. These components build into longer transforms preserving the regularity and structure of the short length transform design.
Resumo:
The application of fine grain pipelining techniques in the design of high performance Wave Digital Filters (WDFs) is described. It is shown that significant increases in the sampling rate of bit parallel circuits can be achieved using most significant bit (msb) first arithmetic. A novel VLSI architecture for implementing two-port adaptor circuits is described which embodies these ideas. The circuit in question is highly regular, uses msb first arithmetic and is implemented using simple carry-save adders. © 1992 Kluwer Academic Publishers.
Resumo:
A high-performance VLSI architecture to perform multiply-accumulate, division and square root operations is proposed. The circuit is highly regular, requires only minimal control and can be pipelined right down to the bit level. The system can also be reconfigured on every cycle to perform any one of these operations. The gate count per row has been estimated at (27n+70) gate equivalents where n is the divisor wordlength. The throughput rate, which equals the clock speed, is the same for each operation and is independent of the wordlength. This is achieved through the combination of pipelining and redundant arithmetic. With a 1.0 µm CMOS technology and extensive pipelining, throughput rates in excess of 70 million operations per second are expected.