992 resultados para floating point unit
Resumo:
This paper proposes a set of well defined steps to design functional verification monitors intended to verify Floating Point Units (FPU) described in HDL. The first step consists on defining the input and output domain coverage. Next, the corner cases are defined. Finally, an already verified reference model is used in order to test the correctness of the Device Under Verification (DUV). As a case study a monitor for an IEEE754-2008 compliant design is implemented. This monitor is built to be easily instantiated into verification frameworks such as OVM. Two different designs were verified reaching complete input coverage and successful compliant results.
Resumo:
Scientific applications rely heavily on floating point data types. Floating point operations are complex and require complicated hardware that is both area and power intensive. The emergence of massively parallel architectures like Rigel creates new challenges and poses new questions with respect to floating point support. The massively parallel aspect of Rigel places great emphasis on area efficient, low power designs. At the same time, Rigel is a general purpose accelerator and must provide high performance for a wide class of applications. This thesis presents an analysis of various floating point unit (FPU) components with respect to Rigel, and attempts to present a candidate design of an FPU that balances performance, area, and power and is suitable for massively parallel architectures like Rigel.
Resumo:
This paper presents a single precision floating point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse squareroot with high-performance and low resource usage. The design uses a piecewise 2nd order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and is capable to calculate any function with a throughput of one operation per cycle. The floatingpoint multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM).
Resumo:
IEEE 754 floating-point arithmetic is widely used in modern, general-purpose computers. It is based on real arithmetic and is made total by adding both a positive and a negative infinity, a negative zero, and many Not-a-Number (NaN) states. Transreal arithmetic is total. It also has a positive and a negative infinity but no negative zero, and it has a single, unordered number, nullity. Modifying the IEEE arithmetic so that it uses transreal arithmetic has a number of advantages. It removes one redundant binade from IEEE floating-point objects, doubling the numerical precision of the arithmetic. It removes eight redundant, relational,floating-point operations and removes the redundant total order operation. It replaces the non-reflexive, floating-point, equality operator with a reflexive equality operator and it indicates that some of the exceptions may be removed as redundant { subject to issues of backward compatibility and transient future compatibility as programmers migrate to the transreal paradigm.
Resumo:
The IEEE 754 standard for oating-point arithmetic is widely used in computing. It is based on real arithmetic and is made total by adding both a positive and a negative infinity, a negative zero, and many Not-a-Number (NaN) states. The IEEE infinities are said to have the behaviour of limits. Transreal arithmetic is total. It also has a positive and a negative infinity but no negative zero, and it has a single, unordered number, nullity. We elucidate the transreal tangent and extend real limits to transreal limits. Arguing from this firm foundation, we maintain that there are three category errors in the IEEE 754 standard. Firstly the claim that IEEE infinities are limits of real arithmetic confuses limiting processes with arithmetic. Secondly a defence of IEEE negative zero confuses the limit of a function with the value of a function. Thirdly the definition of IEEE NaNs confuses undefined with unordered. Furthermore we prove that the tangent function, with the infinities given by geometrical con- struction, has a period of an entire rotation, not half a rotation as is commonly understood. This illustrates a category error, confusing the limit with the value of a function, in an important area of applied mathe- matics { trigonometry. We brie y consider the wider implications of this category error. Another paper proposes transreal arithmetic as a basis for floating- point arithmetic; here we take the profound step of proposing transreal arithmetic as a replacement for real arithmetic to remove the possibility of certain category errors in mathematics. Thus we propose both theo- retical and practical advantages of transmathematics. In particular we argue that implementing transreal analysis in trans- floating-point arith- metic would extend the coverage, accuracy and reliability of almost all computer programs that exploit real analysis { essentially all programs in science and engineering and many in finance, medicine and other socially beneficial applications.
Resumo:
Localization and Mapping are two of the most important capabilities for autonomous mobile robots and have been receiving considerable attention from the scientific computing community over the last 10 years. One of the most efficient methods to address these problems is based on the use of the Extended Kalman Filter (EKF). The EKF simultaneously estimates a model of the environment (map) and the position of the robot based on odometric and exteroceptive sensor information. As this algorithm demands a considerable amount of computation, it is usually executed on high end PCs coupled to the robot. In this work we present an FPGA-based architecture for the EKF algorithm that is capable of processing two-dimensional maps containing up to 1.8 k features at real time (14 Hz), a three-fold improvement over a Pentium M 1.6 GHz, and a 13-fold improvement over an ARM920T 200 MHz. The proposed architecture also consumes only 1.3% of the Pentium and 12.3% of the ARM energy per feature.
Resumo:
The high integration density of current nanometer technologies allows the implementation of complex floating-point applications in a single FPGA. In this work the intrinsic complexity of floating-point operators is addressed targeting configurable devices and making design decisions providing the most suitable performance-standard compliance trade-offs. A set of floating-point libraries composed of adder/subtracter, multiplier, divisor, square root, exponential, logarithm and power function are presented. Each library has been designed taking into account special characteristics of current FPGAs, and with this purpose we have adapted the IEEE floating-point standard (software-oriented) to a custom FPGA-oriented format. Extended experimental results validate the design decisions made and prove the usefulness of reducing the format complexity
Resumo:
The high performance and capacity of current FPGAs makes them suitable as acceleration co-processors. This article studies the implementation, for such accelerators, of the floating-point power function xy as defined by the C99 and IEEE 754-2008 standards, generalized here to arbitrary exponent and mantissa sizes. Last-bit accuracy at the smallest possible cost is obtained thanks to a careful study of the various subcomponents: a floating-point logarithm, a modified floating-point exponential, and a truncated floating-point multiplier. A parameterized architecture generator in the open-source FloPoCo project is presented in details and evaluated.
Resumo:
Thesis (M. S.)--University of Illinois at Urbana-Champaign.
Resumo:
In this thesis we present an approach to automated verification of floating point programs. Existing techniques for automated generation of correctness theorems are extended to produce proof obligations for accuracy guarantees and absence of floating point exceptions. A prototype automated real number theorem prover is presented, demonstrating a novel application of function interval arithmetic in the context of subdivision-based numerical theorem proving. The prototype is tested on correctness theorems for two simple yet nontrivial programs, proving exception freedom and tight accuracy guarantees automatically. The prover demonstrates a novel application of function interval arithmetic in the context of subdivision-based numerical theorem proving. The experiments show how function intervals can be used to combat the information loss problems that limit the applicability of traditional interval arithmetic in the context of hard real number theorem proving.
Resumo:
The focus of our work is the verification of tight functional properties of numerical programs, such as showing that a floating-point implementation of Riemann integration computes a close approximation of the exact integral. Programmers and engineers writing such programs will benefit from verification tools that support an expressive specification language and that are highly automated. Our work provides a new method for verification of numerical software, supporting a substantially more expressive language for specifications than other publicly available automated tools. The additional expressivity in the specification language is provided by two constructs. First, the specification can feature inclusions between interval arithmetic expressions. Second, the integral operator from classical analysis can be used in the specifications, where the integration bounds can be arbitrary expressions over real variables. To support our claim of expressivity, we outline the verification of four example programs, including the integration example mentioned earlier. A key component of our method is an algorithm for proving numerical theorems. This algorithm is based on automatic polynomial approximation of non-linear real and real-interval functions defined by expressions. The PolyPaver tool is our implementation of the algorithm and its source code is publicly available. In this paper we report on experiments using PolyPaver that indicate that the additional expressivity does not come at a performance cost when comparing with other publicly available state-of-the-art provers. We also include a scalability study that explores the limits of PolyPaver in proving tight functional specifications of progressively larger randomly generated programs. © 2014 Springer International Publishing Switzerland.
Resumo:
Microprocessori basati su singolo processore (CPU), hanno visto una rapida crescita di performances ed un abbattimento dei costi per circa venti anni. Questi microprocessori hanno portato una potenza di calcolo nell’ordine del GFLOPS (Giga Floating Point Operation per Second) sui PC Desktop e centinaia di GFLOPS su clusters di server. Questa ascesa ha portato nuove funzionalità nei programmi, migliori interfacce utente e tanti altri vantaggi. Tuttavia questa crescita ha subito un brusco rallentamento nel 2003 a causa di consumi energetici sempre più elevati e problemi di dissipazione termica, che hanno impedito incrementi di frequenza di clock. I limiti fisici del silicio erano sempre più vicini. Per ovviare al problema i produttori di CPU (Central Processing Unit) hanno iniziato a progettare microprocessori multicore, scelta che ha avuto un impatto notevole sulla comunità degli sviluppatori, abituati a considerare il software come una serie di comandi sequenziali. Quindi i programmi che avevano sempre giovato di miglioramenti di prestazioni ad ogni nuova generazione di CPU, non hanno avuto incrementi di performance, in quanto essendo eseguiti su un solo core, non beneficiavano dell’intera potenza della CPU. Per sfruttare appieno la potenza delle nuove CPU la programmazione concorrente, precedentemente utilizzata solo su sistemi costosi o supercomputers, è diventata una pratica sempre più utilizzata dagli sviluppatori. Allo stesso tempo, l’industria videoludica ha conquistato una fetta di mercato notevole: solo nel 2013 verranno spesi quasi 100 miliardi di dollari fra hardware e software dedicati al gaming. Le software houses impegnate nello sviluppo di videogames, per rendere i loro titoli più accattivanti, puntano su motori grafici sempre più potenti e spesso scarsamente ottimizzati, rendendoli estremamente esosi in termini di performance. Per questo motivo i produttori di GPU (Graphic Processing Unit), specialmente nell’ultimo decennio, hanno dato vita ad una vera e propria rincorsa alle performances che li ha portati ad ottenere dei prodotti con capacità di calcolo vertiginose. Ma al contrario delle CPU che agli inizi del 2000 intrapresero la strada del multicore per continuare a favorire programmi sequenziali, le GPU sono diventate manycore, ovvero con centinaia e centinaia di piccoli cores che eseguono calcoli in parallelo. Questa immensa capacità di calcolo può essere utilizzata in altri campi applicativi? La risposta è si e l’obiettivo di questa tesi è proprio quello di constatare allo stato attuale, in che modo e con quale efficienza pùo un software generico, avvalersi dell’utilizzo della GPU invece della CPU.
Resumo:
Electromagnetic suspension systems are inherently nonlinear and often face hardware limitation when digitally controlled. The main contributions of this paper are: the design of a nonlinear H(infinity) controller. including dynamic weighting functions, applied to a large gap electromagnetic suspension system and the presentation of a procedure to implement this controller on a fixed-point DSP, through a methodology able to translate a floating-point algorithm into a fixed-point algorithm by using l(infinity) norm minimization due to conversion error. Experimental results are also presented, in which the performance of the nonlinear controller is evaluated specifically in the initial suspension phase. (C) 2009 Elsevier Ltd. All rights reserved.