Deep Learning applications have highlighted the inefficiencies of the IEEE floating point format.
Both Google and Microsoft have jettisoned IEEE floating point for their AI cloud services to gain two
orders of magnitude better performance than their competitors. Similarly, AI applications on mobile
and embedded platforms have moved away from IEEE floating point to optimize performance per Watt.
However, Deep Learning applications are hardly the only applications that expose the limitations
of IEEE floating point. Cloud scale, IoT, embedded, control, and HPC applications are also
limited by the inefficiencies of the format. As NVIDIA, Google, and Microsoft have demonstrated,
a simple change to a new number system can improve scale and cost of these applications by
orders of magnitude, and create completely new application and service domains.
When performance and/or power efficiency are differentiating attributes for an application,
the complexity of IEEE floats simply can't compete with number systems that are tailored to the
needs of the application.
Posits are a tapered floating point format, designed to replace IEEE floating point and provide
a more robust computational arithmetic for the reals.
The Stillwater Universal Number library provides application developers a ready-to-use arithmetic
library to incorporate this new number system in their applications. To get started, simply
clone the library and follow the README.
Disadvantages of IEEE Floating Point
The core limitations of IEEE floating point are caused by two key problems of the format:
- inefficient representation of the reals
- inability to reproduce results across different concurrency environments
The complete list of issues holding back IEEE floating point formats:
- Wasted Bit Patterns - 32-bit IEEE floating point has nearly seventeen million (2 × (2^23 - 1)) ways
to represent NaN (Not-A-Number), while 64-bit floating point has about nine quadrillion (2 × (2^52 - 1)).
A NaN is an exception value that represents an undefined or invalid result, such as 0/0 or the
square root of a negative number, so there is absolutely no reason to allocate that many encodings to NaN.
- Mathematically Incorrect - The format breaks basic arithmetic laws:
  - The format specifies two zeros, a negative and a positive zero, which compare equal but
    behave differently: for example, 1/+0 = +inf while 1/-0 = -inf.
  - Rounding after each operation breaks the associative and distributive laws, so
    (a + b) + c can differ from a + (b + c).
    This loss of associative and distributive arithmetic behavior is problematic for reproducibility,
    because parallel reductions group operations differently in different concurrency environments.
    This problem is particularly acute for embedded and control applications that need to
    behave predictably, for example, control systems in autonomous vehicles.
- Overflows to ± inf and underflows to 0 - Overflowing to ± inf increases the
relative error by an infinite factor, while underflowing to 0 discards all magnitude information.
- Unused dynamic range - The dynamic range of double precision floats, from the smallest normal to
the largest finite value, is a whopping 2^2046 (roughly 10^616), whereas most numerical software is
architected to operate around 1.0.
- Complicated Circuitry - Denormalized floating point numbers have a hidden bit
of 0 instead of 1. This creates a host of special handling requirements that complicate
compliant hardware implementations.
- No Gradual Overflow and Fixed Accuracy - If accuracy is defined as the
number of significand bits, IEEE floating point has fixed accuracy for all numbers except
denormalized numbers, because the number of significand digits is fixed.
Denormalized numbers are characterized by a decreasing number of significand digits as the
value approaches zero, a result of having a zero hidden bit. Denormalized numbers fill the
underflow gap (i.e. the gap between zero and the smallest normalized magnitude). The counterpart of
gradual underflow is gradual overflow, which does not exist in IEEE floating point.
Advantages of posits
In contrast, the posit number system is designed to be efficient, symmetric,
and mathematically correct in any concurrency environment. Avoiding any special cases, such as
denormalized numbers, yields a more efficient execution pipeline and higher performance per Watt.
- Economical - No bit patterns are redundant.
There is exactly one representation for zero and one for Not a Real (NaR), the pattern earlier
posit descriptions denoted ± inf. All other bit patterns are valid, distinct, non-zero real numbers.
NaR serves as a replacement for NaN.
- Mathematically Elegant - There is only one representation for zero,
and the encoding is symmetric around 1.0. Associative and distributive laws are supported
through deferred rounding via the quire, enabling reproducible linear algebra algorithms in
any concurrency environment.
- Tapered Accuracy - Tapered accuracy means that values near 1.0 (exponents close to zero)
have more digits of accuracy, and values with large positive or negative exponents have fewer digits
of accuracy. This concept was first introduced by Morris (1971) in his paper "Tapered Floating Point:
A New Floating-Point Representation".
- Parameterized precision and dynamic range - Posits are defined by a size, nbits,
and the number of exponent bits, es. This gives system designers the freedom to pick
the right precision and dynamic range for the application. For example, for
AI applications we may pick 5- or 6-bit posits without any exponent bits to improve performance.
For embedded DSP applications, such as 5G base stations, we may select a 16-bit posit with one
exponent bit to improve performance per Watt.
- Simpler Circuitry - There are only two special cases, Not a Real (NaR) and zero.
There are no denormalized numbers, and results neither overflow to infinity nor underflow to zero.
Goals of the library
This library is a bit-level arithmetic reference implementation of the evolving Universal Number
Type III (posit and valid) standard. The library provides a faithful posit arithmetic layer for
any C/C++/Python environment.
As a reference library, there is extensive test infrastructure to validate the arithmetic,
and there is a host of utilities to become familiar with the internal workings of posits and valids.