Stillwater Supercomputing

Future-proof your operations

turnkey solutions from edge to cloud

Imbue your device with real-time intelligence

with our next-generation platform. We did all the hard work so you don't have to: cut latency and boost productivity with just a few lines of code.

The OpenKL run-time is a modern virtual machine for knowledge processing applications. It encapsulates the data structures and operators that cover the math and computer science of intelligent systems, and presents high-level abstractions for use by the application. The run-time is aware of the underlying hardware and, for each operator, dispatches a low-level algorithm that is matched to the characteristics of the machine.

A key problem created by hardware acceleration is a new asymmetry between computational resources. In the standard stored-program machine model, processing resources are instruction driven and access a shared, flat memory space. Coordination between computational resources is managed by instruction-stream barriers and pipes. Since there is symmetry among the processing elements, each thread of execution assumes the same model of computation. On asymmetric, hardware-accelerated platforms, however, each thread of execution has a very specific context, with very different performance and power characteristics. This creates the problem of coordination and collaboration between different computational resources, typically the central processor and the hardware accelerator.

The goal of this coordination and collaboration is to minimize power consumption and computation time. The minimization problem is the same for all accelerators, so a common run-time that manages it is advantageous.
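
As a rough illustration of the minimization described above, the sketch below picks the resource with the lowest weighted cost of time and energy for a given amount of work. It is not the OpenKL API: the Backend class, the pick_backend function, the cost figures and the energy_weight parameter are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one computational resource."""
    name: str
    seconds_per_op: float   # illustrative time cost per unit of work
    joules_per_op: float    # illustrative energy cost per unit of work
    launch_seconds: float   # fixed overhead, e.g. moving data to an accelerator


def pick_backend(backends, work_units, energy_weight=0.5):
    """Return the backend with the lowest weighted time/energy cost.

    The weighted sum is an assumed, simplistic way to fold two
    objectives into one number; a real run-time may do this differently.
    """
    def cost(b):
        time = b.launch_seconds + work_units * b.seconds_per_op
        energy = work_units * b.joules_per_op
        return (1.0 - energy_weight) * time + energy_weight * energy

    return min(backends, key=cost)


# Made-up numbers: the accelerator is faster and more efficient per
# operation, but pays a fixed launch overhead.
BACKENDS = [
    Backend("cpu", seconds_per_op=1e-6, joules_per_op=1e-5, launch_seconds=0.0),
    Backend("accelerator", seconds_per_op=1e-8, joules_per_op=1e-6, launch_seconds=1e-3),
]

print(pick_backend(BACKENDS, work_units=10).name)         # small job
print(pick_backend(BACKENDS, work_units=1_000_000).name)  # large job
```

With these made-up numbers the small job stays on the CPU because the launch overhead dominates, while the large job moves to the accelerator.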

High Performance

Knowledge processing operators, such as machine learning and sensor fusion, are complex algorithms.

OpenKL provides finely tuned parallel implementations that work with CPU, GPU, KPU, and in the elastic cloud.

Elastic Cloud Enabled

When applying knowledge processing techniques on Big Data, you'll want to leverage scalable cloud platforms.

OpenKL provides implementations that set up and tear down clusters, in the cloud if needed.

High-touch Support

Expert assistance when you need it.

Power efficient number systems and arithmetic libraries

Application-tailored precision and dynamic range for Deep Learning, DSP, HPC, and IoT workloads.

Deep Learning applications have highlighted the inefficiencies of the IEEE floating point format. Both Google and Microsoft have jettisoned IEEE floating point for their AI cloud services to gain two orders of magnitude better performance over their competitors. Similarly, AI applications for mobile and embedded devices have moved away from IEEE floating point to optimize performance per Watt.

However, Deep Learning applications are hardly the only applications that expose the limitations of IEEE floating point. Cloud scale, IoT, embedded, control, and HPC applications are also limited by the inefficiencies of the format. As NVIDIA, Google, and Microsoft have demonstrated, a simple change to a new number system can improve scale and cost of these applications by orders of magnitude, and create completely new application and service domains.

When performance and/or power efficiency are differentiating attributes for an application, the complexity of IEEE floats simply can't compete with number systems that are tailored to the needs of the application. Posits are a tapered floating point format, designed to replace IEEE floating point and provide a more robust computational arithmetic for the reals. The Stillwater Universal Number library provides application developers a ready-to-use arithmetic library to incorporate this new number system in their applications. To get started, simply clone the library and follow the README.

Disadvantages of IEEE Floating Point

The core limitations of IEEE floating point are caused by two key problems of the format:

inefficient representation of the reals
inability to reproduce results across different concurrency environments

The complete list of issues that are holding back IEEE floating point formats:

Wasted Bit Patterns - 32-bit IEEE floating point has around sixteen million ways to represent NaN (Not-A-Number), while 64-bit floating point has around nine quadrillion. A NaN is an exception value that represents undefined or invalid results, such as 0/0, so there is no reason to allocate that many encodings to it.
Mathematically Incorrect
The format specifies two zeroes - a negative and positive zero - which behave differently.
Loss of associative and distributive arithmetic laws due to rounding after each operation.
This loss of associative and distributive arithmetic behavior is problematic for reproducibility. This problem is particularly acute for embedded and control applications that need to behave predictably, for example, control systems in autonomous vehicles.
Overflows to ± inf and underflows to 0 - Overflowing to ± inf increases the relative error by an infinite factor, while underflowing to 0 loses sign information.
Unused dynamic range - The dynamic range of double precision floats is a whopping 2^2047, whereas most numerical software is architected to operate around 1.0.
Complicated Circuitry - Denormalized floating point numbers have a hidden bit of 0 instead of 1. This creates a host of special handling requirements that complicate compliant hardware implementations.
No Gradual Overflow and Fixed Accuracy - If accuracy is defined as the number of significand bits, IEEE floating point has fixed accuracy for all numbers except denormalized numbers, because the number of significand digits is fixed. Denormalized numbers have a zero hidden bit, so they carry fewer significand digits as the value approaches zero. They fill the underflow gap (i.e. the gap between zero and the smallest non-zero normalized values). The counterpart of gradual underflow is gradual overflow, which does not exist in IEEE floating point.
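
Several of the issues listed above can be observed directly. The sketch below uses only the Python standard library, whose float type is an IEEE 64-bit double on common platforms. The helper names nan_encodings and bits_to_float32 are introduced here for illustration only.

```python
import math
import struct


def nan_encodings(fraction_bits: int) -> int:
    """Count NaN bit patterns: exponent all ones, nonzero fraction, either sign."""
    return 2 * (2**fraction_bits - 1)


def bits_to_float32(pattern: int) -> float:
    """Reinterpret a 32-bit integer pattern as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", pattern))[0]


# Wasted bit patterns: 23 fraction bits in binary32, 52 in binary64.
print(nan_encodings(23))  # 16777214
print(nan_encodings(52))  # 9007199254740990
print(math.isnan(bits_to_float32(0x7FC00001)))  # one of the many NaN patterns

# Two zeros that compare equal but behave differently.
print(0.0 == -0.0)             # True
print(math.atan2(0.0, 0.0))    # 0.0
print(math.atan2(0.0, -0.0))   # 3.141592653589793

# Rounding after each operation breaks associativity.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)           # False

# Overflow saturates to infinity; underflow collapses to zero.
print(1e308 * 10)              # inf
print(5e-324 / 2)              # 0.0
```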

Advantages of posits

In contrast, the posit number system is designed to be efficient, symmetric, and mathematically correct in any concurrency environment. Avoiding any special cases, such as denormalized numbers, yields a more efficient execution pipeline and higher performance per Watt.

Economical - No bit patterns are redundant. There is one representation for infinity, denoted ± inf, and one for zero. All other bit patterns are valid distinct non-zero real numbers. ± inf serves as a replacement for NaN.
Mathematically Elegant - There is only one representation for zero, and the encoding is symmetric around 1.0. Associative and distributive laws are supported through deferred rounding via the quire, enabling reproducible linear algebra algorithms in any concurrency environment.
Tapered Accuracy - Tapered accuracy means that values with small exponents have more digits of accuracy and values with large exponents have fewer digits of accuracy. This concept was first introduced by Morris (1971) in his paper "Tapered Floating Point: A New Floating-Point Representation".
Parameterized precision and dynamic range - Posits are defined by a size, nbits, and the number of exponent bits, es. This gives system designers the freedom to pick the precision and dynamic range required for the application. For example, for AI applications we may pick 5 or 6 bit posits without any exponent bits to improve performance. For embedded DSP applications, such as 5G base stations, we may select a 16 bit posit with one exponent bit to improve performance per Watt.
Simpler Circuitry - There are only two special cases, Not a Real and Zero. No denormalized numbers, overflow, or underflow.
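
To make the encoding concrete, here is a minimal decoder sketch. It assumes the commonly described posit layout (sign bit, regime run, es exponent bits, then fraction, with negative values stored in two's complement). The function decode_posit is a hypothetical name used for illustration and is not part of the Stillwater Universal Number library.

```python
from fractions import Fraction


def decode_posit(bits: int, nbits: int, es: int):
    """Decode an nbits-wide posit pattern with es exponent bits.

    Returns an exact Fraction, or None for the single exception value.
    """
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)            # the one and only zero
    if bits == 1 << (nbits - 1):
        return None                   # the one and only exception value
    negative = bool(bits >> (nbits - 1))
    if negative:
        bits = (-bits) & mask         # two's complement gives the magnitude
    body = format(bits & (mask >> 1), f"0{nbits - 1}b")  # bits after the sign
    first = body[0]
    run = len(body) - len(body.lstrip(first))            # regime run length
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]             # skip the regime terminator, if present
    exp_bits = rest[:es].ljust(es, "0")                  # cut-off bits read as 0
    e = int(exp_bits, 2) if es > 0 else 0
    frac_bits = rest[es:]
    f = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)
    scale = k * (1 << es) + e         # useed**k * 2**e, with useed = 2**(2**es)
    value = (1 + f) * Fraction(2) ** scale
    return -value if negative else value


print(decode_posit(0x40, 8, 0))     # 1
print(decode_posit(0x7F, 8, 0))     # 64, the largest 8-bit value with es = 0
print(decode_posit(0x01, 8, 0))     # 1/64, the smallest positive value
print(decode_posit(0x4800, 16, 1))  # 3/2
```

Tapered accuracy shows up in the regime: patterns near 1.0 have a short regime and many fraction bits, while patterns near the extremes spend almost all of their bits on the regime.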

Goals of the library

This library is a bit-level arithmetic reference implementation of the evolving Universal Number Type III (posit and valid) standard. The library provides a faithful posit arithmetic layer for any C/C++/Python environment.

As a reference library, there is extensive test infrastructure to validate the arithmetic, and there is a host of utilities to become familiar with the internal workings of posits and valids.

Template C++ Library

A header-only C++ template library makes it trivial to integrate into your computational software. Many software packages have gone before you (Eigen, MTL4, G+SMO, ODE), so you are in good company.

Accurate

The library models the arithmetic at the bit-level and is the validation vehicle for our posit-enabled tensor processor hardware.

Fully Parameterized

The library provides a complete set of posit configurations, ranging from the very small, posit<2,0>, to the very large, posit<256,5>.
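
To get a feel for what these configurations mean, the sketch below computes the dynamic range of a posit<nbits,es> from the usual relations useed = 2^(2^es), maxpos = useed^(nbits-2) and minpos = 1/maxpos. The function names are illustrative assumptions and not part of the library.

```python
import math


def posit_max_scale(nbits: int, es: int) -> int:
    """Return s such that maxpos = 2**s and minpos = 2**-s."""
    if nbits < 2 or es < 0:
        raise ValueError("need nbits >= 2 and es >= 0")
    return (nbits - 2) * (1 << es)


def posit_decades(nbits: int, es: int) -> float:
    """Dynamic range maxpos / minpos expressed in powers of ten."""
    return 2 * posit_max_scale(nbits, es) * math.log10(2)


for nbits, es in [(2, 0), (8, 0), (16, 1), (32, 2), (256, 5)]:
    s = posit_max_scale(nbits, es)
    print(f"posit<{nbits},{es}>: maxpos = 2^{s}, about {posit_decades(nbits, es):.1f} decades")
```

For posit<2,0> the scale is zero, so its only values are 0, ±1 and the exception value, while posit<256,5> reaches 2^8128.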

Accelerating Innovation™
for 15 years.

We build custom solutions from edge to cloud, from scratch if necessary.
We deliver robust optimization and take the guesswork out of Cost of Ownership.
We have turnkey solutions that beat your competition and we have staying power.

Reduce Latency

Optimize Cost

Increase Efficiency

These improvements can be delivered for the AI/Deep Learning,
Machine Learning, Industry 4.0, Telecommunication, Finance,
Cyber Security and Defense markets.
