Stillwater Supercomputing, Inc. started with a vision to deliver sophisticated high performance computing solutions with simple human values at heart.
We make the extraordinary not only achievable but sustainable.
We grow and unlock potential while respecting the planet we inhabit and the community we are proud to be a part of.
Over the years we have delivered High Performance Computing and Cloud Infrastructure solutions to organizations large and small, helped under-served communities, and collaborated with universities across three continents.
We met friends, like-minded innovators, supporters and allies.
Wonderful people from all walks of life united by the goal to go faster, further, together.
Still water runs deep.
Together, we celebrate 15 years of innovation and collaboration.
Together, we democratize high performance computing.
Together, we Accelerate Innovation™.
Achieve near-zero latency for real-time, embedded, and control systems.
Take the guesswork out of Total Cost of Ownership. Run cloud analytics and security services at lower cost.
Reduce Cost per Operation of transactional systems in traditional banking and on the blockchain.
Best-in-class energy savings so you can innovate and be mindful of our planet at the same time.
If your use case isn't listed here, please contact us for a free consultation.
Accelerate your applications with our next-generation platform. We did all the hard work so you don't have to: cut latency and boost productivity with just a few lines of code.
The Open-KL run-time is a modern virtual machine for knowledge processing applications. It encapsulates the data structures and operators that cover the mathematics and computer science of intelligent systems, and presents high-level abstractions to the application. The run-time is aware of the underlying hardware and dispatches, for each operator, a low-level algorithm matched to the characteristics of the machine.
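To make the dispatch model concrete, here is a minimal sketch in C++. Every name in it (Device, Kernel, dispatch, the registry) is our own illustration of the idea, not part of the Open-KL API:

```cpp
// Illustrative sketch of hardware-aware operator dispatch.
// None of these names belong to the Open-KL API.
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Device { CPU, GPU, KPU };

// A low-level kernel implementing one operator on one device.
using Kernel = std::function<void(const std::vector<float>&, std::vector<float>&)>;

// Registry of low-level implementations, keyed by (operator, device).
std::map<std::pair<std::string, Device>, Kernel> registry;

// The application asks for an operator by name; the run-time selects the
// implementation matched to the hardware it discovered at startup.
void dispatch(const std::string& op, Device dev,
              const std::vector<float>& in, std::vector<float>& out) {
    registry.at({op, dev})(in, out);
}
```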
A key problem created by hardware acceleration is a new asymmetry between computational resources. In the standard stored-program machine model, processing resources are instruction-driven and access a shared, flat memory space. Coordination between computational resources is managed by instruction-stream barriers and pipes. Since the processing elements are symmetric, each thread of execution can assume the same model of computation. On asymmetric, hardware-accelerated platforms, however, threads of execution run in very specific contexts with very different performance and power characteristics. This creates the problem of coordinating collaboration between different computational resources, typically the central processor and the hardware accelerator.
This coordination and collaboration seeks to minimize both power consumption and computation time. The minimization problem is the same for all accelerators, so a common run-time that manages it is advantageous.
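A minimal sketch of that minimization, assuming the run-time can estimate the time and energy of each candidate resource; the Candidate struct, the weighting, and pick() are illustrative assumptions, not Open-KL internals:

```cpp
// Illustrative cost model for placing an operator on a compute resource;
// the structure and the weighting are assumptions, not Open-KL internals.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Candidate {
    double seconds;  // estimated execution time on this resource
    double joules;   // estimated energy consumption on this resource
};

// Select the resource minimizing a weighted sum of time and energy.
// alpha = 1 optimizes purely for speed, alpha = 0 purely for energy.
std::size_t pick(const std::vector<Candidate>& cs, double alpha) {
    auto cost = [alpha](const Candidate& c) {
        return alpha * c.seconds + (1.0 - alpha) * c.joules;
    };
    return static_cast<std::size_t>(
        std::min_element(cs.begin(), cs.end(),
            [&](const Candidate& a, const Candidate& b) {
                return cost(a) < cost(b);
            }) - cs.begin());
}
```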
Knowledge processing operators, such as those used in machine learning and sensor fusion, are complex algorithms.
OpenKL provides finely tuned parallel implementations that run on CPUs, GPUs, KPUs, and in the elastic cloud.
When applying knowledge processing techniques to Big Data, you'll want to leverage scalable cloud platforms.
OpenKL provides implementations that set up and tear down compute clusters, in the cloud if needed.
Expert assistance when you need it.
Application-tailored precision and dynamic range for Deep Learning, DSP, HPC, and IoT workloads.
Deep Learning applications have highlighted the inefficiencies of the IEEE floating point format. Both Google and Microsoft have jettisoned IEEE floating point for their AI cloud services to gain two orders of magnitude better performance over their competitors. Similarly, AI applications on mobile and embedded platforms have moved away from IEEE floating point to optimize performance per Watt.
However, Deep Learning applications are hardly the only applications that expose the limitations of IEEE floating point. Cloud scale, IoT, embedded, control, and HPC applications are also limited by the inefficiencies of the format. As NVIDIA, Google, and Microsoft have demonstrated, a simple change to a new number system can improve scale and cost of these applications by orders of magnitude, and create completely new application and service domains.
When performance or power efficiency is a differentiating attribute for an application, the complexity of IEEE floats simply can't compete with number systems tailored to the needs of the application. Posits are a tapered floating point format, designed to replace IEEE floating point and provide a more robust computational arithmetic for the reals. The Stillwater Universal Number library gives application developers a ready-to-use arithmetic library for incorporating this new number system in their applications. To get started, simply clone the library and follow the README.
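A minimal quick-start sketch, assuming the current repository layout at github.com/stillwater-sc/universal; the header path and namespace have changed between releases, so defer to the README for your version:

```cpp
// Quick start: git clone https://github.com/stillwater-sc/universal
// Header path and namespace reflect the current repository layout;
// check the README if your release differs.
#include <iostream>
#include <universal/number/posit/posit.hpp>

int main() {
    using Posit = sw::universal::posit<32, 2>;  // 32 bits, 2 exponent bits

    Posit a = 1.0 / 3.0;  // converts from double on construction
    Posit b = 3.0;
    Posit c = a * b;      // arithmetic through overloaded operators

    std::cout << "a * b     = " << c << '\n';
    std::cout << "as double = " << double(c) << '\n';  // back to IEEE
    return 0;
}
```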
The core limitations of IEEE floating point are caused by two key problems of the format: special-case encodings, such as denormalized numbers and the redundant NaN bit patterns, and rounding semantics that break associativity and reproducibility in concurrent environments.
In contrast, the posit number system is designed to be efficient, symmetric, and mathematically correct in any concurrency environment. Avoiding any special cases, such as denormalized numbers, yields a more efficient execution pipeline and higher performance per Watt.
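For reference, a Type III posit<nbits, es> encodes a sign bit, a run-length-encoded regime, es exponent bits, and a fraction, and represents the value

$$ x = (-1)^s \cdot u^{k} \cdot 2^{e} \cdot (1 + f), \qquad u = 2^{2^{es}}, $$

where k is the regime value, e the exponent field, and f the fraction. The run-length regime is what makes the format tapered: values near 1 receive the most fraction bits, while extreme magnitudes trade precision for dynamic range.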
This library is a bit-level arithmetic reference implementation of the evolving Universal Number Type III (posit and valid) standard. The library provides a faithful posit arithmetic layer for any C/C++/Python environment.
As a reference library, it comes with extensive test infrastructure to validate the arithmetic, and a host of utilities for exploring the internal workings of posits and valids.
This header-only C++ template library makes it trivial to integrate into your computational software. Many software packages have gone before you, Eigen, MTL4, G+SMO, and ODE among them, so you are in good company.
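As one illustration of that integration (our own example, not taken from the library's documentation), generic numeric code picks up posit arithmetic simply by changing its scalar type:

```cpp
// Drop-in integration: the same templated routine runs with double or posit.
// Header path and namespace follow the current repository layout.
#include <cstddef>
#include <iostream>
#include <universal/number/posit/posit.hpp>
#include <vector>

template <typename Scalar>
Scalar dot(const std::vector<Scalar>& x, const std::vector<Scalar>& y) {
    Scalar sum{0};
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

int main() {
    using sw::universal::posit;
    std::vector<posit<32, 2>> x{1.0, 2.0, 3.0}, y{4.0, 5.0, 6.0};
    std::cout << "dot = " << double(dot(x, y)) << '\n';  // 32.0
    return 0;
}
```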
The library models the arithmetic at the bit-level and is the validation vehicle for our posit-enabled tensor processor hardware.
The library provides a complete set of posit configurations, ranging from the very small, posit<2,0>, to the very large, posit<256,5>.