ashvardanian/NumKong
SIMD-accelerated numerical library with 2000+ kernels for mixed-precision linear algebra, vector distances, and tensor operations across CPU architectures.

NumKong provides high-performance SIMD kernels for dot products, matrix operations, and distance computations across 16 numeric types, from 4-bit integers to 128-bit complex numbers. It promotes accumulators to wider types to prevent overflow and is explicitly designed for vector search and information retrieval workloads. The library includes tensor abstractions and bindings for Python, Rust, Go, JavaScript, Swift, and C++, and it competes with BLAS implementations such as OpenBLAS and with libraries such as NumPy and PyTorch.
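To illustrate why accumulator promotion matters, here is a minimal pure-Python sketch of an 8-bit dot product computed two ways. It does not use NumKong's API. The helper names (`wrap_int8`, `dot_narrow`, `dot_widened`) are hypothetical, and Python's unbounded integers stand in for a wider accumulator such as a 32-bit integer.

```python
def wrap_int8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (value + 128) % 256 - 128


def dot_narrow(a, b):
    """Dot product whose products and running sum stay in int8 and wrap on overflow."""
    acc = 0
    for x, y in zip(a, b):
        acc = wrap_int8(acc + wrap_int8(x * y))
    return acc


def dot_widened(a, b):
    """Dot product whose products and running sum are promoted to a wider type.

    Python ints are unbounded, so they stand in for an int32 accumulator here.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


# Four int8 values per vector; each product (10000) already exceeds the int8 range.
a = [100, 100, 100, 100]
b = [100, 100, 100, 100]

narrow = dot_narrow(a, b)   # wraps around: 64
wide = dot_widened(a, b)    # exact result: 40000

print(f"int8 accumulator:  {narrow}")
print(f"wider accumulator: {wide}")
```

Even with only four elements, the narrow accumulator returns a wrong value, while the widened one stays exact and still fits comfortably in 32 bits. Real SIMD kernels apply the same idea per lane, typically using instructions that multiply narrow inputs and accumulate into wider registers.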