zml/zml
A multi-hardware AI inference stack built with Zig, MLIR, and OpenXLA that targets NVIDIA, AMD, TPU, and Trainium accelerators.

Velocity · 7d
+5.7
★ / day
Trend
→steady
star history
ZML is a production inference system that decouples AI workloads from proprietary hardware by compiling models directly to target accelerators with peak performance. It supports running various LLMs including Llama 3.1/3.2, Qwen 3.5, and LFM 2.5, and is built using the Zig language, MLIR compiler infrastructure, and Bazel build system.