microsoft/aici
A Rust-based framework for building WebAssembly controllers that constrain and direct LLM token generation in real time.

AICI (Artificial Intelligence Controller Interface) provides an interface for creating Controllers that constrain and direct LLM output during token-by-token decoding. Controllers run as lightweight Wasm modules on the same machine as the inference engine, using the CPU while the GPU is busy generating tokens, so they add little overhead to decoding. The framework abstracts away the details of LLM inference to simplify Controller development, and it makes Controllers portable across multiple LLM serving engines, including llama.cpp, HuggingFace Transformers, and vLLM.
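
To make the idea of decoding-time control concrete, here is a minimal sketch of how a controller could constrain each step of generation by masking the tokens the model is allowed to pick. It is an illustration only and does not use the real AICI API: the `Controller` trait, `DigitsOnly`, `pick_token`, `generate`, and `demo` are all hypothetical names, and the "model" is a closure that returns fixed logits.

```rust
/// Hypothetical controller interface for illustration (NOT the real AICI API).
trait Controller {
    /// Given the token ids generated so far, report which vocabulary
    /// entries may be sampled next (true = allowed).
    fn allowed(&self, generated: &[usize], vocab: &[&str]) -> Vec<bool>;
}

/// Example controller: only permits tokens made entirely of ASCII digits.
struct DigitsOnly;

impl Controller for DigitsOnly {
    fn allowed(&self, _generated: &[usize], vocab: &[&str]) -> Vec<bool> {
        vocab
            .iter()
            .map(|t| !t.is_empty() && t.chars().all(|c| c.is_ascii_digit()))
            .collect()
    }
}

/// Greedy selection: the highest-logit token among those the mask allows.
/// Returns None when the mask rules out every token.
fn pick_token(logits: &[f32], mask: &[bool]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for i in 0..logits.len().min(mask.len()) {
        if mask[i] && best.map_or(true, |b| logits[i] > logits[b]) {
            best = Some(i);
        }
    }
    best
}

/// Toy decoding loop: at each step the model proposes logits and the
/// controller's mask restricts which token can actually be emitted.
fn generate(
    controller: &dyn Controller,
    model: impl Fn(&[usize]) -> Vec<f32>,
    vocab: &[&str],
    max_tokens: usize,
) -> String {
    let mut generated: Vec<usize> = Vec::new();
    for _ in 0..max_tokens {
        let logits = model(&generated);
        let mask = controller.allowed(&generated, vocab);
        match pick_token(&logits, &mask) {
            Some(tok) => generated.push(tok),
            None => break, // nothing is allowed, so stop early
        }
    }
    generated.iter().map(|&t| vocab[t]).collect()
}

/// Runs the loop with a stand-in model that always prefers "hello".
fn demo() -> String {
    let vocab = ["hello", "4", " world", "2", "!"];
    let model = |generated: &[usize]| -> Vec<f32> {
        if generated.is_empty() {
            vec![5.0, 2.0, 4.0, 1.0, 0.5]
        } else {
            vec![5.0, 1.0, 4.0, 2.0, 0.5]
        }
    };
    generate(&DigitsOnly, model, &vocab, 2)
}
```

Calling `demo()` yields `"42"`: although the stand-in model scores `hello` highest at both steps, the mask leaves only the digit tokens eligible. In this sketch the mask is computed on the CPU from the token history alone, which is the kind of work a controller can do alongside the engine's forward pass.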