brontoguana/krasis

A Rust/CUDA-based LLM runtime for efficiently running multi-hundred-billion-parameter MoE models on consumer NVIDIA GPUs.

Velocity (7d): +4.0 ★/day · Trend: steady

Krasis is an inference engine that runs large mixture-of-experts (MoE) language models on consumer-grade hardware by managing which experts reside in VRAM and which stay in CPU RAM. It moves performance-critical paths from Python to Rust and CUDA, providing GPU-accelerated prefill and decode, HQQ attention caching, and compact KV cache formats such as k6v6 and k4v4. The project also includes an OpenAI-compatible API, an interactive launcher, benchmark tooling, and installation from GitHub releases.
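
To make the idea of expert residency concrete, here is a minimal sketch in Python. It is not Krasis code and does not reflect its actual policy, which this description does not specify. It assumes a plain least-recently-used (LRU) policy over a fixed number of VRAM slots, and the names `ExpertResidency`, `vram_slots`, and `request` are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict
from typing import List, Optional


class ExpertResidency:
    """Toy LRU tracker for which experts occupy a fixed number of VRAM slots.

    Assumption: every expert needs exactly one slot and eviction is plain LRU.
    A real runtime would also copy weights between CPU RAM and VRAM.
    """

    def __init__(self, vram_slots: int) -> None:
        if vram_slots < 1:
            raise ValueError("vram_slots must be at least 1")
        self.vram_slots = vram_slots
        # Keys are expert ids, ordered from least to most recently used.
        self._resident: "OrderedDict[int, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def request(self, expert_id: int) -> Optional[int]:
        """Mark an expert as needed; return the evicted expert id, if any."""
        if expert_id in self._resident:
            # Already in VRAM: refresh its recency and count a hit.
            self._resident.move_to_end(expert_id)
            self.hits += 1
            return None

        # Not in VRAM: it would have to be loaded from CPU RAM.
        self.misses += 1
        evicted: Optional[int] = None
        if len(self._resident) >= self.vram_slots:
            # Evict the least recently used expert to free a slot.
            evicted, _ = self._resident.popitem(last=False)
        self._resident[expert_id] = None
        return evicted

    def resident(self) -> List[int]:
        """Expert ids currently in VRAM, least recently used first."""
        return list(self._resident)
```

With two slots and the request sequence 0, 1, 0, 2, 1, this sketch records one hit and four misses. The point of a residency manager is to keep the experts a router selects most often on the GPU so that such misses, each implying a transfer from CPU RAM, stay rare.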
