LeanModels/DFloat11
Lossless compression framework that reduces LLM and diffusion model size by ~30% while preserving identical outputs.

DFloat11 is a lossless compression framework for LLMs and diffusion models, aimed at efficient GPU inference. It uses specialized compression algorithms to reduce model size by approximately 30% while guaranteeing that outputs are bit-for-bit identical to those of the original uncompressed model. Supported models include FLUX.1, Qwen3-8B, Wan2.1, and HiDream-I1. The framework also provides CPU offloading, which lowers VRAM requirements so that inference can run on resource-constrained hardware.
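
To give a feel for where a roughly 30% lossless saving can come from, here is a minimal sketch. It is not DFloat11's code or API. It assumes the weights are stored as BFloat16 and that the saving comes from entropy-coding the 8-bit exponent field, which is the general idea described in the DFloat11 paper. The helper names (`to_bf16_bits`, `huffman_code_lengths`, `estimate_bits_per_weight`) and the synthetic Gaussian weights are hypothetical stand-ins chosen for illustration.

```python
import heapq
import random
import struct
from collections import Counter


def to_bf16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern of x (top half of its float32 bits).

    Simplification: this truncates instead of rounding to nearest even.
    """
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


def estimate_bits_per_weight(weights):
    """Estimate storage cost if only the exponent field is Huffman-coded."""
    # BFloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    exponents = [(to_bf16_bits(w) >> 7) & 0xFF for w in weights]
    freqs = Counter(exponents)
    lengths = huffman_code_lengths(freqs)
    exponent_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(weights)
    return 1 + 7 + exponent_bits  # sign and mantissa are kept uncompressed


# Assumption: synthetic, roughly Gaussian weights stand in for a real model.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
bits = estimate_bits_per_weight(weights)
ratio = bits / 16
print(f"estimated bits per weight: {bits:.2f} (ratio vs. BFloat16: {ratio:.2f})")
```

Because a Huffman code is a prefix code, decoding recovers every exponent exactly, which is why a scheme of this kind can shrink the weights without changing a single output bit. The real framework additionally needs a fast GPU decoder, which this sketch does not attempt.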