The LLM matchmaker your GPU never knew it needed
A Rust TUI that scores hundreds of models against your actual hardware so you stop downloading 70B weights onto a laptop.

What it does
llmfit is a terminal tool that detects your RAM, CPU, and GPU, then scores hundreds of LLMs across quality, speed, memory fit, and context length. It tells you which quantization will run, estimates tok/s, and can even download the model. Think of it as a dating app for your GPU: swipe left on the 70B-parameter models your laptop cannot handle.
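The memory-fit and quantization part of that scoring can be shown with a back-of-the-envelope calculation. The sketch below is not llmfit's code. The quantization names, their bits-per-weight figures, and the flat 20% overhead for KV cache and runtime buffers are assumptions chosen for illustration. It starts at the highest-precision quantization, works downward, and returns the first one whose estimated footprint fits in the available memory.

```rust
/// A quantization level and its approximate bits per weight.
/// ASSUMPTION: names and bit widths are illustrative, not llmfit's table.
struct Quant {
    name: &'static str,
    bits_per_weight: f64,
}

/// Ordered from highest to lowest precision.
const QUANTS: [Quant; 4] = [
    Quant { name: "Q8", bits_per_weight: 8.5 },
    Quant { name: "Q6", bits_per_weight: 6.6 },
    Quant { name: "Q5", bits_per_weight: 5.7 },
    Quant { name: "Q4", bits_per_weight: 4.8 },
];

/// ASSUMPTION: hypothetical 20% headroom for KV cache, activations and buffers.
const OVERHEAD: f64 = 1.2;

/// Estimated memory in GB for `params_b` billion parameters at `bits` per weight.
fn estimate_gb(params_b: f64, bits: f64) -> f64 {
    params_b * bits / 8.0 * OVERHEAD
}

/// Return the highest-precision quantization whose estimate fits in `mem_gb`,
/// or None if even the smallest one does not fit.
fn best_quant(params_b: f64, mem_gb: f64) -> Option<&'static str> {
    QUANTS
        .iter()
        .find(|q| estimate_gb(params_b, q.bits_per_weight) <= mem_gb)
        .map(|q| q.name)
}

fn main() {
    // An 8B model and a 70B model, both on a 16 GB laptop.
    println!("{:?}", best_quant(8.0, 16.0)); // prints Some("Q8")
    println!("{:?}", best_quant(70.0, 16.0)); // prints None
}
```

With these assumed numbers, an 8B model on a 16 GB machine lands at Q8. A 70B model fits at no level, which is exactly the download the tool is meant to talk you out of.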
The interesting bit
The project bridges synthetic estimates with reality through a community leaderboard (press b) powered by localmaxxing.com. You can browse actual tok/s and TTFT numbers from users with the same GPU, or simulate owning an RTX 5090 before you buy one. There is also a “Plan mode” (p) that inverts the logic: give it a model and it estimates the hardware you would need.
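Plan mode asks the fit question backwards. Here is a minimal sketch of that inversion. It assumes the common rule of thumb that token generation is memory-bandwidth-bound: each generated token reads every weight once, so tok/s is roughly bandwidth divided by weight size. The function name `plan`, the 20% overhead factor, and the formula are illustrative assumptions and are not taken from llmfit.

```rust
/// Hardware requirements for running one model at one quantization.
#[derive(Debug)]
struct Plan {
    min_memory_gb: f64,
    min_bandwidth_gb_s: f64,
}

/// ASSUMPTION: hypothetical headroom for KV cache and runtime buffers.
const OVERHEAD: f64 = 1.2;

/// Hypothetical inverse of a fit check: given a model size, a quantization
/// and a target decode speed, estimate the memory capacity and bandwidth needed.
/// ASSUMPTION: decode is bandwidth-bound, so tok/s ~= bandwidth / weight bytes.
fn plan(params_b: f64, bits_per_weight: f64, target_tok_s: f64) -> Plan {
    let weights_gb = params_b * bits_per_weight / 8.0;
    Plan {
        min_memory_gb: weights_gb * OVERHEAD,
        min_bandwidth_gb_s: weights_gb * target_tok_s,
    }
}

fn main() {
    // A 70B model at roughly 4 bits per weight, targeting 20 tok/s.
    let p = plan(70.0, 4.0, 20.0);
    println!(
        "need ~{:.0} GB of memory and ~{:.0} GB/s of bandwidth",
        p.min_memory_gb, p.min_bandwidth_gb_s
    );
}
```

For that example, the sketch asks for roughly 42 GB of memory and 700 GB/s of bandwidth. You would then check those figures against a GPU's spec sheet.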
Key highlights
- Interactive TUI with Vim-style navigation (Normal / Visual / Select modes) plus a classic CLI for scripting
- Supports Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio backends
- Dynamic quantization selection, multi-GPU support, and MoE architecture handling
- Hardware simulation mode (S) to override RAM/VRAM/CPU and preview fit
- Download manager (D) with history, deletion, and configurable directories
- 27+ hardware presets from Apple M1 to RTX 5090 for comparison shopping
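MoE handling matters because two resources diverge. Every expert must sit in memory, but each token only passes through the active experts. The sketch below captures that distinction under the same bandwidth-bound assumption. The `Model` struct, the 4-bit weights, the 400 GB/s bandwidth, and the 48B total / 12B active parameter counts are hypothetical values and do not come from llmfit's model database.

```rust
/// Hypothetical model description; for dense models active == total.
struct Model {
    total_params_b: f64,
    active_params_b: f64,
}

/// Memory footprint in GB: all experts must be resident, so use total params.
fn weight_gb(m: &Model, bits_per_weight: f64) -> f64 {
    m.total_params_b * bits_per_weight / 8.0
}

/// ASSUMPTION: bandwidth-bound decode, where only the active parameters
/// are read from memory for each generated token.
fn est_tok_s(m: &Model, bits_per_weight: f64, bandwidth_gb_s: f64) -> f64 {
    let gb_per_token = m.active_params_b * bits_per_weight / 8.0;
    bandwidth_gb_s / gb_per_token
}

fn main() {
    // Hypothetical dense and MoE models with the same total size.
    let dense = Model { total_params_b: 48.0, active_params_b: 48.0 };
    let moe = Model { total_params_b: 48.0, active_params_b: 12.0 };
    for (name, m) in [("dense", &dense), ("moe", &moe)] {
        println!(
            "{name}: {:.0} GB, ~{:.0} tok/s",
            weight_gb(m, 4.0),
            est_tok_s(m, 4.0, 400.0)
        );
    }
}
```

Both models need the same 24 GB, but with these numbers the MoE variant decodes about four times faster (roughly 67 versus 17 tok/s). For that reason, a fit checker cannot score an MoE model as if it were a dense model of the same size.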
Caveats
- The README is enthusiastic about features but light on how scoring weights are actually derived; the “quality” dimension is vague
- Community leaderboard depends on third-party data submission, so coverage may be spotty for niche hardware
Verdict
Anyone running local LLMs on consumer hardware should try this before their next git clone of a 40GB model. Cloud-only users or those already happy with a single Ollama setup can skip it.