AlexsJones/llmfit

The LLM matchmaker your GPU never knew it needed

A Rust TUI that scores hundreds of models against your actual hardware so you stop downloading 70B weights onto a laptop.

27.6k stars Rust LLMOps · Eval
Velocity (7d): +245 ★ / day · Trend: steady

What it does

llmfit is a terminal tool that detects your RAM, CPU, and GPU, then scores hundreds of LLMs on four dimensions: quality, speed, memory fit, and context length. For each model it tells you which quantization will run, estimates tokens per second (tok/s), and can even download the model. Think of it as a dating app for your GPU: swipe left on the 70B-parameter models your laptop cannot handle.
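To make the "memory fit" idea concrete, here is a minimal sketch of how a tool could pick the highest-precision quantization that fits in available memory. It is not llmfit's actual scoring code: the function names, the fixed `overhead_gib` allowance, and the bits-per-weight figures are all assumptions chosen for illustration.

```python
# Rough bits per weight for a few common GGUF quantization levels.
# Approximate, illustrative values; NOT taken from llmfit.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
}

GIB = 2**30


def estimate_gib(params_billion: float, bits_per_weight: float,
                 overhead_gib: float = 1.5) -> float:
    """Estimate memory use: weight bytes plus a flat runtime allowance.

    overhead_gib is a hypothetical fudge factor for KV cache and buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / GIB + overhead_gib


def best_quant(params_billion: float, available_gib: float):
    """Return the highest-precision quantization that fits, or None."""
    # Try the most precise quantization first, then step down.
    by_precision = sorted(BITS_PER_WEIGHT.items(),
                          key=lambda item: item[1], reverse=True)
    for name, bits in by_precision:
        if estimate_gib(params_billion, bits) <= available_gib:
            return name
    return None


if __name__ == "__main__":
    print(best_quant(7, 8.0))    # a 7B model with 8 GiB available
    print(best_quant(70, 16.0))  # a 70B model with 16 GiB available
```

Under these assumptions a 7B model on 8 GiB falls back from Q8_0 to Q6_K, and a 70B model on 16 GiB fits at none of the listed levels, which is the "swipe left" case.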

The interesting bit

The project bridges synthetic estimates with reality through a community leaderboard (press b) powered by localmaxxing.com. You can browse measured tok/s and time-to-first-token (TTFT) numbers from users with the same GPU, or simulate owning an RTX 5090 before you buy one. There is also a “Plan mode” (p) that inverts the logic: give it a model and it estimates what hardware you would need.
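The inverted "Plan mode" question can be sketched in a few lines: start from a model, estimate its memory footprint, and find the smallest memory size that holds it. This is only an illustration under stated assumptions, not how llmfit computes its plan. The KV cache formula is the standard one for transformer attention (keys and values per layer); the function names, the memory tiers, and the example model shapes are hypothetical.

```python
GIB = 2**30

# Hypothetical memory tiers in GiB; not llmfit's hardware presets.
MEMORY_TIERS_GIB = [8, 12, 16, 24, 32, 48, 80]


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Memory taken by the quantized weights."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: one key and one value vector per layer per token."""
    total = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total / GIB


def smallest_tier(required_gib: float):
    """Return the smallest tier that holds the model, or None if none does."""
    for tier in MEMORY_TIERS_GIB:
        if tier >= required_gib:
            return tier
    return None


if __name__ == "__main__":
    # Assumed shape for an 8B-class model: 32 layers, 8 KV heads,
    # head dimension 128, 8192-token context, 16-bit cache entries.
    need_8b = weights_gib(8, 4.8) + kv_cache_gib(32, 8, 128, 8192)
    print(round(need_8b, 2), smallest_tier(need_8b))

    # Assumed shape for a 70B-class model: 80 layers, same head layout.
    need_70b = weights_gib(70, 4.8) + kv_cache_gib(80, 8, 128, 8192)
    print(round(need_70b, 2), smallest_tier(need_70b))
```

A real planner would also account for runtime buffers, CPU offload, and throughput targets, which this sketch leaves out.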

Key highlights

  • Interactive TUI with Vim-style navigation (Normal / Visual / Select modes) plus a classic CLI for scripting
  • Supports Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio backends
  • Dynamic quantization selection, multi-GPU support, and MoE architecture handling
  • Hardware simulation mode (S) to override RAM/VRAM/CPU and preview fit
  • Download manager (D) with history, deletion, and configurable directories
  • 27+ hardware presets from Apple M1 to RTX 5090 for comparison shopping

Caveats

  • The README is enthusiastic about features but light on how the scoring weights are derived; in particular, it never says what the “quality” dimension measures
  • Community leaderboard depends on third-party data submission, so coverage may be spotty for niche hardware

Verdict

Anyone running local LLMs on consumer hardware should try this before their next git clone of a 40GB model. Cloud-only users or those already happy with a single Ollama setup can skip it.
