tashfeenahmed/freellmapi

The 1.7-billion-token free tier nobody knew they had

A local proxy that turns sixteen scattered LLM free tiers into one OpenAI-compatible endpoint with automatic failover.

8.5k stars · TypeScript · Inference · Serving
Velocity (7d): +179 ★/day · Trend: steady

What it does

FreeLLMAPI is a self-hosted proxy that sits between your OpenAI SDK client and sixteen different LLM providers. You paste in your free-tier API keys, it encrypts them in SQLite, then exposes a single POST /v1/chat/completions endpoint. The router tracks per-key rate limits (RPM, TPM, daily caps) and automatically falls through your ordered provider chain when one hits a wall or returns an error. It also implements the newer /v1/responses wire format so current Codex CLI versions work without fuss.
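The routing loop described above can be sketched roughly as follows. This is a minimal illustration, not FreeLLMAPI's code. The `KeyState` shape, the `newKey`, `hasHeadroom` and `routeWithFallback` helpers, the sliding one-minute window and the 60-second cooldown after an error are all assumptions made for the example. Only the RPM/TPM/daily-cap checks, the ordered provider chain and the 20-attempt ceiling come from the project description.

```typescript
// Minimal sketch of rate-limit-aware fallback routing.
// All names and the cooldown length are hypothetical, not FreeLLMAPI internals.

const MINUTE_MS = 60_000;
const DAY_MS = 86_400_000;
const COOLDOWN_MS = 60_000; // assumed cooldown after an upstream error

interface KeyState {
  provider: string;
  rpm: number; // max requests per minute
  tpm: number; // max tokens per minute
  dailyTokens: number; // max tokens per day
  recent: { at: number; tokens: number }[]; // requests seen in the last minute
  dayStart: number;
  dayUsed: number;
  cooldownUntil: number;
}

function newKey(provider: string, rpm: number, tpm: number, dailyTokens: number): KeyState {
  return { provider, rpm, tpm, dailyTokens, recent: [], dayStart: 0, dayUsed: 0, cooldownUntil: 0 };
}

// True if the key is not cooling down and can absorb `est` more tokens right now.
function hasHeadroom(k: KeyState, est: number, now: number): boolean {
  if (now < k.cooldownUntil) return false;
  k.recent = k.recent.filter((r) => now - r.at < MINUTE_MS); // slide the 1-minute window
  if (now - k.dayStart >= DAY_MS) {
    k.dayStart = now; // roll over the daily window
    k.dayUsed = 0;
  }
  const minuteTokens = k.recent.reduce((sum, r) => sum + r.tokens, 0);
  return k.recent.length < k.rpm && minuteTokens + est <= k.tpm && k.dayUsed + est <= k.dailyTokens;
}

// Walk the ordered chain: skip exhausted keys, cool down keys that error, stop after maxAttempts.
async function routeWithFallback<T>(
  chain: KeyState[],
  est: number,
  call: (k: KeyState) => Promise<{ result: T; tokens: number }>,
  now: () => number = Date.now,
  maxAttempts = 20,
): Promise<T> {
  let attempts = 0;
  let lastError: unknown = new Error("all keys exhausted");
  for (const k of chain) {
    if (attempts >= maxAttempts) break;
    if (!hasHeadroom(k, est, now())) continue;
    attempts++;
    try {
      const { result, tokens } = await call(k);
      k.recent.push({ at: now(), tokens }); // record actual usage for RPM/TPM
      k.dayUsed += tokens; // and for the daily cap
      return result;
    } catch (err) {
      k.cooldownUntil = now() + COOLDOWN_MS; // skip this key for a while
      lastError = err;
    }
  }
  throw lastError;
}
```

Here the caller's token estimate gates admission and the count reported after the call is what gets recorded; a real proxy has to reconcile the two somehow, and this sketch does not try to guess how FreeLLMAPI does it.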

The interesting bit

The sticky-session logic is the quiet win: multi-turn conversations stay locked to the same model for 30 minutes, which avoids the abrupt jump in behavior (and in hallucinations) you get when a router silently swaps from Gemini to Llama mid-chat. Rate tracking is also granular enough to count tokens per day, not just per minute. That matters because many free tiers enforce daily or monthly caps that a single long conversation can blow through.
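A sticky-session table like the one described can be as small as a map of pins with expiry times. In this sketch, two details are assumptions rather than documented behavior: a conversation is identified by hashing its first user message, and every new turn refreshes the 30-minute window. `StickySessions`, `conversationKey` and `release` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// 30-minute pin window, matching the behaviour described in the review.
const STICKY_TTL_MS = 30 * 60_000;

interface ChatMessage {
  role: string;
  content: string;
}

// Assumption: identify a conversation by a hash of its first user message,
// so every later turn of the same chat maps to the same key.
function conversationKey(messages: ChatMessage[]): string {
  const first = messages.find((m) => m.role === "user")?.content ?? "";
  return createHash("sha256").update(first).digest("hex");
}

class StickySessions {
  private pins = new Map<string, { model: string; expiresAt: number }>();

  // Return the pinned model while the pin is fresh; otherwise pick a new model and pin it.
  resolve(id: string, pickModel: () => string, now: number = Date.now()): string {
    const pin = this.pins.get(id);
    if (pin && now < pin.expiresAt) {
      pin.expiresAt = now + STICKY_TTL_MS; // assumption: each turn extends the window
      return pin.model;
    }
    const model = pickModel();
    this.pins.set(id, { model, expiresAt: now + STICKY_TTL_MS });
    return model;
  }

  // Drop a pin, e.g. when every key for the pinned model is exhausted.
  release(id: string): void {
    this.pins.delete(id);
  }
}
```

A key this coarse would merge unrelated chats that happen to open with the same message, so a real implementation would likely fold more context into the hash.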

Key highlights

  • Sixteen providers, one key — Google, Groq, Cerebras, SambaNova, Mistral, OpenRouter, GitHub Models, Cloudflare, Cohere, Z.ai, NVIDIA, HuggingFace, Ollama Cloud, Kilo, Pollinations, LLM7, plus any custom OpenAI-compatible endpoint.
  • Encrypted at rest — AES-256-GCM for provider keys; decryption only happens in-memory right before a request.
  • Health checks and cooldowns — Dead or rate-limited keys are automatically skipped; up to 20 retry attempts across the fallback chain.
  • Single-user by design — Scrypt-hashed admin login for the dashboard, separate unified bearer token for the API. No multi-tenant complexity.
  • Runs on a Pi — ~40 MB RSS at idle, multi-arch Docker images (amd64 + arm64), or plain Node 20+.
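For a sense of what the encryption and login bullets involve, here is a sketch using Node's built-in `node:crypto` module. The storage layout (IV, auth tag and ciphertext concatenated and base64-encoded), the `salt:hash` password format and all function names are hypothetical. The review only states that provider keys use AES-256-GCM and that the admin login is scrypt-hashed. Where the 32-byte master key comes from is also left open here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Hypothetical storage format: iv (12 bytes) | auth tag (16 bytes) | ciphertext, base64-encoded.
function encryptKey(plaintext: string, masterKey: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

// Decrypt in memory just before use; final() throws if the data or key is wrong.
function decryptKey(stored: string, masterKey: Buffer): string {
  const raw = Buffer.from(stored, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ct = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

// scrypt hash for the admin password, stored as "saltHex:hashHex".
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  return `${salt.toString("hex")}:${scryptSync(password, salt, 64).toString("hex")}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected); // constant-time comparison
}
```

GCM's auth tag is what makes "decrypt only right before a request" safe to rely on: a tampered row or the wrong master key fails loudly instead of yielding a garbage API key.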

Caveats

  • Narrow scope, deliberately — No embeddings, image generation, audio, legacy completions, moderation, or n > 1. The README is explicit: “assume it isn’t there yet” if not listed.
  • Single-user only — No per-user billing or multi-tenant auth. Exposing it beyond localhost requires trusting your LAN, since the proxy is guarded only by the unified API key.
  • “Personal experimentation only” — The repository description itself carries this disclaimer, and the README includes a full Terms of Service review section worth reading before you lean hard on any provider.

Verdict

Worth it if you’re a solo developer or hobbyist who wants to experiment across many models without managing sixteen dashboards or accidentally burning through a surprise paid tier. Skip it if you need embeddings, multi-user setups, or anything resembling a production SLA — this is a personal router, not infrastructure.
