Inference · Serving

big names · picking up speed

+1543 ★/day ↗ accelerating

OmniRoute keeps your coding agents online by automatically failing over across 177 AI providers—including free tiers—when quotas run dry.

★ 31.1k TypeScript Inference · Serving · explained Feature

earendil-works/pi

+764 ★/day ↗ accelerating

Pi bundles a unified LLM API, agent runtime, and interactive coding CLI into a monorepo that treats every dependency update as a potential attack.

★ 78.1k TypeScript Coding Assistants · explained Feature

ggml-org/whisper.cpp

+63 ★/day ↗ accelerating

A minimal C/C++ port of OpenAI’s Whisper built to transcribe speech locally on phones, browsers, and underclocked POWER9 boxes.

★ 52.3k C++ Image · Video · Audio · explained

cheahjs/free-llm-api-resources

+120 ★/day ↗ accelerating

It catalogs legitimate services offering free API access to large language models, complete with rate limits, model lists, and data-privacy caveats.

★ 28.3k Python Learning · explained

Wei-Shaw/sub2api

+215 ★/day ↗ accelerating

Sub2API pools AI subscriptions behind a metered gateway so teams or resellers can distribute API quotas without building their own billing stack.

★ 34.5k Go LLMOps · Eval · explained

chatanywhere/GPT_API_free

+34 ★/day ↗ accelerating

A hosted proxy that offers free, rate-limited API access to GPT, DeepSeek, and others for Chinese users who'd rather not tunnel through a VPN.

★ 39.1k Inference · Serving · explained

decolua/9router

+136 ★/day ↗ accelerating

9Router is a local proxy that auto-switches your AI coding tools from paid to free providers and compresses token-heavy tool outputs so you stop hitting limits.

★ 23.7k JavaScript Coding Assistants · explained

unslothai/unsloth

+72 ★/day ↗ accelerating

It wraps local inference and fine-tuning for open models in a web UI, using custom kernels to squeeze more performance out of desktop GPUs than standard tooling.

★ 68.9k Python Inference · Serving · explained

google-ai-edge/mediapipe

+22 ★/day ↗ accelerating

It lets developers run customized vision, text, and audio machine learning pipelines on mobile, web, and edge hardware without cloud round-trips.

★ 36.3k C++ Computer Vision · explained

huggingface/transformers

+38 ★/day ↗ accelerating

It centralizes model definitions so the same architecture works across PyTorch, JAX, vLLM, and llama.cpp without rewrites.

★ 163k Python Language Models · explained

mozilla-ai/llamafile

+9.3 ★/day ↗ accelerating

Mozilla wraps llama.cpp and a full model into a single cross-platform executable by building on Cosmopolitan Libc.

★ 25.5k C++ Inference · Serving · explained

BerriAI/litellm

+104 ★/day ↗ accelerating

Because swapping from GPT-4o to Claude shouldn't require rewriting your request plumbing.

★ 54.8k Python LLMOps · Eval · explained

microsoft/BitNet

+8.1 ★/day ↗ accelerating

Microsoft built an inference engine that lets a single CPU run a 100B-parameter model at human reading speed by using 1.58-bit weights.

★ 39.8k C++ Inference · Serving · explained

mudler/LocalAI

+32 ★/day ↗ accelerating

LocalAI wraps 36+ inference engines behind one OpenAI-compatible API and pulls them on demand, so you can run LLMs, vision, voice, and video on anything from a CPU to a Jetson.

★ 47.9k Go Inference · Serving · explained

danny-avila/LibreChat

+51 ★/day ↗ accelerating

LibreChat bundles every major LLM provider into a single self-hosted chat platform so teams don't have to choose—or leak data.

★ 41.3k TypeScript Chat Assistants · explained

QuantumNous/new-api

+110 ★/day ↗ accelerating

Because juggling native APIs from a dozen LLM vendors, each with its own auth and billing, is a recipe for migraines.

★ 43.5k TypeScript Inference · Serving · explained

deepseek-ai/DeepSeek-OCR

+11 ★/day ↗ accelerating

An OCR model that asks how few vision tokens an LLM needs before it can no longer read the page.

★ 23.7k Python Inference · Serving · explained

deepseek-ai/DeepSeek-V3

+10 ★/day ↗ accelerating

DeepSeek-V3 exists to prove that a 671-billion-parameter model can train end-to-end without a single rollback, activate only 37B parameters per token, and still match leading closed-source systems.

★ 104k Python Language Models · explained

SYSTRAN/faster-whisper

+25 ★/day ↗ accelerating

A reimplementation of OpenAI's Whisper that trades the original inference engine for CTranslate2 and gains up to 4× speed without sacrificing accuracy.

★ 24.6k Python Inference · Serving · explained

karpathy/llm.c

+9.1 ★/day ↗ accelerating

Because training a transformer shouldn't require 245MB of PyTorch just to multiply matrices.

★ 30.6k Cuda Language Models · explained
