Inference · Serving

big names on the move

+1543 ★/day↗accelerating

OmniRoute keeps your coding agents online by automatically failing over across 177 AI providers—including free tiers—when quotas run dry.

★ 31.1k TypeScript Inference · Serving · explained Feature
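
Failing over across providers when quotas run dry can be sketched generically: try providers in priority order and move to the next one when a call reports an exhausted quota. This is a minimal sketch, not OmniRoute's code. The `QuotaExceeded` exception, the provider callables, and `complete_with_failover` are all hypothetical names.

```python
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error a provider adapter raises when its quota runs dry."""


# A provider is modeled as (name, callable that turns a prompt into a reply).
Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(providers: Sequence[Provider], prompt: str) -> tuple[str, str]:
    """Return (provider_name, reply) from the first provider that still has quota."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except QuotaExceeded as exc:
            # Record why this provider was skipped, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers exhausted: " + "; ".join(errors))


def paid_tier(prompt: str) -> str:
    # Hypothetical provider whose quota is already spent.
    raise QuotaExceeded("monthly limit reached")


def free_tier(prompt: str) -> str:
    # Hypothetical free provider that simply echoes the prompt.
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(complete_with_failover([("paid", paid_tier), ("free", free_tier)], "hi"))
```

Ordering the list paid-first or free-first is the policy knob; the loop itself stays the same.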

earendil-works/pi

+764 ★/day↗accelerating

Pi bundles a unified LLM API, agent runtime, and interactive coding CLI into a monorepo that treats every dependency update as a potential attack.

★ 78.1k TypeScript Coding Assistants · explained Feature
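
The blurb doesn't say which supply-chain controls Pi actually uses. One common primitive for treating a dependency update as untrusted is integrity pinning: npm lockfiles record a `sha512-<base64 digest>` value per package, and a download is rejected if its bytes hash differently. A minimal sketch of that check, with hypothetical function names:

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Compute an integrity string in the `sha512-<base64>` form npm lockfiles use."""
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def verify_tarball(data: bytes, expected_integrity: str) -> bool:
    """Return True only if the downloaded bytes match the pinned integrity value."""
    return sri_sha512(data) == expected_integrity


if __name__ == "__main__":
    pinned = sri_sha512(b"package contents at review time")
    # A tampered or silently republished tarball no longer matches the pin.
    print(verify_tarball(b"package contents at review time", pinned))
    print(verify_tarball(b"package contents after an update", pinned))
```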

Wei-Shaw/sub2api

+215 ★/day↗accelerating

Sub2API pools AI subscriptions behind a metered gateway so teams or resellers can distribute API quotas without building their own billing stack.

★ 34.5k Go LLMOps · Eval · explained
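
A metered gateway boils down to a per-key ledger that deducts usage and refuses requests once a key's allowance is spent. The sketch below assumes token-count metering and uses hypothetical names (`QuotaLedger`, `grant`, `charge`); Sub2API's actual data model and billing rules are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaLedger:
    """Hypothetical per-API-key token allowance, deducted as requests are served."""

    allowances: dict[str, int] = field(default_factory=dict)

    def grant(self, api_key: str, tokens: int) -> None:
        # Top up a key's remaining allowance (e.g. when more quota is sold to it).
        self.allowances[api_key] = self.allowances.get(api_key, 0) + tokens

    def charge(self, api_key: str, tokens_used: int) -> bool:
        """Deduct usage and return True, or return False without charging if quota is short."""
        left = self.allowances.get(api_key, 0)
        if tokens_used > left:
            return False
        self.allowances[api_key] = left - tokens_used
        return True

    def remaining(self, api_key: str) -> int:
        return self.allowances.get(api_key, 0)


if __name__ == "__main__":
    ledger = QuotaLedger()
    ledger.grant("team-a", 1_000)
    print(ledger.charge("team-a", 800))  # True
    print(ledger.charge("team-a", 300))  # False: only 200 tokens left
```

A production gateway would typically need the check-and-deduct step to be atomic (a lock or a database transaction) and would usually meter after the upstream response, once the exact token count is known.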

router-for-me/CLIProxyAPI

+200 ★/day↘cooling

A Go proxy that handles OAuth login for Claude Code and Codex so you can call them through standard API clients.

★ 45.1k Go Inference · Serving · explained Feature
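
Turning an OAuth login into something a standard API client can use usually involves caching the access token and refreshing it shortly before it expires. This is a sketch of that generic pattern only, not CLIProxyAPI's code: the `refresh` callable stands in for the provider-specific token exchange (not shown), and every name here is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    expires_at: float  # Unix timestamp


class TokenCache:
    """Hands out a cached OAuth access token, refreshing it shortly before expiry."""

    def __init__(
        self,
        refresh: Callable[[], Token],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._refresh = refresh    # hypothetical: performs the provider's token exchange
        self._skew = skew_seconds  # refresh this early so a token can't expire mid-request
        self._clock = clock        # injectable for testing
        self._token: Optional[Token] = None

    def authorization_header(self) -> dict[str, str]:
        """Return a header dict an ordinary HTTP/API client can attach to requests."""
        if self._token is None or self._clock() >= self._token.expires_at - self._skew:
            self._token = self._refresh()
        return {"Authorization": f"Bearer {self._token.access_token}"}
```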

decolua/9router

+136 ★/day↗accelerating

9Router is a local proxy that auto-switches your AI coding tools from paid to free providers and compresses token-heavy tool outputs so you stop hitting limits.

★ 23.7k JavaScript Coding Assistants · explained
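
One common way to shrink a token-heavy tool output (a long build log, a huge `grep` result) is to keep its head and tail and elide the middle. This is an illustrative sketch, not necessarily what 9Router does; `compress_tool_output` and its default limits are hypothetical.

```python
def compress_tool_output(text: str, max_lines: int = 40, keep_head: int = 20) -> str:
    """Keep the first `keep_head` lines and the last lines, eliding the middle.

    The result has at most `max_lines` original lines plus one marker line.
    """
    if not 0 < keep_head < max_lines:
        raise ValueError("keep_head must be between 1 and max_lines - 1")
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    keep_tail = max_lines - keep_head  # always >= 1 thanks to the check above
    omitted = len(lines) - keep_head - keep_tail
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:keep_head] + [marker] + lines[-keep_tail:])


if __name__ == "__main__":
    log = "\n".join(f"line {i}" for i in range(100))
    print(compress_tool_output(log, max_lines=10, keep_head=4))
```

Keeping the tail matters because errors and summaries usually appear at the end of tool output.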

Comfy-Org/ComfyUI

+134 ★/day↘cooling

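ComfyUI is a node-based graph interface for building diffusion and other generative-media pipelines. It exists because clicking 'generate' isn't enough when you need to control every model, parameter, and preprocessing step.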

★ 122.4k Python Image · Video · Audio · explained

cheahjs/free-llm-api-resources

+120 ★/day↗accelerating

It catalogs legitimate services offering free API access to large language models, complete with rate limits, model lists, and data-privacy caveats.

★ 28.3k Python Learning · explained

QuantumNous/new-api

+110 ★/day↗accelerating

New API is a gateway that puts many LLM vendors behind one unified interface, because juggling native APIs from a dozen vendors, each with its own auth and billing, is a recipe for migraines.

★ 43.5k TypeScript Inference · Serving · explained

BerriAI/litellm

+104 ★/day↗accelerating

LiteLLM calls many LLM providers through one OpenAI-style interface, because swapping from GPT-4o to Claude shouldn't require rewriting your request plumbing.

★ 54.8k Python LLMOps · Eval · explained
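
Much of the "request plumbing" that differs between vendors is payload shape. As an illustration (not LiteLLM's internals), this sketch translates an OpenAI-style chat request into the shape of Anthropic's Messages API, which takes the system prompt as a top-level `system` field and requires `max_tokens`. The function name and the 1024-token default are assumptions, and only plain-text messages are handled.

```python
from typing import Any


def openai_to_anthropic(request: dict[str, Any], default_max_tokens: int = 1024) -> dict[str, Any]:
    """Translate an OpenAI-style chat request into an Anthropic Messages-style payload.

    Hypothetical adapter: covers plain-text messages and the fields shown only.
    """
    # OpenAI puts system prompts in the message list; Anthropic wants one top-level string.
    system_parts = [m["content"] for m in request["messages"] if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": m["content"]}
        for m in request["messages"]
        if m["role"] in ("user", "assistant")
    ]
    payload: dict[str, Any] = {
        "model": request["model"],
        "messages": chat,
        # Anthropic requires max_tokens, so fall back to a default when it's absent.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    if "temperature" in request:
        payload["temperature"] = request["temperature"]
    return payload
```

The caller still has to supply a model name the target vendor recognizes; a real router would also map responses back into the common format.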

ggml-org/llama.cpp

+102 ★/day↘cooling

It exists to run large language models on virtually any hardware—from Apple Silicon to RISC-V to your browser—with zero external dependencies and minimal setup.

★ 121.7k C++ Inference · Serving · explained

vllm-project/vllm

+84 ★/day↗accelerating

vLLM is an open-source inference engine that stores the attention key-value cache in fixed-size pages, the way an operating system pages virtual memory, to drive higher GPU throughput, and it serves models through an OpenAI-compatible API.

★ 87.2k Python Inference · Serving · explained
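
The paging idea can be shown with block bookkeeping alone: each sequence's KV cache lives in fixed-size blocks drawn from a shared pool, and a per-sequence block table maps logical blocks to physical ones, so memory is claimed on demand rather than reserved for the maximum length. This toy allocator is a sketch under those assumptions; the names and the 16-token default block size are illustrative, and vLLM's real block manager also handles sharing, preemption, and the GPU tensors themselves.

```python
class PagedKVAllocator:
    """Toy block manager: maps each sequence's tokens onto fixed-size physical blocks."""

    def __init__(self, num_blocks: int, block_size: int = 16) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))    # pool of physical block ids
        self.block_tables: dict[int, list[int]] = {}  # seq id -> physical block ids, in order
        self.lengths: dict[int, int] = {}             # seq id -> tokens stored so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset_in_block)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            # Last block is full (or there is none yet): only now claim a new block.
            if not self.free_blocks:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Waste is bounded by one partially filled block per sequence, which is what lets more concurrent sequences fit in the same GPU memory.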

unslothai/unsloth

+72 ★/day↗accelerating

It wraps local inference and fine-tuning for open models in a web UI, using custom kernels to squeeze more performance out of desktop GPUs than standard tooling.

★ 68.9k Python Inference · Serving · explained

ollama/ollama

+68 ★/day↗accelerating

It exists so you can download, run, and chat with open-weight LLMs locally through one CLI and REST API, keeping inference on your own silicon.

★ 176.9k Go Inference · Serving · explained
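
Ollama's REST API listens on `localhost:11434` by default, and `POST /api/generate` with `"stream": false` returns a single JSON object whose `response` field holds the generated text. A minimal standard-library sketch of that call; the helper names are hypothetical, and the model name in the usage line is only an example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with a model you have already pulled (for example via `ollama pull llama3.2`): `print(generate("llama3.2", "Why is the sky blue?"))`.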

OpenBMB/VoxCPM

+65 ★/day↘cooling

VoxCPM2 shows that TTS doesn't need discrete tokens: a 2B-parameter diffusion model generates continuous 48 kHz speech in 30 languages and supports text-prompted voice cloning.

★ 34.2k Python Image · Video · Audio · explained

ggml-org/whisper.cpp

+63 ★/day↗accelerating

A minimal C/C++ port of OpenAI’s Whisper built to transcribe speech locally on phones, browsers, and underclocked POWER9 boxes.

★ 52.3k C++ Image · Video · Audio · explained

lyogavin/airllm

+62 ★/day↘cooling

AirLLM slices giant transformers into layer shards so they fit in consumer VRAM without quantization or distillation.

★ 24.1k Jupyter Notebook Inference · Serving · explained
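
The layer-shard idea amounts to keeping only one layer's weights in memory at a time: load a shard, push the activations through it, drop it, and move to the next. This pure-Python toy shows that control flow with tiny matrices pickled to a temporary directory; it is not AirLLM's code, all names are hypothetical, and a real system would load framework tensors from its checkpoint format instead.

```python
import pickle
import tempfile
from pathlib import Path

Matrix = list[list[float]]


def save_layer_shards(layers: list[Matrix], directory: Path) -> list[Path]:
    """Write each layer's weight matrix to its own file (one shard per layer)."""
    paths = []
    for i, weights in enumerate(layers):
        path = directory / f"layer_{i}.pkl"
        path.write_bytes(pickle.dumps(weights))
        paths.append(path)
    return paths


def matvec(weights: Matrix, x: list[float]) -> list[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def run_layer_by_layer(shard_paths: list[Path], x: list[float]) -> list[float]:
    """Forward pass that holds only one layer's weights in memory at a time."""
    for path in shard_paths:
        # pickle is acceptable for files this toy wrote itself; never unpickle untrusted data.
        weights = pickle.loads(path.read_bytes())      # load this shard only
        x = [max(0.0, v) for v in matvec(weights, x)]  # linear layer followed by ReLU
        del weights                                    # release it before the next shard
    return x


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shards = save_layer_shards([[[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [1.0, -1.0]]], Path(tmp))
        print(run_layer_by_layer(shards, [3.0, 1.0]))  # [5.0, 1.0]
```

The trade-off is speed: every forward pass rereads each shard from disk, so peak memory drops to roughly one layer at the cost of much slower generation.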

danny-avila/LibreChat

+51 ★/day↗accelerating

LibreChat bundles every major LLM provider into a single self-hosted chat platform so teams don't have to choose—or leak data.

★ 41.3k TypeScript Chat Assistants · explained

SillyTavern/SillyTavern

+39 ★/day↘cooling

SillyTavern gives AI hobbyists a single, local interface to command text generators, image engines, and voice models across dozens of backends without losing control of their prompts or data.

★ 31.2k JavaScript Chat Assistants · explained

huggingface/transformers

+38 ★/day↗accelerating

It centralizes model definitions so the same architecture works across PyTorch, JAX, vLLM, and llama.cpp without rewrites.

★ 163k Python Language Models · explained

OpenBB-finance/OpenBB

+38 ★/day↘cooling

OpenBB normalizes proprietary and public financial data so engineers can feed the same sources to Python scripts, REST APIs, Excel, and AI agents without rebuilding integrations.

★ 71k Python Domain Apps · explained
