Inference · Serving

underdogs · picking up speed

+57%/wk · +168 ★/day · ↗ accelerating

A demo repo for running extremely quantized language models locally, no research cluster required.

★ 2k · Shell · Inference · Serving

+50%/wk · +64 ★/day · ↗ accelerating

An unofficial proxy that borrows your ChatGPT/Codex OAuth tokens to serve a local OpenAI-compatible API, bypassing API credit billing.

★ 902 · TypeScript · Inference · Serving

seakee/CPA-Manager-Plus

+36%/wk · +112 ★/day · ↗ accelerating

It exists to stop your AI gateway from quietly burning through quotas, cash, and expired OAuth tokens without leaving a paper trail.

★ 2.2k · TypeScript · LLMOps · Eval

AtomicBot-ai/atomic-agent

+32%/wk · +47 ★/day · ↗ accelerating

It keeps the entire agent loop (prompts, tool calls, browser state, and memory) on your laptop, so you don't have to rent a control plane in the cloud.

★ 1k · TypeScript · Agents

routatic/proxy

+21%/wk · +27 ★/day · ↗ accelerating

A Go proxy that tricks Claude Code into using $5/month open models through OpenCode instead of Anthropic's API.

★ 892 · Go · Coding Assistants

Lynpoint/CyberVerse

+18%/wk · +38 ★/day · ↗ accelerating

A self-hosted framework for building real-time voice-first AI agents that persist memory, delegate long tasks to background sub-agents, and optionally show up as lip-synced digital humans.

★ 1.5k · Python · Agents

espressif/esp-claw

+18%/wk · +48 ★/day · ↗ accelerating

Espressif's C framework turns cheap microcontrollers into edge AI agents that you program by chatting with them over instant messaging (IM).

★ 1.9k · C · Agents

hero8152/Infinite-Canvas

+14%/wk · +48 ★/day · ↗ accelerating

One desktop UI that wires together ComfyUI, OpenAI, Gemini, ModelScope, and a dozen other generative APIs, plus some very opinionated legal terms.

★ 2.4k · Python · App Builders

verl-project/verl-omni

+14%/wk · +13 ★/day · ↗ accelerating

It split off from `verl` to give diffusion, video, and omni-modality models an RL post-training framework that doesn't treat them like chatbots.

★ 651 · Python · ML Frameworks

RyanCodrai/turbovec

+12%/wk · +242 ★/day · ↗ accelerating

turbovec exists so you can index embeddings immediately (no training, no tuning, no rebuilds) and search them faster than FAISS in a fraction of the RAM.

★ 14.3k · Python · RAG · Search

inference-labs-inc/subnet-2

+10%/wk · +34 ★/day · ↗ accelerating

A Bittensor subnet that uses zero-knowledge proofs to verify miners actually ran the AI models they claim to.

★ 2.4k · Rust · Inference · Serving

Tencent/AngelSlim

+9.2%/wk · +20 ★/day · ↗ accelerating

AngelSlim integrates quantization, speculative decoding, and distillation so you can shrink and serve massive models from a single toolkit.

★ 1.5k · Python · Inference · Serving

kvcache-ai/ktransformers

+8.7%/wk · +237 ★/day · ↗ accelerating

KTransformers makes CPU-GPU heterogeneous inference and fine-tuning for massive MoE models almost practical on consumer hardware.

★ 19k · Python · Inference · Serving

AtomicBot-ai/Atomic-Chat

+8.5%/wk · +14 ★/day · ↗ accelerating

It turns your local machine into an OpenAI-compatible inference endpoint so agents and IDEs can run on offline models without reconfiguration.

★ 1.2k · TypeScript · Inference · Serving

lidge-jun/ima2-gen

+12%/wk · +10 ★/day · ↗ accelerating

It exists because cloud image generators deserve a local memory layer, a branching canvas, and a UI outside the chat thread.

★ 603 · TypeScript · Image · Video · Audio

xLLM-AI/xllm

+8.1%/wk · +17 ★/day · ↗ accelerating

xLLM is a C++ inference framework specifically optimized for Chinese AI accelerators, and it already powers JD.com’s core retail production workloads.

★ 1.5k · C++ · Inference · Serving

tuya/TuyaOpen

+7.7%/wk · +19 ★/day · ↗ accelerating

A C/C++ SDK that bundles speech recognition, multimodal AI, and cloud LLM plumbing so Wi-Fi modules and MCUs can behave like smart agents.

★ 1.8k · C · Agents

openlake-project/openlake

+14%/wk · +45 ★/day · ↗ accelerating

OpenLake wants data read from storage to bypass the host entirely and land straight in GPU memory.

★ 2.3k · Rust · Inference · Serving

Bytez-com/docs

+5.4%/wk · +18 ★/day · ↗ accelerating

Bytez wraps 175,000+ AI models behind a single endpoint so you don't have to host them yourself.

★ 2.3k · TypeScript · Inference · Serving

wildminder/awesome-ltx2

+5.9%/wk · +4.7 ★/day · ↗ accelerating

Because finding the right LTX-2 checkpoint, quantization, or LoRA across Hugging Face and ComfyUI nodes is a part-time job.

★ 555 · Image · Video · Audio
