Inference · Serving

underdogs breaking out

+202% /wk +392 ★/day→steady

The project squeezes a 28.9-million-parameter language model onto an ESP32-S3 microcontroller by keeping most of its weights in slow flash memory and reading only what each token needs.

★ 1.4k Python Language Models · explained

lidge-jun/opencodex

+88% /wk +630 ★/day↗accelerating

It breaks the vendor lock on Codex and Claude Code by translating their API calls to any LLM backend you choose.

★ 5k TypeScript Coding Assistants · explained

PrismML-Eng/Bonsai-demo

+58% /wk +169 ★/day↗accelerating

A demo repo for running extreme-quantized language models locally without needing a research cluster.

★ 2.1k Shell Inference · Serving · explained

EvanZhouDev/openai-oauth

+53% /wk +74 ★/day↗accelerating

An unofficial proxy that borrows your ChatGPT/Codex OAuth tokens to serve a local OpenAI-compatible API, bypassing API credit billing.

★ 969 TypeScript Inference · Serving · explained

unicity-aos/aos-ce

+50% /wk +526 ★/day→steady

AOS Community Edition is a Rust-based agent operating system that lets agents inspect the runtime, spot missing capabilities, and forge their own least-privilege extensions.

★ 7.4k Rust Agents · explained

seakee/CPA-Manager-Plus

+37% /wk +119 ★/day↗accelerating

It exists to stop your AI gateway from quietly burning through quotas, cash, and expired OAuth tokens without leaving a paper trail.

★ 2.2k TypeScript LLMOps · Eval · explained

AtomicBot-ai/atomic-agent

+30% /wk +45 ★/day↗accelerating

It keeps the entire agent loop—prompts, tool calls, browser state, and memory—on your laptop so you don't have to rent a control plane in the cloud.

★ 1.1k TypeScript Agents · explained

routatic/proxy

+21% /wk +27 ★/day↗accelerating

A Go proxy that tricks Claude Code into using $5/month open models through OpenCode instead of Anthropic's API.

★ 892 Go Coding Assistants · explained

espressif/esp-claw

+18% /wk +48 ★/day↗accelerating

Espressif's C framework turns cheap microcontrollers into edge AI agents you program through IM chat.

★ 1.9k C Agents · explained

astaxie/TokenHub

+15% /wk +15 ★/day→steady

TokenHub exists because dropping a shared OpenAI key into a Slack channel does not scale past one invoice and zero accountability.

★ 696 Go Inference · Serving · explained

hero8152/Infinite-Canvas

+14% /wk +48 ★/day↗accelerating

One desktop UI that wires together ComfyUI, OpenAI, Gemini, ModelScope, and a dozen other generative APIs—plus some very opinionated legal terms.

★ 2.4k Python App Builders · explained

RyanCodrai/turbovec

+13% /wk +259 ★/day↗accelerating

turbovec exists so you can index embeddings immediately—no training, no tuning, no rebuilds—and search them faster than FAISS in a fraction of the RAM.

★ 14.4k Python RAG · Search · explained Feature

YGYOOO/WorldX

+12% /wk +21 ★/day↗accelerating

WorldX turns one sentence into a self-running simulation of AI agents who gossip, scheme, and remember grudges without a script.

★ 1.2k TypeScript Agents · explained

Osmantic/ODS

+12% /wk +62 ★/day↘cooling

Dream Server exists because most people would rather pay OpenAI than spend a weekend hand-wiring Docker configs for local LLMs, RAG, and image generation.

★ 3.7k Python Inference · Serving · explained

lidge-jun/ima2-gen

+11% /wk +9.9 ★/day↗accelerating

It exists because cloud image generators deserve a local memory layer, a branching canvas, and a UI outside the chat thread.

★ 614 TypeScript Image · Video · Audio · explained

techjarves/Uncensored-Local-Studio

+11% /wk +12 ★/day→steady

It unifies Stable Diffusion, GGUF chat, Whisper, and Kokoro TTS into a single offline desktop GUI so you can skip cloud APIs, subscriptions, and censorship filters.

★ 734 JavaScript Inference · Serving · explained

inference-labs-inc/subnet-2

+10% /wk +34 ★/day↗accelerating

A Bittensor subnet that uses zero-knowledge proofs to verify that miners actually ran the AI models they claim to have run.

★ 2.4k Rust Inference · Serving · explained

verl-project/verl-omni

+10% /wk +9.6 ★/day↗accelerating

It split off from `verl` to give diffusion, video, and omni-modality models an RL post-training framework that doesn't treat them like chatbots.

★ 661 Python ML Frameworks · explained

Tencent/AngelSlim

+9.3% /wk +20 ★/day↗accelerating

AngelSlim integrates quantization, speculative decoding, and distillation so you can shrink and serve massive models from a single toolkit.

★ 1.5k Python Inference · Serving · explained

AtomicBot-ai/Atomic-Chat

+9.2% /wk +15 ★/day↗accelerating

It turns your local machine into an OpenAI-compatible inference endpoint so agents and IDEs can run on offline models without reconfiguration.

★ 1.2k TypeScript Inference · Serving · explained
