Inference · Serving — the hottest AI repositories on heatdrop

heavyweights · velocity + momentum

pewdiepie-archdaemon/odysseus

+8238 ★/day→steady

A self-hosted AI workspace that bolts chat, agents, email triage, calendars, and deep research onto your own hardware.

★ 61.7k Python Agents · explained

antirez/ds4

+407 ★/day→steady

A deliberately narrow inference engine that treats your SSD as first-class KV cache real estate.

★ 13.2k C Inference · Serving · explained

karpathy/nanochat

+230 ★/day→steady

Karpathy's minimal LLM training harness turns a $43K 2019 training run into a sub-$100 afternoon project.

★ 54.7k Python Language Models · explained

earendil-works/pi

+201 ★/day→steady

The pi project bundles a CLI coding agent with an unusually paranoid approach to supply-chain security.

★ 60.7k TypeScript Coding Assistants · explained

tashfeenahmed/freellmapi

+180 ★/day→steady

A local proxy that turns sixteen scattered LLM free tiers into one OpenAI-compatible endpoint with automatic failover.

★ 8.5k TypeScript Inference · Serving · explained

deepseek-ai/DeepSeek-V3

+196 ★/day→steady

A massive Mixture-of-Experts model that trains cheap and runs lean by keeping most of its weights asleep.

★ 103.7k Python Language Models · explained

Sophomoresty/gemini-web2api

+147 ★/day→steady

Google's free web chat has no official API, so someone built an unofficial one by reverse-engineering its private protocol.

★ 1.6k Python Inference · Serving · explained

RightNow-AI/openfang

+172 ★/day→steady

OpenFang ships autonomous "Hands"—pre-built agents that research, monitor, and publish on schedules, not chat prompts.

★ 17.8k Rust Agents · explained

Wei-Shaw/sub2api

+151 ★/day→steady

An open-source gateway for splitting AI subscriptions across teams without breaking native tools.

★ 26k Go LLMOps · Eval · explained

ollama/ollama

+161 ★/day→steady

Ollama wraps llama.cpp in a one-line installer and a model registry so you can run open weights without reading a dozen READMEs.

★ 173.5k Go Inference · Serving · explained

jundot/omlx

+141 ★/day→steady

oMLX brings vLLM-style continuous batching and tiered KV caching to Apple Silicon, controlled from a native Swift menubar app.

★ 16.2k Python Inference · Serving · explained

opensquilla/opensquilla

+107 ★/day→steady

OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.

★ 3.5k Python Agents · explained

TencentCloud/CubeSandbox

+105 ★/day→steady

A Rust-based sandbox service that swaps container speed for real kernel isolation while keeping the same Python SDK.

★ 6.2k Rust Agents · explained

decolua/9router

+109 ★/day→steady

A local proxy that automatically falls back to free models when your paid quota dies mid-session.

★ 16.8k JavaScript Coding Assistants · explained

AUTOMATIC1111/stable-diffusion-webui

+118 ★/day→steady

A Gradio-based web UI that crams every community trick for image generation into one browser tab.

★ 163.5k Python Image · Video · Audio · explained

RyanCodrai/turbovec

+98 ★/day→steady

A Rust vector index that squeezes 31 GB of float32 embeddings into 4 GB without a training phase, then outruns FAISS on the query.

★ 7.2k Python RAG · Search · explained

router-for-me/CLIProxyAPI

+107 ★/day→steady

A Go proxy that exposes Gemini CLI, Claude Code, Codex, and Grok through standard OpenAI-compatible APIs—no API keys required, just your existing OAuth logins.

★ 36.4k Go Inference · Serving · explained

OpenBMB/VoxCPM

+104 ★/day→steady

VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.

★ 27.5k Python Image · Video · Audio · explained

deepseek-ai/DeepSeek-OCR

+99 ★/day→steady

An LLM-centric vision encoder that squeezes documents into surprisingly few tokens, then lets the language model do the actual reading.

★ 23.3k Python Inference · Serving · explained

ggml-org/llama.cpp

+97 ★/day→steady

A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.

★ 115.4k C++ Inference · Serving · explained
