Inference · Serving

heavyweights · velocity + momentum
01
pewdiepie-archdaemon/odysseus
+8238 ★/day · steady

A self-hosted AI workspace that bolts chat, agents, email triage, calendars, and deep research onto your own hardware.

61.7k Python Agents · explained
02
antirez/ds4
+407 ★/day · steady

A deliberately narrow inference engine that treats your SSD as first-class KV cache real estate.

13.2k C Inference · Serving · explained
03
karpathy/nanochat
+230 ★/day · steady

Karpathy's minimal LLM training harness turns a $43K 2019 training run into a sub-$100 afternoon project.

54.7k Python Language Models · explained
04
earendil-works/pi
+201 ★/day · steady

The pi project bundles a CLI coding agent with an unusually paranoid approach to supply-chain security.

60.7k TypeScript Coding Assistants · explained
06
deepseek-ai/DeepSeek-V3
+196 ★/day · steady

A massive Mixture-of-Experts model that trains cheap and runs lean by keeping most of its weights asleep.

103.7k Python Language Models · explained
07
Sophomoresty/gemini-web2api
+147 ★/day · steady

Google's free web chat has no official API, so someone built an unofficial one by reverse-engineering its private protocol.

1.6k Python Inference · Serving · explained
08
RightNow-AI/openfang
+172 ★/day · steady

OpenFang ships autonomous "Hands"—pre-built agents that research, monitor, and publish on schedules, not chat prompts.

17.8k Rust Agents · explained
09
Wei-Shaw/sub2api
+151 ★/day · steady

An open-source gateway for splitting AI subscriptions across teams without breaking native tools.

26k Go LLMOps · Eval · explained
10
ollama/ollama
+161 ★/day · steady

Ollama wraps llama.cpp in a one-line installer and a model registry so you can run open weights without reading a dozen READMEs.

173.5k Go Inference · Serving · explained
11
jundot/omlx
+141 ★/day · steady

oMLX brings vLLM-style continuous batching and tiered KV caching to Apple Silicon, controlled from a native Swift menubar app.

16.2k Python Inference · Serving · explained
12
opensquilla/opensquilla
+107 ★/day · steady

OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.

3.5k Python Agents · explained
13
TencentCloud/CubeSandbox
+105 ★/day · steady

A Rust-based sandbox service that swaps container speed for real kernel isolation while keeping the same Python SDK.

6.2k Rust Agents · explained
14
decolua/9router
+109 ★/day · steady

Local proxy that auto-falls back to free models when your paid quota dies mid-session.

16.8k JavaScript Coding Assistants · explained
16
RyanCodrai/turbovec
+98 ★/day · steady

A Rust vector index that squeezes 31 GB of float32 embeddings into 4 GB without a training phase, then outruns FAISS on the query.

7.2k Python RAG · Search · explained
17
router-for-me/CLIProxyAPI
+107 ★/day · steady

A Go proxy that exposes Gemini CLI, Claude Code, Codex, and Grok through standard OpenAI-compatible APIs—no API keys required, just your existing OAuth logins.

36.4k Go Inference · Serving · explained
18
OpenBMB/VoxCPM
+104 ★/day · steady

VoxCPM2 generates speech directly from text using continuous diffusion, with no discrete audio tokens required.

27.5k Python Image · Video · Audio · explained
19
deepseek-ai/DeepSeek-OCR
+99 ★/day · steady

An LLM-centric vision encoder that squeezes documents into surprisingly few tokens, then lets the language model do the actual reading.

23.3k Python Inference · Serving · explained
20
ggml-org/llama.cpp
+97 ★/day · steady

A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.

115.4k C++ Inference · Serving · explained