Inference · Serving

newcomers · velocity + momentum
01
pewdiepie-archdaemon/odysseus
+4132 ★/day · cooling

A self-hosted AI workspace that bolts chat, agents, email triage, calendars, and deep research onto your own hardware.

67.1k Python Agents · explained Feature
02
RyanCodrai/turbovec
+938 ★/day · accelerating

A Rust vector index that squeezes 31 GB of float32 embeddings into 4 GB without a training phase, then outruns FAISS on the query.

10.8k Python RAG · Search · explained
03
OpenBMB/VoxCPM
+404 ★/day · accelerating

VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.

28.3k Python Image · Video · Audio · explained
04
tashfeenahmed/freellmapi
+315 ★/day · accelerating

A local proxy that turns sixteen scattered LLM free tiers into one OpenAI-compatible endpoint with automatic failover.

9.5k TypeScript Inference · Serving · explained
05
Andyyyy64/whichllm
+261 ★/day · accelerating

whichllm ranks local models by real benchmark scores, not parameter count, and tells you which ones actually fit your hardware.

4.4k Python Inference · Serving · explained
06
earendil-works/pi
+330 ★/day · accelerating

The pi project bundles a CLI coding agent with an unusually paranoid approach to supply-chain security.

61.6k TypeScript Coding Assistants · explained
07
Wei-Shaw/sub2api
+275 ★/day · accelerating

An open-source gateway for splitting AI subscriptions across teams without breaking native tools.

27k Go LLMOps · Eval · explained
08
opensquilla/opensquilla
+171 ★/day · accelerating

OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.

3.8k Python Agents · explained
09
ggml-org/llama.cpp
+222 ★/day · accelerating

A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.

116k C++ Inference · Serving · explained
10
QuantumNous/new-api
+200 ★/day · accelerating

A Go-based gateway that cross-converts between OpenAI, Claude, and Gemini formats so you don't have to pick sides in the API format wars.

38.2k Go Inference · Serving · explained
11
NVIDIA/cosmos
+172 ★/day · accelerating

Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.

9.8k Jupyter Notebook Image · Video · Audio · explained
12
maziyarpanahi/openmed
+142 ★/day · accelerating

OpenMed packages clinical entity extraction and HIPAA-grade de-identification into models small enough for Apple Silicon and impatient DevOps teams.

2.3k Python Domain Apps · explained
13
decolua/9router
+163 ★/day · accelerating

A local proxy that automatically falls back to free models when your paid quota runs out mid-session.

17.2k JavaScript Coding Assistants · explained
14
router-for-me/CLIProxyAPI
+172 ★/day · accelerating

A Go proxy that exposes Gemini CLI, Claude Code, Codex, and Grok through standard OpenAI-compatible APIs—no API keys required, just your existing OAuth logins.

37.1k Go Inference · Serving · explained
15
lyogavin/airllm
+162 ★/day · accelerating

AirLLM slices giant transformers into layer shards so they fit in consumer VRAM without quantization or distillation.

19.8k Jupyter Notebook Inference · Serving · explained
16
Comfy-Org/ComfyUI
+143 ★/day · accelerating

A visual programming interface for image, video, 3D, and audio generation that treats model pipelines as composable graphs.

116.5k Python Image · Video · Audio · explained
17
danny-avila/LibreChat
+121 ★/day · accelerating

A self-hosted chat UI that unifies OpenAI, Anthropic, Google, AWS, and two dozen other providers under one roof.

38.8k TypeScript Chat Assistants · explained
18
BerriAI/litellm
+115 ★/day · accelerating

LiteLLM is the adapter layer that stops your codebase from fracturing across a dozen provider SDKs.

50k Python LLMOps · Eval · explained
19
modelscope/FunASR
+104 ★/day · accelerating

A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.

17.7k Python Image · Video · Audio · explained
20
ollama/ollama
+114 ★/day · cooling

Ollama wraps llama.cpp in a one-line installer and a model registry so you can run open weights without reading a dozen READMEs.

173.8k Go Inference · Serving · explained