Inference · Serving

newcomers · gaining speed
01
RyanCodrai/turbovec
+938 ★/day · accelerating

A Rust vector index that squeezes 31 GB of float32 embeddings into 4 GB without a training phase, then outruns FAISS on the query.

10.8k Python RAG · Search · explained
02
Andyyyy64/whichllm
+261 ★/day · accelerating

whichllm ranks local models by real benchmark scores, not parameter count, and tells you which ones actually fit your hardware.

4.4k Python Inference · Serving · explained
03
OpenBMB/VoxCPM
+404 ★/day · accelerating

VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.

28.3k Python Image · Video · Audio · explained
04
maziyarpanahi/openmed
+142 ★/day · accelerating

OpenMed packages clinical entity extraction and HIPAA-grade de-identification into models small enough for Apple Silicon and impatient DevOps teams.

2.3k Python Domain Apps · explained
05
NVIDIA/cosmos
+172 ★/day · accelerating

Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.

9.8k Jupyter Notebook Image · Video · Audio · explained
06
lyogavin/airllm
+162 ★/day · accelerating

AirLLM slices giant transformers into layer shards so they fit in consumer VRAM without quantization or distillation.

19.8k Jupyter Notebook Inference · Serving · explained
07
QuantumNous/new-api
+200 ★/day · accelerating

A Go-based gateway that cross-converts between OpenAI, Claude, and Gemini formats so you don't have to pick sides in the API format wars.

38.2k Go Inference · Serving · explained
08
modelscope/FunASR
+104 ★/day · accelerating

A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.

17.7k Python Image · Video · Audio · explained
09
NangoHQ/nango
+84 ★/day · accelerating

Nango turns natural language into deployable TypeScript integration code, then runs it on managed infrastructure.

10.3k TypeScript Agents · explained
10
danny-avila/LibreChat
+121 ★/day · accelerating

Self-hosted chat UI that unifies OpenAI, Anthropic, Google, AWS, and two dozen other providers under one roof.

38.8k TypeScript Chat Assistants · explained
11
ggml-org/llama.cpp
+222 ★/day · accelerating

A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.

116k C++ Inference · Serving · explained
12
tile-ai/TileRT
+36 ★/day · accelerating

TileRT squeezes millisecond-level latency out of hundred-billion-parameter models by decomposing operators into tile-level tasks and overlapping compute, I/O, and communication across 8 GPUs.

1.3k Python Inference · Serving · explained
13
BerriAI/litellm
+115 ★/day · accelerating

LiteLLM is the adapter layer that stops your codebase from fracturing across a dozen provider SDKs.

50k Python LLMOps · Eval · explained
14
abus-aikorea/voice-pro
+56 ★/day · accelerating

Voice-Pro bundles Whisper, F5-TTS, CosyVoice, and a dozen other tools into a single Gradio interface for creators who want ElevenLabs-like results without the API bills.

10.9k Python Inference · Serving · explained
15
cheahjs/free-llm-api-resources
+72 ★/day · accelerating

A living spreadsheet of which AI providers actually let you call their models for free, with rate limits and gotchas spelled out.

23.2k Python Learning · explained
17
mnfst/manifest
+20 ★/day · accelerating

Manifest picks the cheapest model that can handle each query, mixing API keys, subscriptions, and local hardware in one endpoint.

6.9k TypeScript Inference · Serving · explained
18
SillyTavern/SillyTavern
+54 ★/day · accelerating

A locally run frontend that wrangles dozens of LLM APIs, image generators, and TTS into one obsessively customizable interface.

29.2k JavaScript Chat Assistants · explained
19
ml-explore/mlx-lm
+32 ★/day · accelerating

A purpose-built inference and fine-tuning stack that treats M-series chips as first-class citizens instead of afterthoughts.

5.7k Python Language Models · explained
20
Blaizzy/mlx-vlm
+20 ★/day · accelerating

MLX-VLM crams speculative decoding, continuous batching, and KV cache quantization into a Mac-native toolkit for running multimodal models locally.

5k Python Image · Video · Audio · explained