
skyzh/tiny-llm

A hands-on course teaching systems engineers to build a minimal vLLM-like LLM inference engine from scratch using Python and MLX.

Velocity (7d): +10 ★ / day · Trend: steady

This repository provides a week-long course on LLM serving infrastructure for systems engineers. It first covers implementing core LLM components (attention, RoPE, QK norm) in pure Python without high-level neural network APIs, then builds a simplified vLLM-style inference system with KV caching, continuous batching, and flash attention optimizations. The course uses Qwen3 models running on Apple Silicon via the MLX framework.
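To give a feel for what "attention without high-level APIs" plus a KV cache can look like, here is a minimal sketch in plain Python. It is not taken from the course: the names `softmax`, `attend`, and `KVCache` are hypothetical, it handles a single head with list-based vectors, and it leaves out RoPE, QK norm, batching, and MLX entirely.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention for one query vector over all cached tokens.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

class KVCache:
    """Hypothetical single-head cache holding keys and values of past tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append the new token's key/value once, then attend over the whole
        # history, so earlier keys/values are never recomputed during decoding.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

cache = KVCache()
first = cache.step([1.0, 0.0], [1.0, 0.0], [1.0, 2.0])
second = cache.step([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

Each call to `step` corresponds to decoding one token: the cache grows by one entry and the new query attends over every entry stored so far. A real engine would keep these as tensors per layer and per head rather than Python lists.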
