
meta-pytorch/gpt-fast

Minimalist PyTorch-based inference engine for transformer text generation models including LLaMA, Mixtral, Gemma, Grok-1, and DBRX.

Velocity (7d): +6.4 ★/day · Trend: steady

gpt-fast is a lightweight text generation engine written in under 1000 lines of Python that uses native PyTorch operations for high-performance inference. It supports int8 and int4 quantization, speculative decoding, and tensor parallelism, and runs on NVIDIA and AMD GPUs. The project targets LLaMA-family transformer models and Mixture-of-Experts models, and is positioned as an educational reference implementation rather than a production framework.
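To give a feel for one of the techniques named above, here is a minimal sketch of symmetric per-row int8 weight quantization in plain Python. It is not taken from gpt-fast: the function names (`quantize_row_int8`, `quantize_weights`, `int8_matvec`) and the toy weights are hypothetical, and a real implementation would operate on PyTorch tensors rather than lists.

```python
# Illustrative sketch only: symmetric per-row int8 weight quantization.
# Assumption: these helpers are hypothetical and are NOT gpt-fast's code.


def quantize_row_int8(row):
    """Map one row of float weights to int8-range integers plus a float scale."""
    max_abs = max(abs(w) for w in row)
    # One scale per row; fall back to 1.0 for an all-zero row.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def quantize_weights(weights):
    """Quantize every row of a weight matrix independently."""
    return [quantize_row_int8(row) for row in weights]


def int8_matvec(qweights, x):
    """Compute y = W x from quantized rows, rescaling each output by its row scale."""
    return [scale * sum(qv * xv for qv, xv in zip(q, x)) for q, scale in qweights]


# Toy data (made up for the example).
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]

qweights = quantize_weights(weights)
approx = int8_matvec(qweights, x)
exact = [sum(w * xv for w, xv in zip(row, x)) for row in weights]
```

Comparing `approx` with `exact` shows the small rounding error that weight-only quantization trades for storing each weight in one byte instead of two or four.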
