Is gpt-fast open source?

Yes — meta-pytorch/gpt-fast is open source, released under the BSD-3-Clause license.

What language is gpt-fast written in?

meta-pytorch/gpt-fast is primarily written in Python.

How popular is gpt-fast?

meta-pytorch/gpt-fast has 6.2k stars on GitHub.

Where can I find gpt-fast?

meta-pytorch/gpt-fast is on GitHub at https://github.com/meta-pytorch/gpt-fast.


meta-pytorch/gpt-fast

The anti-framework: LLM inference in <1,000 lines

Q: What is meta-pytorch/gpt-fast?

gpt-fast exists to prove that native PyTorch alone can deliver serious transformer inference speed—no heavy framework required.

★6.2k stars Python Inference · Serving Language Models


What it does

gpt-fast is a compact reference implementation for running large language model inference with almost nothing but PyTorch. It packs quantization, speculative decoding, and tensor parallelism into fewer than 1,000 lines of Python. The authors explicitly call it an anti-framework: the intended use is to copy, paste, and adapt rather than install and import.
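To make the tensor-parallelism idea concrete, here is a minimal sketch in plain Python, not code from gpt-fast. It assumes the simplest scheme: the output rows of a linear layer's weight matrix are split evenly across ranks, each rank computes its slice, and the slices are concatenated (the role an all-gather plays on real hardware). The helper names `shard_rows` and `parallel_matvec` are hypothetical, and the "ranks" are simulated with a loop rather than separate GPUs.

```python
def matvec(weight, x):
    """Dense matrix-vector product: one output value per weight row."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weight]


def shard_rows(weight, world_size):
    """Split the output rows evenly across ranks (hypothetical helper)."""
    if len(weight) % world_size != 0:
        raise ValueError("row count must be divisible by world_size")
    per_rank = len(weight) // world_size
    return [weight[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def parallel_matvec(weight, x, world_size):
    """Simulate tensor parallelism: each rank owns a slice of the rows."""
    shards = shard_rows(weight, world_size)
    # Each rank computes its slice independently; on real hardware these
    # would run concurrently on different devices.
    partial_outputs = [matvec(shard, x) for shard in shards]
    # Stand-in for an all-gather: concatenate the per-rank slices in rank order.
    return [y for part in partial_outputs for y in part]


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [0.5, -1.0]
sharded_result = parallel_matvec(weight, x, world_size=2)
dense_result = matvec(weight, x)
```

Because every rank holds only its own slice of the weights, per-device memory and per-device weight traffic shrink roughly in proportion to the number of ranks, at the cost of a communication step to reassemble the output.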

The interesting bit

The project treats PyTorch as a complete inference stack rather than a building block for something heavier. It ships int8 and int4 weight-only quantization, a pure-PyTorch GPTQ implementation, and even Mixtral 8x7B MoE support, all while refusing to grow into a library. That restraint is the point.
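As an illustration of what weight-only int8 quantization means, the sketch below uses plain Python lists rather than tensors and is not taken from the project. It assumes a symmetric scheme with one scale per output row: weights are stored as integers in [-127, 127], activations stay in floating point, and the scale is applied after the dot product. The function names `quantize_int8_per_row` and `int8_linear` are hypothetical.

```python
def quantize_int8_per_row(weight):
    """Symmetric int8 quantization with one floating-point scale per row."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(x, q_rows, scales):
    """Weight-only quantized matvec: integer weights, float activations."""
    return [
        scale * sum(q * xj for q, xj in zip(row, x))
        for row, scale in zip(q_rows, scales)
    ]


weight = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
x = [1.0, 2.0, 3.0]

q_rows, scales = quantize_int8_per_row(weight)
approx = int8_linear(x, q_rows, scales)
exact = [sum(w * xj for w, xj in zip(row, x)) for row in weight]
max_error = max(abs(a - e) for a, e in zip(approx, exact))
```

Each stored weight is off by at most half a scale step, so the output error stays small while the weights take a quarter of the space of 32-bit floats. An int4 scheme follows the same pattern with a narrower integer range, usually with scales shared over smaller groups of weights.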

Key highlights

Under 1,000 lines of Python with only PyTorch and sentencepiece as dependencies.
Supports int8/int4 quantization, speculative decoding, and tensor parallelism across NVIDIA and AMD GPUs.
Benchmarked on LLaMA 2/3, Mixtral, and CodeLlama; includes tensor-parallel results up to 8 GPUs and 405B models.
Experimental GPTQ quantization and EleutherAI harness evaluation (though generative evaluation tasks are currently unsupported).
Explicitly designed as copy-paste reference code, not a maintained framework.
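Speculative decoding, listed among the highlights above, can be sketched in a few lines. The version below is a simplified greedy variant written for illustration and is not the project's implementation: a cheap draft model proposes `k` tokens, the target model checks them, the longest agreeing prefix is kept, and the first disagreement is replaced by the target's own token. The models here are toy functions from a token list to the next token, and all names (`speculative_decode`, `target_next`, `draft_next`) are hypothetical.

```python
def speculative_decode(target_next, draft_next, prompt, num_new_tokens, k=4):
    """Greedy speculative decoding with toy next-token functions."""
    tokens = list(prompt)
    goal = len(prompt) + num_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # 2. The target model verifies the proposals. A real system scores
        #    all k positions in one batched forward pass; this loop only
        #    mimics the accept/reject logic.
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every proposal was accepted, so the target adds a bonus token.
            accepted.append(target_next(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


def target_next(seq):
    """Toy target model: count upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq):
    """Toy draft model: agrees with the target except after tokens 2, 5, 8."""
    return 0 if seq[-1] % 3 == 2 else (seq[-1] + 1) % 10


output = speculative_decode(target_next, draft_next, prompt=[0], num_new_tokens=8)
```

In this greedy form the output is identical to decoding with the target model alone; the draft model only changes how many target passes are needed. Sampling-based variants replace the equality check with a probabilistic acceptance test.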

Caveats

Generative evaluation tasks are not yet supported in the experimental eval harness.
It currently depends on PyTorch nightly, so API drift is expected.

Verdict

Grab this if you want to understand how far vanilla PyTorch can stretch for LLM inference, or if you need a hackable baseline to fork. Skip it if you are looking for a batteries-included serving framework or a stable pip-installable library.
