ConnorJL/GPT2

A scrappy GPT-2 reimplementation that admits it can't quite match OpenAI

Independent training code for GPT-2 with TPU support, plus the rare honesty that the results fall short of the original.

★ 1.4k stars | Python | Language Models | Inference · Serving

What it does

This is a from-scratch TensorFlow implementation for training GPT-2 on GPUs or Google TPUs. It includes scripts to wrangle the OpenWebText corpus (web pages linked from Reddit), encode it as TFRecords, and train models from 117M parameters up to 1.5B. The author also released their own pretrained checkpoints, though they label them “inferior” to OpenAI’s.
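To make the size range concrete, here is a rough parameter count for the smallest and largest models. It is a sketch under stated assumptions: it uses the hyperparameters published for OpenAI's GPT-2 (12 layers at width 768, 48 layers at width 1600, a 50,257-token vocabulary, a 1,024-token context) and assumes the output projection shares weights with the token embedding. The function name is illustrative, and this repository's own configs may differ.

```python
def gpt2_param_count(n_layer, n_embd, n_vocab=50257, n_ctx=1024):
    """Approximate GPT-2 parameter count, assuming tied input/output embeddings."""
    token_emb = n_vocab * n_embd          # token embedding matrix
    pos_emb = n_ctx * n_embd              # learned position embeddings
    # Per block: attention (4*d^2 weights) + MLP (8*d^2 weights),
    # plus 9*d bias terms and 4*d for the two layer norms.
    per_block = 12 * n_embd**2 + 13 * n_embd
    final_ln = 2 * n_embd                 # final layer norm (scale and shift)
    return token_emb + pos_emb + n_layer * per_block + final_ln


# Assumed hyperparameters for the smallest and largest published GPT-2 sizes.
SIZES = {"smallest": (12, 768), "largest": (48, 1600)}

for name, (n_layer, n_embd) in SIZES.items():
    total = gpt2_param_count(n_layer, n_embd)
    print(f"{name}: {total / 1e6:.1f}M parameters")
```

Under these assumptions the smallest configuration comes to about 124M parameters and the largest to about 1.56B, which is why the model originally labeled 117M is often listed as 124M elsewhere.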

The interesting bit

The README opens with a disclaimer that the author couldn’t replicate the original model’s full performance and has no idea why, a refreshing break from the usual benchmark inflation. The whole thing is driven by JSON config files rather than argparse soup, and it includes a handwritten data pipeline that stitches short texts together so that no part of the context window is wasted on padding.
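The stitching idea can be sketched in a few lines. This is a minimal illustration of packing, not the repository's actual pipeline: the function name `pack_documents`, the separator token id, and the choice to drop the leftover tail are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def pack_documents(docs: Iterable[List[int]], n_ctx: int, eot_id: int) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield windows of exactly n_ctx tokens.

    Each document is followed by eot_id so the model can see where one text
    ends and the next begins. No padding tokens are ever emitted.
    """
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eot_id)
        # Emit as many full windows as the buffer currently holds.
        while len(buffer) >= n_ctx:
            yield buffer[:n_ctx]
            buffer = buffer[n_ctx:]
    # Assumption: tokens left over at the end (fewer than n_ctx) are dropped.


# Three short "documents" of token ids, packed into windows of 4 tokens.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
windows = list(pack_documents(docs, n_ctx=4, eot_id=0))
print(windows)
```

Every window is full, so a short text shares its window with its neighbors instead of being padded out to the context length. The trade-off is that a single window can span unrelated texts, which is why a separator token is inserted between them.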

Key highlights

Supports both single GPUs and TPU pods (v2-256, v3-512) without code changes
Released pretrained models: 117M, “PrettyBig” (a little over 345M), and 1.5B
Custom data pipeline requires modifying inputs.py by hand; there is no slick abstraction
Dataset generation is documented but hacky; the author spent ~€500 on cloud compute to process it
Prediction only works on GPU/CPU, not TPUs
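A config-driven setup of the kind described above might look roughly like the following. This is a hypothetical sketch: the key names, the `load_config` and `describe_run` helpers, and the `tpu_name` argument are invented for illustration and are not the repository's actual schema or API.

```python
import json
from typing import Optional

# Hypothetical config text; the key names are illustrative only.
EXAMPLE_CONFIG = """
{
  "n_ctx": 1024,
  "n_embd": 768,
  "n_layer": 12,
  "n_head": 12,
  "train_batch_size": 8
}
"""

REQUIRED_KEYS = {"n_ctx", "n_embd", "n_layer", "n_head", "train_batch_size"}


def load_config(text: str) -> dict:
    """Parse a JSON config and check the fields this sketch relies on."""
    params = json.loads(text)
    missing = REQUIRED_KEYS - params.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    if params["n_embd"] % params["n_head"] != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return params


def describe_run(params: dict, tpu_name: Optional[str] = None) -> str:
    """Summarize a run; the same config serves either device type."""
    device = f"TPU '{tpu_name}'" if tpu_name else "GPU/CPU"
    return f"{params['n_layer']}-layer model, context {params['n_ctx']}, on {device}"


params = load_config(EXAMPLE_CONFIG)
print(describe_run(params))
print(describe_run(params, tpu_name="my-tpu"))
```

Keeping the model description in a file and the device choice outside it is one way to switch hardware without editing code, at the cost of having to validate the config by hand.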

Caveats

The author explicitly states that performance does not match the original GPT-2 and that the cause has not been found
Evaluation breaks on TPU pods and must be commented out
Dataset scripts are “a bit hacky” and need manual adaptation

Verdict

Worth a look if you need a hackable, pre-Transformers-era GPT-2 training codebase with TPU support and don’t mind some assembly required. Skip it if you want battle-tested, drop-in reproductions or modern PyTorch ergonomics.

Frequently asked

What is ConnorJL/GPT2? Independent training code for GPT-2 with TPU support, plus the rare honesty that the results fall short of the original.
Is GPT2 open source? Yes. ConnorJL/GPT2 is open source, released under the MIT license.
What language is GPT2 written in? ConnorJL/GPT2 is primarily written in Python.
How popular is GPT2? ConnorJL/GPT2 has 1.4k stars on GitHub.
Where can I find GPT2? ConnorJL/GPT2 is on GitHub at https://github.com/ConnorJL/GPT2.