huggingface/pytorch-openai-transformer-lm

Porting OpenAI's GPT-1 to PyTorch, weights and all

A faithful translation of the original TensorFlow transformer with a working weight importer, because not everyone speaks TF.

1.5k stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

This repo re-implements OpenAI’s original 2018 “Generative Pre-Training” transformer in PyTorch, including a script that slurps the official TensorFlow pretrained weights into PyTorch tensors. You get the language model backbone, a tied LM head for generation, and a classification head for fine-tuning, all with module names matching the original TF variables to minimize discrepancies with the original code.
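The page only says the script "slurps" the TensorFlow weights. As a rough sketch of that step, assume (from memory of OpenAI's release, so treat the layout as an assumption) that the weights ship as one flat array split across several .npy shards plus a JSON list of parameter shapes. Loading then means concatenating the shards, cutting at cumulative size offsets, and reshaping each piece. The version below is stdlib-only, uses nested lists instead of NumPy arrays or PyTorch tensors, and its names (split_params, reshape) are hypothetical, not the repo's:

```python
import math
from itertools import accumulate
from typing import Any, List, Sequence


def reshape(flat: Sequence[float], shape: Sequence[int]) -> Any:
    """Turn a flat sequence into row-major nested lists of the given shape."""
    if not shape:
        # 0-d parameter: a single scalar.
        return flat[0]
    if len(shape) == 1:
        return list(flat)
    step = math.prod(shape[1:])  # number of values in one sub-block
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def split_params(shards: List[List[float]], shapes: List[List[int]]) -> List[Any]:
    """Hypothetical loader step: concatenate shards, cut at cumulative offsets, reshape."""
    flat = [x for shard in shards for x in shard]
    sizes = [math.prod(s) for s in shapes]
    if sum(sizes) != len(flat):
        raise ValueError(f"shapes expect {sum(sizes)} values, shards hold {len(flat)}")
    ends = list(accumulate(sizes))  # end offset of each parameter in the flat array
    starts = [0] + ends[:-1]
    return [reshape(flat[a:b], s) for a, b, s in zip(starts, ends, shapes)]


# Two shards holding a 2x3 matrix followed by a length-1 vector.
params = split_params([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0]], [[2, 3], [1]])
print(params)  # [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [7.0]]
```

A real loader would then copy each piece into the corresponding PyTorch parameter, which is presumably where keeping the TF naming and ordering pays off.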

The interesting bit

The fidelity is almost archaeological: they even reproduced OpenAI’s custom Adam optimizer with fixed (decoupled) weight decay, along with its warmup learning-rate schedules. A single fine-tuning run on ROCStories scores 85.84%, 0.04 percentage points above the 85.8% median OpenAI reported for its TensorFlow code, which is either impressive dedication or a very expensive way to avoid installing TensorFlow.
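Here is a minimal sketch of those two pieces, assuming the linear warmup-then-decay schedule the GPT-1 paper describes for fine-tuning (warmup over the first 0.2% of training) and Loshchilov & Hutter's decoupled weight decay. It works on plain Python lists; the class name DecoupledAdam and its default hyperparameters are illustrative assumptions, not the repo's OpenAIAdam API:

```python
import math
from typing import List


def warmup_linear(progress: float, warmup: float = 0.002) -> float:
    """LR multiplier: ramp linearly up to 1 over the first `warmup` fraction
    of training, then decay linearly toward 0 at the end.

    `progress` is the fraction of training completed (step / total_steps).
    """
    if progress <= warmup:
        return progress / warmup
    return 1.0 - progress


class DecoupledAdam:
    """Adam with decoupled weight decay on a flat list of scalar weights."""

    def __init__(self, lr: float = 6.25e-5, b1: float = 0.9, b2: float = 0.999,
                 eps: float = 1e-8, weight_decay: float = 0.01,
                 total_steps: int = 1000, warmup: float = 0.002) -> None:
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.weight_decay = weight_decay
        self.total_steps = total_steps
        self.warmup = warmup
        self.t = 0
        self.m: List[float] = []  # first-moment (mean) estimates
        self.v: List[float] = []  # second-moment (uncentered variance) estimates

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return the updated weights after one optimizer step."""
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        # Scheduled learning rate for this step.
        lr_t = self.lr * warmup_linear(self.t / self.total_steps, self.warmup)
        bc1 = 1.0 - self.b1 ** self.t  # bias corrections for zero-initialized moments
        bc2 = 1.0 - self.b2 ** self.t
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1.0 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1.0 - self.b2) * g * g
            adam_step = (self.m[i] / bc1) / (math.sqrt(self.v[i] / bc2) + self.eps)
            # Decoupled decay: shrink the weight itself rather than adding
            # weight_decay * p to the gradient (where Adam would rescale it).
            updated.append(p - lr_t * (adam_step + self.weight_decay * p))
        return updated
```

Because the decay term bypasses Adam's per-parameter normalization, every weight shrinks by the same fraction of the scheduled learning rate regardless of its gradient history. The GPT-1 paper also exempts bias and gain weights from decay; this sketch decays everything it is given.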

Key highlights

  • Imports OpenAI’s released pretrained weights directly; no retraining from scratch
  • Single-GPU fine-tuning reaches 85.84% on ROCStories (one 3-epoch run) in about 10 minutes on a K80
  • Includes LMHead and ClfHead classes for language modeling or classification tasks
  • Custom Adam with Loshchilov & Hutter-style decoupled weight decay and a scheduled LR baked in
  • Requires only PyTorch ≥0.4 for inference; extra deps (spacy, sklearn, etc.) only for training
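The two heads are only named above. The following stdlib-only sketch shows what each computes, assuming the standard GPT-1 design: the LM head reuses the token embedding matrix as its output projection (that is what "tied" means), and the classification head reads the hidden state at a special classify token and maps it to one logit. Function names and arguments are hypothetical; the repo's LMHead and ClfHead are PyTorch modules working on batched tensors.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lm_head(hidden: Vector, embedding: List[Vector]) -> Vector:
    """Tied LM head: the logit for token v is the dot product of the hidden
    state with row v of the input embedding matrix, so no separate output
    matrix is learned."""
    return [dot(hidden, row) for row in embedding]


def clf_head(hidden_states: List[Vector], tokens: Sequence[int], clf_token: int,
             weight: Vector, bias: float) -> float:
    """Classification head: take the hidden state at the (first) position
    holding the classify token and project it to a single logit."""
    pos = list(tokens).index(clf_token)  # raises ValueError if the token is missing
    return dot(hidden_states[pos], weight) + bias


E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-token vocab, 2-dim embeddings
print(lm_head([2.0, 3.0], E))  # [2.0, 3.0, 5.0]
```

For a multiple-choice task like ROCStories, each candidate ending would be encoded as its own sequence ending in the classify token, scored with clf_head, and the candidates' logits compared (for example with a softmax) to pick an answer.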

Caveats

  • Tested on a single GPU only; batch size is capped at 20 on a K80 vs. 64 in the official 8-GPU setup, and accuracy improves noticeably with larger batches
  • You must manually clone OpenAI’s repo and copy its model folder into this project; there is no automated weight download
  • The README’s “first experiments” note suggests multi-GPU is “not tried yet,” so scaling is DIY

Verdict

Worth a look if you need GPT-1 in PyTorch for historical reproduction or ablation work. Skip it if you want modern scale — this is the 2018 ancestor, not a maintained foundation model.
