EleutherAI/gpt-neo
An implementation of GPT-2 and GPT-3-style transformer language models using mesh-tensorflow with support for training and inference on TPU and GPU.

Implements model- and data-parallel GPT-3-style language models using the mesh-tensorflow library. Training and inference run on both TPU and GPU, and the code includes extensions beyond standard GPT-3: local attention, linear attention, mixture of experts, and axial positional embeddings. Pretrained models with 1.3B and 2.7B parameters, trained on The Pile dataset, are provided. The repository is archived in favor of the GPU-focused GPT-NeoX repository.
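
To make the local attention extension concrete, here is a minimal pure-Python sketch of a causal sliding-window mask, where each position attends only to itself and a fixed number of preceding positions. It is an illustration of the general technique, not code from this repository: the function names `local_causal_mask` and `attended_positions` and the window size of 3 are hypothetical choices made for the example.

```python
def local_causal_mask(seq_len, window):
    """Build a causal sliding-window attention mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is not in the future (j <= i) and lies within the last
    `window` positions ending at i (i - j < window).
    Both names and the window size are illustrative assumptions.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_positions(mask, i):
    """Return the key positions that query position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


# Small demonstration: 6 tokens, each seeing at most 3 positions.
mask = local_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With a window at least as long as the sequence, the mask reduces to ordinary causal (global) attention. A smaller window caps the number of keys each query attends to, which is the usual motivation for local attention on long sequences.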