Is DialoGPT open source?

Yes — microsoft/DialoGPT is open source, released under the MIT license.

What language is DialoGPT written in?

microsoft/DialoGPT is primarily written in Python.

How popular is DialoGPT?

microsoft/DialoGPT has 2.4k stars on GitHub.

Where can I find DialoGPT?

microsoft/DialoGPT is on GitHub at https://github.com/microsoft/DialoGPT.

microsoft/DialoGPT

Microsoft's Reddit-trained chatbot, now superseded by its own successor

DialoGPT was an early GPT-2-based dialogue model, but Microsoft now tells you to use GODEL instead.

★ 2.4k stars · Python · Language Models · Chat Assistants

What it does

DialoGPT is a GPT-2-based model trained on 147 million multi-turn Reddit dialogues to generate conversational responses. The repo bundles data extraction scripts, training code, and three pretrained checkpoints (117M, 345M, and 762M parameters) that you can download and fine-tune. A demo.py script tries to smooth over the setup pain by automating model download, data preparation, and training in one command.
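For a feel of how a multi-turn model like this is prompted, here is a minimal sketch. It assumes the checkpoints are loaded through the Hugging Face `transformers` library under the hub id `microsoft/DialoGPT-small` (an assumption, not something the repo's own scripts require), and that turns are joined with GPT-2's end-of-text token. The helper names `build_prompt` and `reply` are hypothetical.

```python
EOS = "<|endoftext|>"  # GPT-2's end-of-text token, assumed here as the turn separator


def build_prompt(turns):
    """Join dialogue turns into one string, ending every turn with EOS."""
    return "".join(turn + EOS for turn in turns)


def reply(turns, model_name="microsoft/DialoGPT-small", max_new_tokens=40):
    """Generate the next turn. Downloads the checkpoint on first use."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer.encode(build_prompt(turns), return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    # Keep only the newly generated tokens, not the echoed history.
    new_tokens = output_ids[0, input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `reply(["Does money buy happiness?"])` should return a single generated turn. Swapping in the medium or large hub id should work the same way, at a higher memory cost.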

The interesting bit

The README opens with a blunt admission: this project is “no longer maintained” and superseded by GODEL, which “outperforms DialoGPT.” It’s rare to see a research repo steer readers away from itself, leaving reproducibility as the main reason to stay. The model also reportedly passed a single-turn Turing test against human responses, though the sources don’t detail how rigorous that test was.

Key highlights

Three model sizes, with the 762M variant needing >16GB GPU memory for efficient training
Distributed training supported; 8 V100s cut epoch time from 118h to 18h on benchmark data
Docker and Conda environments provided, but Ubuntu 16.04 is the only officially supported OS
Hugging Face model cards available for easier interactive use
Includes DSTC-7 challenge reproduction scripts and a 6k multi-reference test set
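The distributed-training figures above imply a speedup that is easy to work out. The sketch below assumes the 118h figure is the single-GPU epoch time and the 18h figure is the 8-GPU epoch time. The `scaling` helper is hypothetical, not part of the repo.

```python
def scaling(single_gpu_hours, multi_gpu_hours, n_gpus):
    """Return (speedup, parallel efficiency) for a multi-GPU run."""
    speedup = single_gpu_hours / multi_gpu_hours
    efficiency = speedup / n_gpus  # 1.0 would be perfect linear scaling
    return speedup, efficiency


speedup, efficiency = scaling(118, 18, 8)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 6.56x, efficiency: 82%"
```

Under that assumption, eight GPUs deliver roughly 6.6 times the throughput of one, so about 18% of the ideal speedup is lost to overhead.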

Caveats

Data pipeline broke in 2022 due to Pushshift server changes; fix requires 800GB temp disk space and ~10 hours with 8 processes
“Stability can not be gauranteed” on non-Linux platforms (their spelling, not mine)
FP16 training requires installing a specific pinned commit of NVIDIA Apex
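Given the 800GB temporary disk requirement, it may be worth checking free space before starting the data pipeline. This is a minimal sketch using only the Python standard library. The `has_enough_space` helper and the path argument are hypothetical and not part of the repo.

```python
import shutil

REQUIRED_GB = 800  # temp space the data pipeline fix reportedly needs


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem that holds `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes to decimal GB
    return free_gb >= required_gb, free_gb


ok, free_gb = has_enough_space(".")
print(f"{free_gb:.0f} GB free, enough for extraction: {ok}")
```

Point `path` at whichever directory the extraction scripts will write to, since a different mount may have a very different amount of free space.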

Verdict

Worth a look if you’re reproducing 2019 dialogue generation papers or need a GPT-2-based baseline. Everyone else should follow Microsoft’s advice and head to GODEL. The repo’s real value may be as a time capsule of early large-scale dialogue pretraining — and as a case study in graceful project obsolescence.

Frequently asked

What is microsoft/DialoGPT?: DialoGPT was an early GPT-2-based dialogue model, but Microsoft now tells you to use GODEL instead.