Is LongWriter open source?

Yes — THUDM/LongWriter is open source, released under the Apache-2.0 license.

What language is LongWriter written in?

THUDM/LongWriter is primarily written in Python.

How popular is LongWriter?

THUDM/LongWriter has 1.9k stars on GitHub.

Where can I find LongWriter?

THUDM/LongWriter is on GitHub at https://github.com/THUDM/LongWriter.

THUDM/LongWriter

Teaching 8B models to generate 10,000-word walls of text

LongWriter fine-tunes small LLMs to reliably generate ultra-long coherent text, then provides the data pipeline and benchmarks to prove it.

★ 1.9k stars | Python | Language Models | Inference · Serving | ML Frameworks

What it does

LongWriter is a family of fine-tuned models, built on GLM-4-9B and Llama-3.1-8B, that can generate coherent text exceeding 10,000 words from a single prompt. The repository ships the models, a 6,000-sample training dataset called LongWriter-6k, training scripts adapted from the LongAlign codebase, and two evaluation suites: LongBench-Write, which uses GPT-4o as a judge to score quality and also scores output length, and LongWrite-Ruler, which stress-tests maximum output length. A newer variant, LongWriter-Zero-32B, is trained with pure reinforcement learning; the authors report that it surpasses both the original LongWriter and much larger 100B+ models on long-form writing tasks.

The interesting bit

The project treats extreme output length as a trainable skill rather than a scaling-law inevitability. It includes AgentWrite, an automated pipeline that constructs ultra-long training data by planning and then writing sections sequentially, effectively bootstrapping length without human annotation.
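The plan-then-write idea can be sketched in a few lines. This is a minimal illustration and not the repository's AgentWrite code: the function names, the prompt wording, and the `llm` callable (any function that maps a prompt string to a completion string) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def plan_outline(llm: LLM, instruction: str) -> List[str]:
    """Step 1: ask the model for an outline, one section brief per line."""
    prompt = (
        "Break the following writing task into sections. "
        "Return one line per section with its main point and target word count.\n\n"
        f"Task: {instruction}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def write_sections(llm: LLM, instruction: str, outline: List[str]) -> str:
    """Step 2: write sections one at a time, showing the model the draft so far."""
    written: List[str] = []
    outline_text = "\n".join(outline)
    for brief in outline:
        draft_so_far = "\n\n".join(written)
        prompt = (
            f"Task: {instruction}\n\n"
            f"Outline:\n{outline_text}\n\n"
            f"Text written so far:\n{draft_so_far}\n\n"
            f"Write only the next section: {brief}"
        )
        written.append(llm(prompt).strip())
    return "\n\n".join(written)


def agent_write(llm: LLM, instruction: str) -> str:
    """Plan first, then write sequentially and concatenate the sections."""
    outline = plan_outline(llm, instruction)
    return write_sections(llm, instruction, outline)
```

Because each section is generated in its own call, the total output length is bounded by the number of planned sections rather than by how much a model will write in one response. In practice `llm` would wrap a call to a hosted model API, which is why the pipeline needs an API key.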

Key highlights

Ships two fine-tuned chat models (LongWriter-glm4-9b, LongWriter-llama3.1-8b) and a 32B RL-tuned successor (LongWriter-Zero-32B).
Generates 10,000+ words in under a minute when served with vLLM.
Open-sources the AgentWrite data construction pipeline and the LongWriter-6k dataset.
Provides evaluation code and benchmarks (LongBench-Write, LongWrite-Ruler) that stress-test both output quality and maximum length.
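To make the idea of scoring output length concrete, here is a small sketch of a length score that rewards hitting a requested word count and penalizes both overshoot and undershoot. The scoring formula is not given in this summary, so the piecewise-linear shape and the divisors 3 and 2 below are assumptions for illustration only; the repository's evaluation code is the reference.

```python
def count_words(text: str) -> int:
    """Whitespace-based word count; a rough proxy that suits English text."""
    return len(text.split())


def length_score(required_words: int, actual_words: int) -> float:
    """Return a 0-100 score that peaks when actual_words == required_words.

    Assumed penalty shape (illustrative, not taken from the repository):
    overshoot is penalized more gently (divisor 3) than undershoot (divisor 2).
    """
    if required_words <= 0 or actual_words <= 0:
        return 0.0
    if actual_words > required_words:
        penalty = (actual_words / required_words - 1) / 3
    else:
        penalty = (required_words / actual_words - 1) / 2
    return 100.0 * max(0.0, 1.0 - penalty)
```

Under these assumptions, an output half as long as requested scores 50, and one four times too long scores 0. A quality score from an LLM judge would be reported alongside this number, not folded into it.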

Caveats

The AgentWrite pipeline and the quality evaluator both require supplying your own API keys.
Training depends on FlashAttention 2 and a specific transformers version, so expect environment friction.
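A quick way to see what is installed before attempting training is to query package metadata. This is a generic sketch, not part of the repository; the distribution names `transformers` and `flash-attn` are the usual pip names and are assumed here, and the required versions should be taken from the repository's own instructions.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed pip distribution names; compare the result with the repo's requirements.
print(installed_versions(["transformers", "flash-attn"]))
```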

Verdict

Worth a look if you are researching long-output generation or need to train small models for report-writing and novel-length tasks. Skip it if you are just looking for a drop-in chat replacement and do not care about word count.
