voidful/TextRL

A saner API for training LLMs with reinforcement learning

TextRL wraps HuggingFace's TRL library in a single config dataclass and a handful of trainer classes so you can run GRPO, DPO, or KTO without drowning in boilerplate.

★564 stars · Python · ML Frameworks · Language Models · LLMOps · Eval

What it does

TextRL is a thin layer over HuggingFace TRL that standardizes how you configure and run RLHF-style training. You define a TextRLConfig dataclass, pick a trainer (OnlineTrainer, PreferenceTrainer, RewardModelTrainer), and pass a callable reward function or a preference dataset. It handles PEFT/LoRA, 4-bit quantization, vLLM rollout for GRPO, and distributed training via accelerate without adding its own scaffolding.
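
The config-then-trainer flow described above can be sketched in plain Python. This is a minimal illustration of the pattern only: SketchConfig, pick_trainer_family, and plan_run are hypothetical stand-ins rather than TextRL's actual classes or signatures, and the field names are assumptions. The reward-model trainer is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a single config object; not TextRL's real TextRLConfig.
@dataclass
class SketchConfig:
    model_name: str
    algorithm: str = "grpo"
    learning_rate: float = 1e-5
    use_lora: bool = True
    extra: Dict[str, Any] = field(default_factory=dict)

# Online methods sample completions and score them with a reward function;
# preference methods learn from a dataset of chosen/rejected pairs.
ONLINE_ALGOS = {"grpo", "rloo"}
PREFERENCE_ALGOS = {"dpo", "ipo", "kto"}

def pick_trainer_family(config: SketchConfig) -> str:
    """Map the configured algorithm to the kind of trainer it needs."""
    algo = config.algorithm.lower()
    if algo in ONLINE_ALGOS:
        return "online"
    if algo in PREFERENCE_ALGOS:
        return "preference"
    raise ValueError(f"unknown algorithm: {config.algorithm!r}")

def plan_run(
    config: SketchConfig,
    reward: Optional[Callable[[List[str]], List[float]]] = None,
    preference_data: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Check that the inputs match the trainer family before any training starts."""
    family = pick_trainer_family(config)
    if family == "online" and reward is None:
        raise ValueError("online algorithms need a reward function")
    if family == "preference" and preference_data is None:
        raise ValueError("preference algorithms need a preference dataset")
    return {"family": family, "algorithm": config.algorithm.lower(), "model": config.model_name}

# Placeholder model name; nothing is downloaded or trained in this sketch.
config = SketchConfig(model_name="example/model", algorithm="grpo")
plan = plan_run(config, reward=lambda completions: [0.0 for _ in completions])
```

Validating the algorithm and its inputs up front, before a model is loaded, is the main ergonomic benefit a single config object can offer.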

The interesting bit

The reward function API is deliberately plain Python: decorate any callable with @reward_fn, compose multiple rewards with weights, or wrap a HuggingFace sentiment classifier in one line. No custom tensor formats, no subclassing gym environments. The v1.0 rewrite killed the old PFRL/gym API entirely; this is now purely a TRL ergonomic wrapper.
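
To make the "plain Python rewards" idea concrete, here is a self-contained sketch of a decorator plus weighted composition. The reward_fn and compose helpers below are written from scratch for illustration; they share a name with the decorator mentioned above but are assumptions about the general shape, not TextRL's implementation or signatures.

```python
import functools
from typing import Callable, List, Sequence, Tuple

RewardFn = Callable[[List[str]], List[float]]

def reward_fn(func: RewardFn) -> RewardFn:
    """Hypothetical decorator: coerce scores to float and check one score per completion."""
    @functools.wraps(func)
    def wrapper(completions: List[str]) -> List[float]:
        scores = [float(s) for s in func(completions)]
        if len(scores) != len(completions):
            raise ValueError("a reward must return one score per completion")
        return scores
    return wrapper

def compose(weighted: Sequence[Tuple[RewardFn, float]]) -> RewardFn:
    """Combine several rewards into one by taking their weighted sum."""
    def combined(completions: List[str]) -> List[float]:
        totals = [0.0] * len(completions)
        for fn, weight in weighted:
            for i, score in enumerate(fn(completions)):
                totals[i] += weight * score
        return totals
    return combined

@reward_fn
def brevity(completions: List[str]) -> List[float]:
    # 1.0 for completions of at most five words, else 0.0.
    return [1.0 if len(c.split()) <= 5 else 0.0 for c in completions]

@reward_fn
def politeness(completions: List[str]) -> List[float]:
    # 1.0 when the completion contains "please" (case-insensitive), else 0.0.
    return [1.0 if "please" in c.lower() else 0.0 for c in completions]

combined = compose([(brevity, 0.25), (politeness, 0.75)])
scores = combined(["Please wait.", "This answer is much longer than five words."])
print(scores)  # [1.0, 0.0]
```

Because each reward is just a function from a list of strings to a list of floats, it can be unit-tested without a model or a GPU.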

Key highlights

One TextRLConfig covers GRPO, RLOO, REINFORCE++, DPO, IPO, KTO, and a dozen other algorithms via TRL’s unified loss_type.
load_model() returns (policy, tokenizer, ref_model) with optional LoRA, QLoRA, and Flash Attention 2 in a single call.
vLLM rollout support for GRPO generation, gated behind extra={"use_vllm": True}.
CLI tools for YAML-driven training, adapter merging, and reward-only evaluation.
Explicit about what’s not supported: PPO, OnlineDPO, ORPO, SimPO, and other algorithms removed in TRL 0.29+ raise an error with a migration hint.
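
Two of the highlights above, the extra={"use_vllm": True} gate and the explicit refusal of removed algorithms, amount to up-front validation. The sketch below shows one way such a check could look. Everything in it is hypothetical: validate_request, UnsupportedAlgorithmError, the subset of supported names, and the error wording are assumptions, and rejecting vLLM for non-GRPO algorithms is this sketch's own choice rather than documented TextRL behavior.

```python
from typing import Any, Dict, Optional

# Illustrative subsets taken from the algorithm names listed above.
SUPPORTED = {"grpo", "rloo", "dpo", "ipo", "kto"}
REMOVED = {"ppo", "onlinedpo", "orpo", "simpo"}

class UnsupportedAlgorithmError(ValueError):
    """Raised for algorithms the wrapper deliberately refuses to run."""

def validate_request(algorithm: str, extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Fail fast on removed or unknown algorithms and resolve the vLLM flag."""
    algo = algorithm.lower()
    extra = extra or {}
    if algo in REMOVED:
        # Point the user at the migration notes instead of failing obscurely later.
        raise UnsupportedAlgorithmError(
            f"{algorithm!r} is not supported; see docs/migration.md for alternatives"
        )
    if algo not in SUPPORTED:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    use_vllm = bool(extra.get("use_vllm", False))
    if use_vllm and algo != "grpo":
        raise ValueError("this sketch allows vLLM rollout for GRPO only")
    return {"algorithm": algo, "use_vllm": use_vllm}

plan = validate_request("GRPO", extra={"use_vllm": True})
print(plan)  # {'algorithm': 'grpo', 'use_vllm': True}
```

Using a dedicated exception subclass lets callers catch the "deliberately unsupported" case separately from ordinary typos in the algorithm name.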

Caveats

The project is basically glue code around TRL; if TRL breaks or removes an algorithm, TextRL breaks too.
vLLM rollout is GRPO-only; RLOO and REINFORCE++ don’t get the fast path.
The README notes a “v1.0 breaking change” with legacy API removal—check docs/migration.md if you’re upgrading.

Verdict

Worth a look if you’re already in the TRL ecosystem and want less boilerplate, or if you train enough models that copy-pasting TRL scripts has become tedious. Skip it if you need PPO, SimPO, or deeply custom training loops that TRL itself doesn’t support.

Frequently asked

What is voidful/TextRL?: TextRL wraps HuggingFace's TRL library in a single config dataclass and a handful of trainer classes so you can run GRPO, DPO, or KTO without drowning in boilerplate.
Is TextRL open source?: Yes — voidful/TextRL is open source, released under the MIT license.
What language is TextRL written in?: voidful/TextRL is primarily written in Python.
How popular is TextRL?: voidful/TextRL has 564 stars on GitHub.
Where can I find TextRL?: voidful/TextRL is on GitHub at https://github.com/voidful/TextRL.