salesforce/CodeRL

Teaching code models to learn from their own compiler errors

CodeRL uses a critic network to score generated programs by predicted unit-test outcomes, then feeds that signal back to the generator via reinforcement learning.

Velocity (7d): +0.4 ★ / day · Trend: steady

What it does

CodeRL fine-tunes code-generating language models (specifically CodeT5) using actor-critic reinforcement learning. A generator model writes candidate programs; a separate critic model predicts whether those programs will pass unit tests. The critic’s scores become reward signals that train the generator to write better code. At inference time, the system can regenerate programs based on feedback from example unit tests and critic scores.
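To make the "scores become reward signals" step concrete, here is a minimal sketch of a REINFORCE-style objective for one sampled program. It is a generic policy-gradient stand-in, not CodeRL's actual loss: the function name `policy_gradient_loss`, the `baseline` parameter, and the toy log-probabilities are all assumptions made for illustration.

```python
import math
from typing import List


def policy_gradient_loss(
    token_logprobs: List[float], reward: float, baseline: float = 0.0
) -> float:
    """REINFORCE-style loss for a single sampled program (illustrative only).

    token_logprobs: log-probability the generator assigned to each sampled token.
    reward: scalar score for the whole program (e.g. derived from test outcomes).
    baseline: hypothetical reference reward subtracted to reduce variance.
    """
    advantage = reward - baseline
    sequence_logprob = sum(token_logprobs)
    # Minimizing this raises the likelihood of programs with positive advantage
    # and lowers the likelihood of programs with negative advantage.
    return -advantage * sequence_logprob


# Toy two-token program with token probabilities 0.5 and 0.25.
sample_logprobs = [math.log(0.5), math.log(0.25)]
loss_pass = policy_gradient_loss(sample_logprobs, reward=1.0)
loss_fail = policy_gradient_loss(sample_logprobs, reward=-1.0)
```

With a positive reward the loss is positive and shrinks as the program becomes more likely; with a negative reward the sign flips, so gradient descent pushes probability away from that program.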

The interesting bit

The critic doesn’t just say “pass/fail”: it predicts four outcome classes (Compile Error, Runtime Error, Failed Tests, Passed Tests), giving the generator denser feedback than a binary reward. There’s also a binary critic variant intended for an inference-time “critic sampling” strategy, which the README lists as not yet implemented.
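The four classes are easiest to see as labels produced by actually running a candidate, which is the ground truth a critic would be trained to predict. The sketch below is a simplified assumption, not the repository's test harness: it expects each candidate to define a function named `solve` (a hypothetical convention), runs it without any sandboxing, and maps outcomes to reward values that are illustrative rather than taken from the text above.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Outcome(Enum):
    COMPILE_ERROR = "Compile Error"
    RUNTIME_ERROR = "Runtime Error"
    FAILED_TESTS = "Failed Tests"
    PASSED_TESTS = "Passed Tests"


# Illustrative reward values only (assumed for this sketch).
ILLUSTRATIVE_REWARD: Dict[Outcome, float] = {
    Outcome.COMPILE_ERROR: -1.0,
    Outcome.RUNTIME_ERROR: -0.6,
    Outcome.FAILED_TESTS: -0.3,
    Outcome.PASSED_TESTS: 1.0,
}


def classify(source: str, tests: List[Tuple[tuple, object]], entry: str = "solve") -> Outcome:
    """Label a candidate program by running it against (args, expected) pairs.

    WARNING: exec() runs untrusted code directly; a real harness needs isolation.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return Outcome.COMPILE_ERROR
    namespace: dict = {}
    try:
        exec(code, namespace)
        fn = namespace[entry]  # a missing entry point counts as a runtime error here
        results = [fn(*args) for args, _ in tests]
    except Exception:
        return Outcome.RUNTIME_ERROR
    if all(got == expected for got, (_, expected) in zip(results, tests)):
        return Outcome.PASSED_TESTS
    return Outcome.FAILED_TESTS


tests = [((2, 3), 5), ((0, 0), 0)]
broken = classify("def solve(a, b) return a + b", tests)    # missing colon
crashing = classify("def solve(a, b): return a / 0", tests)  # ZeroDivisionError
wrong = classify("def solve(a, b): return a - b", tests)     # wrong answer
correct = classify("def solve(a, b): return a + b", tests)
```

A binary reward would lump the first three candidates together; the four-way label separates a program that cannot even be parsed from one that runs but returns the wrong answer.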

Key highlights

  • Built on Salesforce’s own CodeT5-large-ntp-py, a 770M-parameter model further pretrained on Python GitHub code for generation tasks
  • Evaluated on APPS and MBPP program-synthesis benchmarks
  • Provides shell scripts for generation, unit-test execution, critic training, and both ground-truth and synthetic-program fine-tuning
  • Pre-extracted example unit tests for APPS included (average ~2 per problem)
  • Critic and fine-tuned model checkpoints available via Google Cloud Storage

Caveats

  • The “Generating Programs with Critic Sampling” process is listed as not yet implemented in the README’s progress checklist
  • Requires installing a specific fork of transformers (v4.16.1) from source, not the standard pip package
  • Setup involves manual dataset downloads and model checkpoint placement in expected folder structures

Verdict

Worth a look if you’re researching RL for code generation or need a reproducible NeurIPS 2022 baseline on APPS/MBPP. Skip if you want a polished, pip-installable tool: this is research code with rough edges and incomplete features.
