← all repositories
openai/gpt-oss

OpenAI's first open-weight models come with a catch

Two MoE models, Apache 2.0 license, and a custom chat format that will break your existing code.

20.1k stars Python Language Models
gpt-oss
Velocity · 7d
+58
★ / day
Trend
steady
star history

What it does

OpenAI released gpt-oss-120b and gpt-oss-20b, two Mixture-of-Experts models under Apache 2.0. The 120B parameter model (5.1B active) squeezes onto a single 80GB GPU via MXFP4 quantization; the 20B version (3.6B active) fits in 16GB. Both support reasoning effort tuning, chain-of-thought exposure, function calling, web browsing, Python execution, and structured outputs.

The interesting bit

The models were trained on a custom “harmony response format” and the README is explicit: “should only be used with this format; otherwise, they will not work correctly.” This is not a suggestion. Transformers users get automatic formatting via chat templates; everyone else must manually apply it or use the openai-harmony package. The repo itself is mostly reference implementations—educational PyTorch code that needs 4× H100s, plus more practical Triton and Metal paths.

Key highlights

  • Apache 2.0 license, no copyleft or patent strings attached
  • Configurable reasoning effort (low/medium/high) and full chain-of-thought visibility
  • Native agentic tools: browser, Python code execution, function calling
  • MXFP4 quantization enables single-GPU deployment for the 120B model
  • Integrations: Transformers, vLLM, Ollama, LM Studio, plus reference Triton/Metal/ PyTorch code

Caveats

  • The harmony format requirement is non-negotiable; existing chat templates will silently fail
  • Reference PyTorch implementation is deliberately inefficient (“educational purposes only”)
  • Windows is untested for reference implementations; use Ollama instead

Verdict

Worth a look if you want a commercially permissive, reasoning-capable model with actual open weights—not just an API wrapper. Skip if you’re expecting drop-in compatibility with standard chat formats or if your hardware tops out at a MacBook Air.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.