The Unofficial Manual for When AI Stops Autocompleting and Starts Shipping

Senior Editor

A community-built Chinese field guide treats OpenAI’s Codex as an engineering executor rather than a chatbot, mapping the full workflow from sandbox rules to production pull requests.

bozhouDev/codex-orange-book

★1.5k stars

The Hype Moment: Why a Field Guide Appeared Now

OpenAI introduced Codex in May 2025 as a research preview of a cloud-based software engineering agent. Powered by codex-1—an o3 descendant optimized for programming through reinforcement learning on real-world tasks—it was pitched as something more autonomous than GitHub Copilot and more project-native than ChatGPT. It could write features, fix bugs, answer codebase questions, and propose pull requests inside isolated cloud sandboxes. For ChatGPT Plus, Pro, Business, and Enterprise users, it promised to move AI from the chat window into the repository.

But a product launch is not a workflow. Codex shipped with multiple surfaces (a desktop App, a CLI, an IDE extension, a Web interface, and Cloud background tasks), each with its own permission model, sandbox semantics, and rate limits. OpenAI’s official documentation explained what Codex could do; it did not fully prescribe how an individual developer or a small team should live with it day to day. Into that gap stepped the Codex Orange Book, an unofficial, Chinese-language guide last checked against the live product in June 2026. It is not a repository of code. It is a meticulously structured manual that treats Codex as an industrial tool rather than a magic button, and its arrival signals that the agentic coding era has matured enough to require its own operator’s handbook.

What the Orange Book Actually Is

The project is documentation wearing a repository’s clothes. Its maintainers describe it as a “full-linkage usage guide” (in plainer English, an end-to-end guide) covering Codex App, CLI, IDE Extension, and Web/Cloud interfaces. The name nods to the FDA’s Approved Drug Products with Therapeutic Equivalence Evaluations, the original Orange Book, which since 1980 has served as a reference for what is safe to substitute. The analogy is apt: the guide acts as a reference for what is safe to automate, and where human review remains mandatory.

Version 0.1.0 carries a disclaimer that it does not represent OpenAI’s official documentation or product commitments. That humility is necessary because Codex updates quickly; model names, quota locations, and command parameters shift underneath the user. The guide distributes itself as PDF, HTML, and Markdown, suggesting its audience includes developers who want to read offline, annotate, or share across teams. A companion public repository hosting a bilingual edition has accumulated 460 stars, indicating that the demand for structured, non-English Codex documentation is neither niche nor temporary.

The Core Thesis: From Completion to Delivery

The most valuable insight in the Orange Book is not a configuration trick. It is a historical argument. The guide traces four stages of AI programming tools: the Copilot completion era, the ChatGPT dialogue era, the Cursor project-collaboration era, and the current Codex engineering-agent era. In this framing, Copilot helped you finish lines, ChatGPT helped you think, Cursor helped you refactor inside an editor, and Codex is expected to execute whole tasks—from reading the project to running tests to summarizing diffs.

This shift from writing code to delivering tasks changes the unit of interaction. The guide insists that Codex should not be fed vague requests like “make it better.” Instead, it advocates a six-step workflow: decompose the requirement, plan before touching files, implement in small bounded steps, test against lint and build, review the diff for deletions and side effects, and only then commit or open a pull request. The boring parts—planning, testing, and review—are where the guide locates the real value. It is a workflow manual disguised as a product tutorial, and its rigor suggests that treating an agent like a chatbot is the fastest route to a broken build.
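The ordering constraint in that six-step workflow can be made concrete with a small sketch. The Python below is illustrative only: the step labels paraphrase the workflow described above, and the `TaskRun` class and its methods are hypothetical names, not part of Codex or of the Orange Book. It shows one way a team might refuse to let a task skip planning, testing, or review.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step names paraphrase the six-step workflow; every identifier here is
# hypothetical and is not part of Codex or of the Orange Book itself.
STEPS = ["decompose", "plan", "implement", "test", "review_diff", "commit_or_pr"]


@dataclass
class TaskRun:
    description: str
    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the next required step, or None when all six are done."""
        if len(self.completed) == len(STEPS):
            return None
        return STEPS[len(self.completed)]

    def complete(self, step: str) -> None:
        """Record a finished step, refusing to skip ahead or repeat."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)

    def may_commit(self) -> bool:
        """True once everything before the final commit/PR step is done."""
        return len(self.completed) >= len(STEPS) - 1
```

A run that calls `complete("review_diff")` straight after `complete("plan")` raises `ValueError`, which is the point: the dull middle steps cannot be skipped on the way to a commit.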

Ecosystem Lock-In as a Feature

The Orange Book maps the full surface area of OpenAI’s agent ecosystem with a thoroughness that borders on cartographic obsession. It catalogs the Codex App as an Electron-based desktop environment with persistent threads, a built-in terminal, and a review pane for inline diffing. It describes the CLI as a terminal-native agent with slash-command governance. It notes the IDE extension for VS Code, Cursor, and Windsurf. And it details Codex Cloud, which runs background tasks against GitHub repositories without keeping a local laptop awake.

Beyond the interfaces, the guide ventures into Codex’s extensibility layer: Skills, which are repeatable workflow templates; MCP servers, which act as standard sockets to external tools like databases or documentation; and AGENTS.md files, which serve as project-specific rulebooks telling the AI what it may and may not touch. It even covers Git worktrees as a mechanism for letting multiple agent threads operate on parallel branches without polluting the main line. The repository is, in essence, glue code of the most honorable kind: a curated interface layer between a rapidly shifting vendor product and a developer’s need for stable, repeatable process.
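The worktree idea is easy to picture in code. The sketch below is an assumption-laden illustration, not something taken from the guide: the `agent/<task>` branch naming, the sibling-directory layout, and the function names are all hypothetical. The only real interface it relies on is Git’s own `git worktree add -b <branch> <path> <commit-ish>` command.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo_root: str, task_slug: str, base: str = "main") -> List[str]:
    """Build the `git worktree add` argv that gives one agent task its own
    branch and its own checkout directory next to the main repository."""
    root = Path(repo_root)
    branch = f"agent/{task_slug}"                       # hypothetical naming scheme
    checkout = root.parent / f"{root.name}-{task_slug}"  # sibling directory
    return ["git", "-C", str(root), "worktree", "add",
            "-b", branch, str(checkout), base]


def create_worktree(repo_root: str, task_slug: str, base: str = "main",
                    dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False."""
    cmd = worktree_command(repo_root, task_slug, base)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Defaulting to a dry run keeps the example safe to paste: nothing touches the repository until a human has read the command and opted in.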

Position in the Field: Cursor and Claude Code in the Crosshairs

The guide does not pretend Codex exists in a vacuum. It explicitly positions the tool against its nearest rivals. ChatGPT, it argues, is an advisor: you ask, it answers, you execute. Codex is an intern: you delegate, it acts, you verify. Cursor is an AI editor for daily pair programming; Codex is an agent for task execution. Claude Code is a long-term terminal collaborator for complex, multi-step engineering; Codex is a multi-surface platform anchored to the OpenAI ecosystem, with advantages in App-to-CLI-to-Web continuity and GitHub pull-request integration.

The comparison tables are evenhanded. The guide admits that the choice depends on model capability, context window handling, toolchain fit, price, and team habit. It notes that GPT-5.5, released in April 2026, is rolling into Codex with improved benchmark scores and roughly twice the per-token cost of its predecessor. The Orange Book does not declare a winner. It simply provides the map so that teams can choose their own allegiance without discovering the tradeoffs by accident.

The Safety Obsession That Makes It Useful

If the first half of the Orange Book explains what Codex can do, the second half is a long meditation on what it must not do. The guide returns repeatedly to sandbox boundaries (read-only, workspace-write, and the dangerously seductive full-access mode) and warns against letting the agent touch production databases, payment cores, or repositories that have no backup. It treats Git not as an afterthought but as a safety harness: initialize before the first prompt, commit before each task, and inspect diffs before accepting any change.

This caution is not pessimism. It reflects the reality of agentic coding. An AI that can run commands is an AI that can delete files. The guide’s insistence on approval gates, worktree isolation, and the prohibition of bulk deletions turns Codex from a liability into a manageable risk. In a landscape where vendors emphasize speed, the Orange Book’s most subversive contribution is its insistence that the review pane is the most important screen in the entire workflow.
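A review gate of this kind can be approximated in a few lines. The following Python is a minimal sketch under stated assumptions: it reads the text of a `git diff`, counts added and removed lines, and flags the change for human review if any file is deleted or if removals exceed a threshold. The function names and the default threshold of 50 lines are hypothetical choices for illustration and do not come from the guide or from Codex.

```python
from typing import Dict, List, Optional


def summarize_diff(diff_text: str) -> Dict[str, object]:
    """Count added/removed lines and collect deleted files from `git diff` text.

    Simplification: any line starting with '---' or '+++' is treated as a
    file header, so a removed line whose content begins with '--' is missed.
    """
    added = removed = 0
    deleted_files: List[Optional[str]] = []
    current: Optional[str] = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current = line.split(" b/", 1)[-1]   # path after the "b/" prefix
        elif line.startswith("deleted file mode"):
            deleted_files.append(current)
        elif line.startswith("+++") or line.startswith("---"):
            continue                              # file headers, not content
        elif line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return {"added": added, "removed": removed, "deleted_files": deleted_files}


def needs_human_review(summary: Dict[str, object], max_removed: int = 50) -> bool:
    """Flag diffs that delete whole files or remove more lines than allowed."""
    return bool(summary["deleted_files"]) or summary["removed"] > max_removed
```

The threshold is deliberately crude. The aim is not to judge the change automatically but to make sure a large deletion never reaches a commit without someone opening the review pane first.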

Outlook: Documentation as Infrastructure

The Codex Orange Book is a symptom of a larger transition. As coding agents proliferate, the competitive bottleneck is shifting from raw model capability to workflow design. The guide’s case studies—building a pet-treat storefront, generating investor pitch decks, producing promotional videos—show that users are already asking Codex to deliver business artifacts, not merely source files. The manual’s existence proves that developers no longer need to be convinced that AI can write code; they need to know how to supervise an employee that never sleeps, never asks for a raise, and occasionally hallucinates a dependency.

The unresolved tension is whether unofficial guides like this become obsolete as OpenAI formalizes its documentation, or whether they remain permanently necessary because vendor docs inevitably lag behind weekly product mutations. For now, the Orange Book serves as a user-built control tower for an agentic future that the official manuals have not yet fully mapped.