Open-source your AI pair-programming sessions
A tool that turns your coding-agent chat logs into public Hugging Face datasets — with a political twist.

What it does DataClaw scrapes conversation history from Claude Code, Codex, and other coding agents, packages it into structured JSONL, and publishes it to Hugging Face as a dataset. It ships as both a Python CLI and an unsigned Mac menu-bar app. The author frames it as “performance art” — a response to Anthropic’s data-sharing restrictions.
The interesting bit The privacy workflow is unusually paranoid, in a good way. Before you can push anything, you must run a local export, review it, optionally scan for your full legal name, and sign attestations confirming you checked for PII. The tool also auto-redacts secrets via regex, entropy analysis, and username hashing. It is designed so you can literally paste a six-step prompt into Claude Code and have the agent export itself.
Key highlights
- Parses session logs including voice transcripts, images, tool calls, and token usage
- Multi-layer redaction: API keys, emails, usernames, high-entropy strings, plus custom rules
dataclaw jsonl-to-yamlanddiff-jsonlfor human review before publishing- Tagged
dataclawon Hugging Face to form a distributed open dataset - Mac app bundles everything; CLI works everywhere Python does
Caveats
- Mac DMG is currently unsigned, so Gatekeeper will complain on first launch
- Intel Macs must use the CLI; Apple Silicon only for the app
- The author explicitly warns automated redaction is “NOT foolproof”
Verdict Worth a look if you believe coding-agent transcripts should be public training data and you are willing to do the privacy homework. Skip it if you just want a backup of your chats — this tool wants you to publish.