peteromallet/dataclaw

Open-source your AI pair-programming sessions

A tool that turns your coding-agent chat logs into public Hugging Face datasets — with a political twist.

2.1k stars · Python · Coding Assistants, Data Tooling
Velocity (7d): +20 ★/day · Trend: steady

What it does

DataClaw scrapes conversation history from Claude Code, Codex, and other coding agents, packages it as structured JSONL, and publishes it to Hugging Face as a dataset. It ships as both a Python CLI and an unsigned Mac menu-bar app. The author frames the project as “performance art”, a response to Anthropic’s data-sharing restrictions.
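
JSONL stores one JSON object per line, so each session becomes a single self-contained record. Below is a minimal Python sketch of that packaging step. The Session and Message classes and their fields (agent, role, content, tokens) are hypothetical and are not DataClaw's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical schema for illustration only; DataClaw's real fields may differ.
@dataclass
class Message:
    role: str          # e.g. "user", "assistant", "tool"
    content: str
    tokens: int = 0    # token usage, if the source log records it


@dataclass
class Session:
    session_id: str
    agent: str         # e.g. "claude-code" or "codex"
    messages: list[Message] = field(default_factory=list)


def sessions_to_jsonl(sessions: list[Session]) -> str:
    """Serialize each session as one JSON object per line (JSONL)."""
    return "".join(json.dumps(asdict(s), ensure_ascii=False) + "\n" for s in sessions)


def jsonl_to_dicts(text: str) -> list[dict]:
    """Parse JSONL back into dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    s = Session("abc123", "claude-code", [
        Message("user", "Fix the failing test"),
        Message("assistant", "Done.", tokens=12),
    ])
    print(sessions_to_jsonl([s]), end="")
```

Because every line parses on its own, a dataset can be appended to, split, or streamed without loading the whole file.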

The interesting bit

The privacy workflow is unusually paranoid, in a good way. Before you can push anything, you must run a local export, review it, optionally scan it for your full legal name, and sign attestations confirming you checked for PII. The tool also auto-redacts secrets using regexes, entropy analysis, and username hashing. It is designed so you can paste a six-step prompt into Claude Code and have the agent export its own sessions.
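
The redaction layers and the legal-name scan described above can be sketched in a few dozen lines of Python. This is not DataClaw's code. The regex patterns, the 24-character token length, the 4.0-bit entropy threshold, and the "user_" hash prefix are all illustrative assumptions.

```python
import hashlib
import math
import re
from collections import Counter

# Illustrative patterns only; a real tool needs a far longer, maintained list.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),       # "sk-" style API keys (assumed shape)
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key IDs
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),     # email addresses
]
# Long runs of key-like characters; "/" is excluded so file paths are not swallowed.
TOKEN_RE = re.compile(r"[A-Za-z0-9_+=-]{24,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of s; random keys score high, repetitive text low."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def hash_username(name: str) -> str:
    """Stable pseudonym so paths stay consistent across sessions."""
    return "user_" + hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]


def redact(text: str, username: str, threshold: float = 4.0) -> str:
    # Layer 1: known secret formats and email addresses.
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Layer 2: high-entropy tokens the regexes missed.
    text = TOKEN_RE.sub(
        lambda m: "[REDACTED]" if shannon_entropy(m.group()) >= threshold else m.group(),
        text,
    )
    # Layer 3: naive substring replacement of the OS username (e.g. in /Users/<name>/).
    if username:
        text = text.replace(username, hash_username(username))
    return text


def mentions_name(text: str, full_name: str) -> bool:
    """Optional legal-name scan: True means the text needs manual review."""
    return re.search(re.escape(full_name), text, re.IGNORECASE) is not None
```

Entropy is the catch-all layer. A random 27-character key scores about 4.75 bits per character, while a run of repeated characters scores 0. Any fixed threshold will still miss some secrets and flag some harmless strings, which is why the manual review step matters.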

Key highlights

  • Parses session logs including voice transcripts, images, tool calls, and token usage
  • Multi-layer redaction: API keys, emails, usernames, high-entropy strings, plus custom rules
  • dataclaw jsonl-to-yaml and diff-jsonl commands for human review before publishing
  • Uploads are tagged dataclaw on Hugging Face, forming a distributed open dataset
  • Mac app bundles everything; CLI works everywhere Python does
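
DataClaw's own jsonl-to-yaml and diff-jsonl commands are not reproduced here. The stdlib-only Python sketch below shows the same review idea: it pretty-prints each record and produces a unified diff between a raw export and its redacted version. The names pretty and diff_jsonl are hypothetical. The sketch uses indented JSON rather than YAML so that it does not depend on PyYAML.

```python
import difflib
import json


def pretty(record: dict) -> list[str]:
    """Render one JSONL record as indented, key-sorted lines for eyeballing."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False).splitlines()


def diff_jsonl(before: str, after: str) -> list[str]:
    """Record-by-record unified diff between two JSONL documents."""
    old = [json.loads(line) for line in before.splitlines() if line.strip()]
    new = [json.loads(line) for line in after.splitlines() if line.strip()]
    out: list[str] = []
    if len(old) != len(new):
        out.append(f"record count changed: {len(old)} -> {len(new)}")
    # zip stops at the shorter list; the count message above flags the mismatch.
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            out.extend(difflib.unified_diff(
                pretty(a), pretty(b),
                fromfile=f"before[{i}]", tofile=f"after[{i}]", lineterm="",
            ))
    return out


if __name__ == "__main__":
    raw = '{"text": "my key is sk-abc123"}\n{"text": "hello"}\n'
    redacted = '{"text": "my key is [REDACTED]"}\n{"text": "hello"}\n'
    print("\n".join(diff_jsonl(raw, redacted)))
```

Only records that changed are shown, so a reviewer can confirm what redaction removed without rereading every unchanged session.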

Caveats

  • Mac DMG is currently unsigned, so Gatekeeper will complain on first launch
  • The app is Apple Silicon only; Intel Macs must use the CLI
  • The author explicitly warns automated redaction is “NOT foolproof”

Verdict

Worth a look if you believe coding-agent transcripts should be public training data and you are willing to do the privacy homework. Skip it if you just want a backup of your chats: this tool wants you to publish.
