xorq-labs/xorq

Git-native memory for agents that keep rewriting your pandas scripts

Xorq turns ephemeral agent work into durable, executable pipelines that survive across sessions, machines, and agents.

513 stars · Python · Coding Assistants · Data Tooling
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Xorq is a system for packaging tabular data pipelines into reproducible, content-addressed artifacts stored in git. It wraps Ibis expressions with caching, multi-engine execution, and environment pinning via uv, then catalogs them so agents or humans can discover and rerun them without reconstructing context from chat history.
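To make "content-addressed" concrete, here is a minimal sketch of the general idea: an artifact's key is derived from a hash of its definition, so identical pipelines get identical keys. This is not Xorq's actual hashing scheme; the `content_address` helper, the spec fields, and the 12-character key length are all assumptions made for illustration.

```python
import hashlib
import json


def content_address(pipeline_spec: dict) -> str:
    """Derive a stable key from a pipeline description (hypothetical helper)."""
    # Canonical JSON: sorted keys and fixed separators, so logically
    # identical specs always serialize to the same bytes.
    canonical = json.dumps(pipeline_spec, sort_keys=True, separators=(",", ":"))
    # Truncating the digest is an illustrative choice, not Xorq's format.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Same content, different key order: should map to the same address.
spec_a = {"source": "orders.parquet", "filter": "amount > 100", "engine": "duckdb"}
spec_b = {"engine": "duckdb", "filter": "amount > 100", "source": "orders.parquet"}
# Changed content: should map to a different address.
spec_c = {**spec_a, "filter": "amount > 200"}

key_a, key_b, key_c = (content_address(s) for s in (spec_a, spec_b, spec_c))
print(key_a == key_b, key_a == key_c)
```

The practical consequence is that a rerun of an unchanged pipeline can be recognized by its key alone, without comparing outputs or consulting chat history.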

The interesting bit

The catalog is literally a git repo: aliases are symlinks, entries are zipped builds, and metadata lives in grep-able YAML sidecars. No API to learn, no service to run: an agent that clones the repo can list, filter, and execute pipelines with standard file operations. As the authors put it, “Unix pipes text streams between small programs; Xorq pipes Arrow streams between expressions.”
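A rough sketch of what "standard file operations" could look like against such a catalog, using only the Python standard library. The directory layout (`entries/`, `aliases/`), the file names, and the sidecar contents are hypothetical stand-ins built in a temporary directory; the real Xorq catalog layout may differ.

```python
import os
import tempfile
from pathlib import Path


def build_demo_catalog(root: Path) -> None:
    """Create a tiny stand-in catalog; this layout is hypothetical."""
    entries = root / "entries"
    alias_dir = root / "aliases"
    entries.mkdir(parents=True)
    alias_dir.mkdir()
    for name, tag in [("a1b2c3", "sales"), ("d4e5f6", "churn")]:
        (entries / f"{name}.zip").write_bytes(b"")  # placeholder for a zipped build
        (entries / f"{name}.yaml").write_text(f"tags: [{tag}]\n")  # YAML sidecar
    # An alias is just a symlink pointing at an entry.
    os.symlink(Path("..") / "entries" / "a1b2c3.zip", alias_dir / "daily-sales")


def list_aliases(root: Path) -> dict[str, str]:
    """Map alias name to entry file name by reading symlinks."""
    return {
        p.name: Path(os.readlink(p)).name
        for p in sorted((root / "aliases").iterdir())
        if p.is_symlink()
    }


def grep_sidecars(root: Path, needle: str) -> list[str]:
    """Return entry ids whose YAML sidecar contains `needle` (plain text search)."""
    return [
        p.stem
        for p in sorted((root / "entries").glob("*.yaml"))
        if needle in p.read_text()
    ]


with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "catalog"
    build_demo_catalog(catalog)
    aliases = list_aliases(catalog)
    matches = grep_sidecars(catalog, "churn")

print(aliases, matches)
```

Nothing here needs a client library or a running service, which is the point of the design: discovery reduces to reading symlinks and searching text files in a cloned repository.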

Key highlights

  • Declarative Ibis expressions compile to DataFusion, DuckDB, SQLite, pandas, Snowflake, Databricks, Trino, Postgres, or PyIceberg
  • Each entry ships with an expr.yaml manifest, a pinned requirements.txt, and a built wheel for reproducible execution
  • CLI for agents (xorq init, xorq catalog add, xorq run) plus TUI for humans to browse schema, lineage, and git history
  • Claude Code plugin with four slash commands for building catalogs agent-side
  • Sklearn pipelines can be translated to deferred expressions via Pipeline.from_instance()

Caveats

  • Roughly 500 stars; young project with an evolving API (version 0.3.24 in examples)
  • Benchmark claim (Haiku 50% → 84% on DABStep) is specific to semantic catalog context injection, not general pipeline execution
  • README is thorough, but its links out to the documentation suggest some features are only fully covered there

Verdict

Worth a look if you’re wrangling agent-generated data scripts and tired of losing work between sessions. Probably overkill if your pipelines already live in dbt or you don’t need cross-agent portability.
