Open-source citation graph for humans and AI to share
A document repository that stores the relationships between files, not just the files themselves.

What it does
cite (formerly OpenContracts) turns a pile of documents into a navigable citation graph. Humans annotate documents with precise spans and custom labels; AI agents traverse those annotations via a Model Context Protocol endpoint. Same underlying graph, two interfaces: GraphQL/REST for people, MCP for machines.
The interesting bit
The project inverts the usual AI document-tool pattern: instead of agents hallucinating citations from raw text, they walk edges that humans have already drawn. Agents can propose new annotations, but humans review and accept them. The graph compounds over time: anyone can fork a public corpus, build on someone else’s work, and contribute back. The README even addresses LLM-based agents directly, pointing them to /mcp/, /llms.txt, and /.well-known/mcp.json.
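To make the propose-then-review idea concrete, here is a minimal sketch in Python. It is not cite's actual data model or API (the README does not spell those out); the names `CitationGraph`, `Edge`, `propose`, `accept`, and `walk` are all hypothetical. It only illustrates the rule described above: agent proposals sit in the graph as pending edges, and traversal follows only edges a human has drawn or accepted.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PROPOSED = "proposed"  # suggested by an agent, awaiting human review
    ACCEPTED = "accepted"  # drawn or approved by a human


@dataclass
class Edge:
    source: str
    target: str
    label: str
    status: Status


@dataclass
class CitationGraph:
    """Hypothetical in-memory graph; not the project's real schema."""
    edges: List[Edge] = field(default_factory=list)

    def annotate(self, source: str, target: str, label: str) -> Edge:
        """Human-drawn edge: accepted immediately."""
        edge = Edge(source, target, label, Status.ACCEPTED)
        self.edges.append(edge)
        return edge

    def propose(self, source: str, target: str, label: str) -> Edge:
        """Agent-suggested edge: stored, but not walkable until accepted."""
        edge = Edge(source, target, label, Status.PROPOSED)
        self.edges.append(edge)
        return edge

    def accept(self, edge: Edge) -> None:
        """Human review step that promotes a proposal to an accepted edge."""
        edge.status = Status.ACCEPTED

    def walk(self, start: str) -> List[str]:
        """Breadth-first list of documents reachable via accepted edges only."""
        seen = [start]
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for e in self.edges:
                if (e.source == node and e.status is Status.ACCEPTED
                        and e.target not in seen):
                    seen.append(e.target)
                    queue.append(e.target)
        return seen[1:]  # exclude the starting document itself
```

In this sketch, an agent calling `walk` before review never sees its own unreviewed proposals, which is the property that keeps citations grounded in human-approved edges.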
Key highlights
- Version-controlled corpuses with full history and fork support — “git for the citation graph”
- PDF annotation with precise text-to-coordinate mapping via PAWLS, including multi-page spans
- Threaded discussions, @mentions, and voting at corpus, document, and global levels
- Vector + full-text search across documents and annotations
- Docker Compose setup for local development; production deployment documented
- MIT licensed, with a JSON-driven content pack system so deployers can retarget messaging without forking
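The text-to-coordinate mapping in the highlights can be pictured with a small sketch. It assumes a simplified token layer, loosely modeled on the per-page token boxes that PAWLS-style tools produce; the field names, the `PAGES` sample data, and the helpers `span_to_boxes` and `span_text` are illustrative assumptions, and cite's real storage format may differ. An annotation is treated as a list of (page index, token index) references, and each page touched by the span gets one bounding box.

```python
from collections import defaultdict

# Hypothetical sample data: one list of token boxes per page.
PAGES = [
    [  # page 0
        {"text": "See", "x": 10, "y": 700, "width": 20, "height": 10},
        {"text": "Smith", "x": 35, "y": 700, "width": 30, "height": 10},
    ],
    [  # page 1
        {"text": "v.", "x": 10, "y": 20, "width": 10, "height": 10},
        {"text": "Jones", "x": 25, "y": 20, "width": 30, "height": 10},
    ],
]


def span_to_boxes(pages, token_refs):
    """Return one bounding box per page for a span of (page, token) refs."""
    per_page = defaultdict(list)
    for page_idx, tok_idx in token_refs:
        per_page[page_idx].append(pages[page_idx][tok_idx])

    boxes = {}
    for page_idx, tokens in sorted(per_page.items()):
        # Union of the token rectangles on this page.
        boxes[page_idx] = {
            "left": min(t["x"] for t in tokens),
            "top": min(t["y"] for t in tokens),
            "right": max(t["x"] + t["width"] for t in tokens),
            "bottom": max(t["y"] + t["height"] for t in tokens),
        }
    return boxes


def span_text(pages, token_refs):
    """Recover the annotated text by joining token strings in order."""
    return " ".join(pages[p][t]["text"] for p, t in token_refs)
```

A span such as `[(0, 1), (1, 0), (1, 1)]` crosses the page break, so it yields two boxes, one per page, while `span_text` still returns the text as a single string. Storing token references rather than raw character offsets is what lets the same annotation be both highlighted on the page and quoted as text.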
Caveats
- The repository still carries the OpenContracts name through v3 to avoid breaking existing forks and CI; the rebrand to cite is cosmetic, not a rewrite
- The README is long on vision and short on architectural specifics: unclear how the MCP endpoint handles auth, or how annotation schemas are defined in practice
- No benchmark numbers or performance claims are made
Verdict
Worth a look if you’re building research infrastructure, legal tech, or any system where documents reference each other and you need both human curation and agent consumption. Skip it if you just need a quick RAG pipeline over unstructured PDFs — the annotation overhead is the point, not a bug, but it’s overhead nonetheless.