Turning captions into structured graphs, with caveats
A pure-Python scene graph parser that extracts entities and relations from plain English sentences using hand-written rules over spaCy dependency trees.

What it does
SceneGraphParser (sng_parser) takes a sentence like “A woman is playing the piano in the room” and breaks it into a symbolic graph: nodes are noun phrases (with modifiers like articles and adjectives), edges are relations between them. The output is plain Python dicts and lists — no custom objects to pickle or unwrap.
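To make that output shape concrete, here is a hand-written graph for the example sentence. The key names (`entities`, `relations`, `span`, `head`, `modifiers`, `subject`, `object`) are assumptions based on the project's README example and may differ between versions. `to_triples` is a hypothetical helper for illustration, not part of sng_parser.

```python
# Hand-written stand-in for a parsed graph. Key names are assumed from the
# README example and are not guaranteed by the library.
graph = {
    "entities": [
        {"span": "A woman", "head": "woman",
         "modifiers": [{"dep": "det", "span": "A"}]},
        {"span": "the piano", "head": "piano",
         "modifiers": [{"dep": "det", "span": "the"}]},
        {"span": "the room", "head": "room",
         "modifiers": [{"dep": "det", "span": "the"}]},
    ],
    # Relations point at entities by their index in the list above.
    "relations": [
        {"subject": 0, "relation": "playing", "object": 1},
        {"subject": 0, "relation": "in", "object": 2},
    ],
}


def to_triples(graph):
    """Flatten a graph into (subject head, relation, object head) tuples."""
    entities = graph["entities"]
    return [
        (entities[r["subject"]]["head"], r["relation"], entities[r["object"]]["head"])
        for r in graph["relations"]
    ]


print(to_triples(graph))
# [('woman', 'playing', 'piano'), ('woman', 'in', 'room')]
```

Because everything is a dict or a list, the graph can be serialized or filtered with ordinary Python tools.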
The interesting bit
Unlike the original Stanford Scene Graph Parser, this one is pure Python and built on spaCy. spaCy supplies the dependency tree; the step that turns the tree into a graph runs on hand-crafted rules, not a learned model. That makes the extraction transparent and tweakable, but also brittle: the authors explicitly ask for help collecting failure cases.
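As a toy illustration of what a rule over a dependency tree looks like, the sketch below applies a single subject-verb-object rule to a hand-annotated parse. The `Token` class and the rule are hypothetical and far simpler than anything in the library. The labels `nsubj`, `dobj`, and `nsubjpass` follow the scheme spaCy's English models commonly use.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Minimal stand-in for a dependency-parsed token (not spaCy's Token)."""
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of the syntactic head within the sentence


# Hand-annotated parse of "woman plays piano"; a real parser would produce this.
tokens = [
    Token("woman", "nsubj", 1),
    Token("plays", "ROOT", 1),
    Token("piano", "dobj", 1),
]


def subject_verb_object(tokens):
    """One rule: a word with both an nsubj and a dobj child yields a relation."""
    relations = []
    for i, verb in enumerate(tokens):
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep == "dobj"]
        relations += [(s, verb.text, o) for s in subjects for o in objects]
    return relations


print(subject_verb_object(tokens))
# [('woman', 'plays', 'piano')]
```

The brittleness is easy to reproduce: a passive sentence whose subject is labeled `nsubjpass` matches nothing under this rule, so it needs a rule of its own.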
Key highlights
- Single function call: sng_parser.parse('your sentence here')
- Ships with a tabular printer (tprint) for quick inspection
- Backend-swappable design, though only spaCy is implemented
- Output is vanilla Python structures for easy downstream use
- Developed for a CVPR 2019 oral paper on visual-semantic embeddings
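A minimal usage sketch of the two calls named above, assuming `sng_parser` and spaCy's English model are installed and that `tprint` accepts the graph returned by `parse`. `show_graph` is a hypothetical wrapper, and since the APIs are unstable the calls may change.

```python
try:
    import sng_parser  # depends on spaCy and its English model
except ImportError:
    sng_parser = None  # lets this file load even without the package


def show_graph(sentence):
    """Parse a caption, print it as a table, and return the raw graph."""
    if sng_parser is None:
        raise RuntimeError("sng_parser is not installed")
    graph = sng_parser.parse(sentence)  # plain dicts and lists
    sng_parser.tprint(graph)            # tabular view for quick inspection
    return graph
```

With the package installed, `show_graph('A woman is playing the piano in the room.')` should print the entities and relations and hand back the same structure for further processing.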
Caveats
- All APIs are explicitly unstable — the README warns they “are subject to ANY change”
- English-only for now; requires python -m spacy download en
- Rule-based approach means it will stumble on anything the authors didn’t anticipate
Verdict
Useful if you need quick, interpretable scene graphs from captions and can tolerate some manual cleanup. Skip it if you need production-grade robustness or multilingual support — the authors are upfront that this is research code still under active development.