NLP cookie-cutter factory: clone, tweak, ship
A repo of ready-made spaCy project templates that turn "how do I even start?" into a four-command workflow.
What it does
This repository houses pre-built project templates for training, packaging, and serving spaCy NLP pipelines. You clone a template, fetch assets, run commands defined in a project.yml, and end up with a Python package you can ship. The templates cover pipelines, tutorials, third-party integrations, benchmarks, and experimental workflows.
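The clone, assets, run sequence described above can be sketched as a small Python helper that builds the Weasel invocations without running them. This is a minimal sketch under stated assumptions: the template name pipelines/tagger_parser_ud, the destination my_project, and a workflow called all are illustrative choices, and whether a given template defines an all workflow depends on its project.yml.

```python
from __future__ import annotations

import shlex


def workflow_commands(template: str, dest: str, workflow: str = "all") -> list[list[str]]:
    """Return the Weasel invocations for clone -> assets -> run, in order."""
    weasel = ["python", "-m", "weasel"]
    return [
        weasel + ["clone", template, dest],  # copy the template into dest
        weasel + ["assets", dest],           # fetch the assets listed in project.yml
        weasel + ["run", workflow, dest],    # run a command or workflow by name
    ]


if __name__ == "__main__":
    # Assumed example values; swap in the template and folder you actually use.
    # The commands are only printed, so nothing is cloned or downloaded.
    for cmd in workflow_commands("pipelines/tagger_parser_ud", "my_project"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; passing each list to subprocess.run would be the obvious next step once the names match a real template.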
The interesting bit
The real value isn’t the templates themselves; it’s the standardization. By forcing every project through the same Weasel CLI and project.yml structure, Explosion turned reproducible NLP from a bespoke craft into something closer to cargo new or create-react-app. The maintenance scripts auto-update docs and configs across the entire repo, which suggests the maintainers actually dogfood this at scale.
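To make the standardization concrete, here is a rough sketch of how a project.yml ties named commands to workflows. The PROJECT dict is a hypothetical, hand-written stand-in for a parsed file (the script paths and the project title are invented for illustration), and resolve_workflow is not Weasel's implementation, only an approximation of the idea that a workflow is an ordered list of command names.

```python
from __future__ import annotations

# Hypothetical stand-in for a parsed project.yml; real templates also carry
# sections such as assets, vars, and directories, omitted here for brevity.
PROJECT = {
    "title": "Demo tagger",
    "workflows": {"all": ["preprocess", "train"]},
    "commands": [
        {"name": "preprocess", "script": ["python scripts/preprocess.py"]},
        {
            "name": "train",
            "script": ["python -m spacy train configs/config.cfg --output training/"],
        },
    ],
}


def resolve_workflow(project: dict, workflow: str) -> list[str]:
    """Expand a workflow name into the script lines of its commands, in order."""
    by_name = {cmd["name"]: cmd for cmd in project["commands"]}
    steps: list[str] = []
    for name in project["workflows"][workflow]:
        if name not in by_name:
            raise KeyError(f"workflow {workflow!r} references unknown command {name!r}")
        steps.extend(by_name[name]["script"])
    return steps


if __name__ == "__main__":
    for line in resolve_workflow(PROJECT, "all"):
        print(line)
```

Because every template shares this shape, the same handful of CLI calls works across all of them, which is the point the paragraph above is making.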
Key highlights
- Five template categories: pipelines, tutorials, integrations, benchmarks, and experimental
- CLI-driven workflow: clone → assets → run → adjust and share
- Requires Weasel (included in spaCy v3.7+, or pip install weasel)
- Auto-generated docs and config updates via included maintenance scripts
- Remote storage upload built into the workflow for team sharing
Caveats
- Requires a fresh virtual environment; version conflicts seem to be a known hazard
- The experimental category is explicitly “use at your own risk”
- Previous version lives on a master branch; current work is on main
Verdict
Worth bookmarking if you’re building spaCy pipelines more than once. Skip it if you’re looking for a generic ML workflow tool: this is spaCy-specific glue, not a framework.