explosion/projects

NLP cookie-cutter factory: clone, tweak, ship

A repo of ready-made spaCy project templates that turn "how do I even start?" into a four-command workflow.

1.4k stars · Python · ML Frameworks · Data Tooling
Velocity (7d): +0.6 ★ / day · Trend: steady

What it does This repository houses pre-built project templates for training, packaging, and serving spaCy NLP pipelines. You clone a template, fetch assets, run commands defined in a project.yml, and end up with a Python package you can ship. The templates cover pipelines, tutorials, third-party integrations, benchmarks, and experimental workflows.
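To make the project.yml idea concrete, here is a minimal sketch in Python rather than YAML. The key names (assets, workflows, commands, script, deps, outputs) follow the documented project.yml layout, but every value, path, and URL is invented for illustration, and the helper workflow_scripts is hypothetical: it is not part of spaCy or Weasel and only shows how a named workflow expands into an ordered list of shell lines.

```python
# Hypothetical stand-in for a project.yml, written as a Python dict.
# Key names mirror the project.yml layout; all values are made up.
PROJECT = {
    "title": "Example pipeline (hypothetical)",
    "assets": [
        {"dest": "assets/train.json", "url": "https://example.com/train.json"},
    ],
    "workflows": {"all": ["convert", "train", "package"]},
    "commands": [
        {
            "name": "convert",
            "script": ["python -m spacy convert assets/train.json corpus"],
            "deps": ["assets/train.json"],
            "outputs": ["corpus/train.spacy"],
        },
        {
            "name": "train",
            "script": [
                "python -m spacy train configs/config.cfg --output training"
            ],
            "deps": ["corpus/train.spacy", "configs/config.cfg"],
            "outputs": ["training/model-best"],
        },
        {
            "name": "package",
            "script": ["python -m spacy package training/model-best packages"],
            "deps": ["training/model-best"],
            "outputs": ["packages"],
        },
    ],
}


def workflow_scripts(project, workflow):
    """Return the shell lines a named workflow would run, in order."""
    by_name = {cmd["name"]: cmd for cmd in project["commands"]}
    lines = []
    for step in project["workflows"][workflow]:
        lines.extend(by_name[step]["script"])
    return lines


for line in workflow_scripts(PROJECT, "all"):
    print(line)
```

Because every template declares its steps in the same shape, tooling can treat any template the same way, which is what makes the shared structure useful.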

The interesting bit The real value isn’t the templates themselves; it’s the standardization. By routing every project through the same Weasel CLI and project.yml structure, Explosion turned reproducible NLP from a bespoke craft into something closer to cargo new or create-react-app. The maintenance scripts auto-update docs and configs across the entire repo, which suggests the maintainers dogfood this workflow at scale.

Key highlights

  • Five template categories: pipelines, tutorials, integrations, benchmarks, and experimental
  • CLI-driven workflow: clone → assets → run → adjust and share
  • Requires Weasel (included in spaCy v3.7+, or pip install weasel)
  • Auto-generated docs and config updates via included maintenance scripts
  • Remote storage upload built into the workflow for team sharing
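
The clone → assets → run sequence from the list above can be sketched as a small Python helper that builds the Weasel commands without executing them. The helper name weasel_commands is hypothetical. The template path pipelines/tagger_parser_ud and the workflow name all are assumptions used for illustration, so check the template's own project.yml for the workflows it actually defines. The sketch also assumes that clone creates a directory named after the last path component of the template.

```python
import shlex


def weasel_commands(template, workflow="all"):
    """Build the clone/assets/run sequence as (working_dir, argv) pairs."""
    # Assumption: `weasel clone` creates a folder named after the
    # last component of the template path.
    project_dir = template.rstrip("/").split("/")[-1]
    return [
        (".", ["python", "-m", "weasel", "clone", template]),
        (project_dir, ["python", "-m", "weasel", "assets"]),
        (project_dir, ["python", "-m", "weasel", "run", workflow]),
    ]


# Print the commands instead of running them, so the sketch has no side effects.
for cwd, argv in weasel_commands("pipelines/tagger_parser_ud"):
    print(f"(in {cwd}) $ {shlex.join(argv)}")
```

To execute the steps rather than print them, each pair could be passed to subprocess.run(argv, cwd=cwd, check=True) inside a fresh virtual environment with Weasel installed.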

Caveats

  • Requires a fresh virtual environment; version conflicts seem to be a known hazard
  • The experimental category is explicitly “use at your own risk”
  • Previous version lives on a master branch; current work is on main

Verdict Worth bookmarking if you’re building spaCy pipelines more than once. Skip it if you’re looking for a generic ML workflow tool—this is spaCy-specific glue, not a framework.
