SE-ML/awesome-seml

A reading list for the 90% of ML work that isn't training models

Curated papers and guides on the messy infrastructure around machine learning: data pipelines, testing, deployment, and team coordination.

Velocity (7d): +0.6 ★/day · Trend: steady

What it does

This is an awesome-list that collects articles, papers, and tooling guides on software engineering practices for machine learning systems. It deliberately excludes core algorithm research and focuses on everything else: data ingestion, versioning, testing, deployment, governance, and how teams actually collaborate on ML projects.

The interesting bit

The list is organized by the pain points that emerge after a model leaves a Jupyter notebook. It also flags must-reads and peer-reviewed publications, so you can triage by depth rather than drowning in blog posts.

Key highlights

  • Covers six practical areas: overviews, data management, model training workflows, deployment/operations, social/team dynamics, and governance
  • Includes classic papers like Google’s “Hidden Technical Debt in Machine Learning Systems” and Microsoft’s SE4ML case study
  • Curated tooling section with open-source or free-for-research options: DVC, MLflow, Kubeflow, Great Expectations, and others
  • Maintainers also run a companion survey on adoption of these practices

Caveats

  • Some links are to commercial whitepapers or blog posts, so bias varies by source
  • Tooling descriptions are brief one-liners; you’ll need to dig deeper for comparisons

Verdict

Worth bookmarking if you’re moving from experiments to production ML, or if you’re a software engineer suddenly asked to “just deploy the model.” Less useful if you’re looking for hands-on tutorials or code samples.
