ing-bank/popmon

ING open-sourced its dataframe babysitter

A Python library that watches your pandas or Spark data for distribution drift, then flags it in a report when things go sideways.

512 stars Python LLMOps · EvalData Tooling

What it does

popmon bins your dataframe features into time-sliced histograms, then runs statistical tests to flag drift, shifts, outliers, and even changing correlations. It spits out a self-contained HTML report — no external dashboard required — or you can pipe the histogram data into Grafana/Kibana if you’re already married to one.
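To make the idea concrete, here is a minimal standard-library sketch of time-sliced histograms plus a drift check. It is not popmon's code: the function names are made up for illustration, and total variation distance against the first time slice stands in for the statistical tests the library actually runs.

```python
from collections import Counter, defaultdict


def time_sliced_histograms(rows, slice_width, bin_width):
    """Group (timestamp, value) pairs into one histogram per time slice."""
    slices = defaultdict(Counter)
    for timestamp, value in rows:
        slice_index = int(timestamp // slice_width)
        bin_index = int(value // bin_width)
        slices[slice_index][bin_index] += 1
    # Sort by slice index so the earliest slice comes first.
    return dict(sorted(slices.items()))


def total_variation(reference, current):
    """Half the summed absolute difference between two normalised histograms."""
    n_ref, n_cur = sum(reference.values()), sum(current.values())
    bins = set(reference) | set(current)
    # Counter returns 0 for bins that are missing on one side.
    return 0.5 * sum(abs(reference[b] / n_ref - current[b] / n_cur) for b in bins)


def flag_drift(histograms, threshold=0.3):
    """Compare every later slice to the first one (an assumed reference choice)."""
    slice_ids = list(histograms)
    reference = histograms[slice_ids[0]]
    return {
        s: total_variation(reference, histograms[s]) > threshold
        for s in slice_ids[1:]
    }


# Toy data: (timestamp, value). The third time slice shifts towards larger values.
rows = [
    (0, 1), (1, 2), (2, 11), (3, 12),
    (10, 1), (11, 3), (12, 12), (13, 14),
    (20, 21), (21, 22), (22, 25), (23, 13),
]
histograms = time_sliced_histograms(rows, slice_width=10, bin_width=10)
flags = flag_drift(histograms)
```

With this toy data, the second slice matches the reference and the third is flagged. The threshold of 0.3 is arbitrary and only there to keep the example short.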

The interesting bit

The library extends pandas and Spark dataframes with a .pm_stability_report() method, so monitoring is one chained call away. It also handles higher-dimensional histograms, meaning it can track how two features co-vary over time, not just individual columns.
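A hedged sketch of what that chained call can look like. The column names (`date`, `age`, `gender`) and the helper `feature_specs` are hypothetical. The colon-separated feature strings and the `time_axis`, `time_width`, `time_offset` and `features` arguments follow the project README as I recall it, so check them against the current docs before relying on them.

```python
from typing import Iterable, List, Sequence, Tuple


def feature_specs(
    time_axis: str,
    columns: Iterable[str],
    pairs: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Build colon-separated feature names, e.g. 'date:age' or 'date:age:gender'."""
    specs = [f"{time_axis}:{column}" for column in columns]
    # A pair becomes a higher-dimensional histogram: two features tracked together.
    specs += [f"{time_axis}:{left}:{right}" for left, right in pairs]
    return specs


def make_report(df, time_axis: str = "date"):
    """Run a stability report on a pandas dataframe (requires popmon installed)."""
    import popmon  # noqa: F401  (the import is what adds pm_stability_report)

    features = feature_specs(time_axis, ["age", "gender"], pairs=[("age", "gender")])
    return df.pm_stability_report(
        time_axis=time_axis,
        time_width="1w",        # one histogram slice per week
        time_offset="2020-1-6", # where the weekly slices start
        features=features,
    )
```

The import sits inside `make_report` so the helper stays usable without popmon installed. The returned report object can then be written to a standalone HTML file; the README does this with `report.to_file("report.html")`.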

Key highlights

  • Works with both pandas and Spark (Scala 2.12/2.13 jars for histogrammar)
  • Auto-flags trends, peaks, and anomalies via built-in business rules
  • Modular pipeline for custom workflows, with debug visualizations
  • Optional diptest integration for unimodality testing
  • HTML reports work offline; Grafana/Kibana integrations available
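The "business rules" in the list above boil down to traffic lights: a statistic is green, yellow, or red depending on which band it falls in. Below is a minimal sketch of that logic, not popmon's implementation. The four-number layout (red high, yellow high, yellow low, red low) and the example bounds are assumptions based on how the project docs describe monitoring rules.

```python
from typing import Sequence


def traffic_light(value: float, bounds: Sequence[float]) -> str:
    """Classify a statistic given bounds ordered (red_high, yellow_high, yellow_low, red_low)."""
    red_high, yellow_high, yellow_low, red_low = bounds
    if value > red_high or value < red_low:
        return "red"
    if value > yellow_high or value < yellow_low:
        return "yellow"
    return "green"


# Assumed example bounds for a normalised deviation such as a z-score.
ZSCORE_BOUNDS = (7, 4, -4, -7)

lights = {z: traffic_light(z, ZSCORE_BOUNDS) for z in (0.0, 5.0, 8.0, -7.5)}
```

Here a deviation of 5 lands in the yellow band, while 8 and -7.5 are red.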

Caveats

  • Spark setup requires manual JAR dependency management (version-specific Scala builds)
  • Time-axis binning in custom specs needs nanosecond values, which is documented but easy to trip over
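For the nanosecond caveat, a small standard-library sketch of the conversion. The `bin_specs` layout (a `bin_width` and `bin_offset` per feature) and the `date` column are assumptions taken from my reading of the README, so treat the dictionary shape as illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def to_nanoseconds(delta: timedelta) -> int:
    """Convert a timedelta to integer nanoseconds (timedelta resolution is 1 microsecond)."""
    return (delta // ONE_MICROSECOND) * 1000


def timestamp_ns(moment: datetime) -> int:
    """Nanoseconds since 1970-01-01 UTC for a timezone-aware datetime."""
    return to_nanoseconds(moment - EPOCH)


# One-week slices, starting on Monday 2020-01-06 (UTC).
bin_width_ns = to_nanoseconds(timedelta(weeks=1))
bin_offset_ns = timestamp_ns(datetime(2020, 1, 6, tzinfo=timezone.utc))

# Assumed shape of a custom spec for a hypothetical "date" time axis.
bin_specs = {"date": {"bin_width": bin_width_ns, "bin_offset": bin_offset_ns}}
```

If pandas is already loaded, `pd.Timedelta("1w").value` and `pd.Timestamp("2020-1-6").value` give the same integers with less ceremony.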

Verdict

Best for data scientists and ML engineers running production pipelines who need drift detection without building a monitoring stack from scratch. Probably overkill if you just want a one-off distribution comparison.
