ING open-sourced its dataframe babysitter
A Python library that watches your pandas or Spark data for distribution drift, then flags it in a report when things go sideways.

What it does
popmon bins your dataframe features into time-sliced histograms, then runs statistical tests to flag drift, shifts, outliers, and even changing correlations. It spits out a self-contained HTML report — no external dashboard required — or you can pipe the histogram data into Grafana/Kibana if you’re already married to one.
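To make the "histogram per time slice, then compare" idea concrete, here is a toy sketch using only the standard library. It is not popmon's implementation: the function names, the weekly slicing, the choice of the first week as reference, and the symmetric chi-square distance are all assumptions made for illustration, and popmon's own tests and reference options are richer.

```python
from collections import Counter, defaultdict
from datetime import date


def time_sliced_histograms(rows, bin_width):
    """Group (day, value) pairs into one histogram per ISO week."""
    hists = defaultdict(Counter)
    for day, value in rows:
        year, week, _ = day.isocalendar()
        hists[(year, week)][int(value // bin_width)] += 1
    return dict(hists)


def chi2_distance(hist, reference):
    """Symmetric chi-square distance between two normalized histograms.

    0 means identical shapes, 2 means no overlapping bins at all.
    """
    n, m = sum(hist.values()), sum(reference.values())
    total = 0.0
    for b in set(hist) | set(reference):
        p, q = hist.get(b, 0) / n, reference.get(b, 0) / m
        total += (p - q) ** 2 / (p + q)
    return total


def flag_drift(hists, threshold):
    """Return the time slices whose histogram differs from the first slice."""
    slices = sorted(hists)
    reference = hists[slices[0]]
    return [s for s in slices[1:] if chi2_distance(hists[s], reference) > threshold]


# Made-up data: two stable weeks, then a week where the values jump.
rows = [
    (date(2024, 1, 1), 10), (date(2024, 1, 2), 11), (date(2024, 1, 3), 12),
    (date(2024, 1, 4), 20), (date(2024, 1, 5), 21),
    (date(2024, 1, 8), 10), (date(2024, 1, 9), 12), (date(2024, 1, 10), 13),
    (date(2024, 1, 11), 21), (date(2024, 1, 12), 22),
    (date(2024, 1, 15), 40), (date(2024, 1, 16), 41), (date(2024, 1, 17), 42),
    (date(2024, 1, 18), 50), (date(2024, 1, 19), 51),
]

hists = time_sliced_histograms(rows, bin_width=10)
drifted = flag_drift(hists, threshold=0.5)
print(drifted)  # [(2024, 3)]
```

The second week has the same shape as the first, so only the third week is flagged. The 0.5 threshold is arbitrary here; in practice it would be tuned per feature.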
The interesting bit
Importing popmon adds a .pm_stability_report() method to pandas and Spark dataframes, so monitoring is a single method call away. It also handles higher-dimensional histograms, meaning it can track how two features co-vary over time, not just individual columns.
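A minimal sketch of what that call can look like, assuming popmon and pandas are installed. The wrapper function `build_stability_report` and its default values are hypothetical, made up for this example. The `time_axis`, `time_width` and `time_offset` keyword arguments and `to_file()` follow popmon's README, but check them against the version you have installed.

```python
def build_stability_report(df, time_axis, time_width="1w", time_offset=0,
                           out_path="report.html"):
    """Hypothetical helper: run popmon on a dataframe and save the HTML report."""
    # Importing popmon is what attaches pm_stability_report to dataframes,
    # so the import is needed even though the name is not used directly.
    import popmon  # noqa: F401

    report = df.pm_stability_report(
        time_axis=time_axis,      # name of the datetime column to slice on
        time_width=time_width,    # width of each time slice, e.g. one week
        time_offset=time_offset,  # where the first time slice starts
    )
    report.to_file(out_path)      # self-contained HTML file
    return report
```

With a dataframe `df` that has a datetime column named, say, `DATE`, usage would be `build_stability_report(df, "DATE")`.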
Key highlights
- Works with both pandas and Spark (Spark needs the histogrammar jars built for Scala 2.12 or 2.13)
- Auto-flags trends, peaks, and anomalies via built-in business rules
- Modular pipeline for custom workflows, with debug visualizations
- Optional diptest integration for unimodality testing
- HTML reports work offline; Grafana/Kibana integrations available
Caveats
- Spark setup requires manual JAR dependency management (version-specific Scala builds)
- Custom bin specs for the time axis take their values in nanoseconds, which is documented but easy to trip over
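The nanosecond caveat is easier to see with numbers. The sketch below uses only the standard library to turn a four-week bin width and a start date into nanosecond integers. The column name `date`, the helper names, and the `bin_width`/`bin_offset` keys are assumptions for illustration (the keys follow the pattern in popmon's README), so verify them against your installed version.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timedelta_ns(delta):
    """Convert a timedelta to whole nanoseconds using integer arithmetic."""
    return (delta // timedelta(microseconds=1)) * 1_000


def timestamp_ns(moment):
    """Convert a timezone-aware datetime to nanoseconds since the Unix epoch."""
    return timedelta_ns(moment - EPOCH)


# Hypothetical spec for a datetime column called "date":
# four-week bins, with the first bin starting on 2015-01-01 (UTC).
bin_specs = {
    "date": {
        "bin_width": timedelta_ns(timedelta(weeks=4)),
        "bin_offset": timestamp_ns(datetime(2015, 1, 1, tzinfo=timezone.utc)),
    }
}

print(bin_specs["date"]["bin_width"])   # 2419200000000000
print(bin_specs["date"]["bin_offset"])  # 1420070400000000000
```

If pandas is already in play, `pd.Timedelta("4w").value` and `pd.Timestamp("2015-01-01").value` should give the same integers. Passing seconds or milliseconds by mistake produces bins that are off by several orders of magnitude.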
Verdict
Worth a look for data scientists and ML engineers running production pipelines who need drift detection without building a monitoring stack from scratch. Probably overkill if you just want a one-off distribution comparison.