uber/manifold

Uber's debugger for when your model 'works' except where it doesn't

A visual tool that clusters your model's failures to show which data slices and features are actually causing the pain.

1.7k stars · JavaScript · LLMOps · Eval
Velocity (7d): +0.6 ★/day · Trend: steady

What it does

Manifold takes your model’s predictions, ground truth, and raw features, then visually surfaces where your model fails and what distinguishes those failures from successes. It works with any model—hence “model-agnostic”—as long as you can feed it the prediction arrays. You get a demo app for one-off CSV uploads or a React component to embed in your own pipeline.

The interesting bit

The tool doesn’t just plot aggregate AUC or RMSE. It runs k-means clustering on per-instance performance scores to auto-segment your data into groups with similar error patterns, then ranks features by the KL divergence between their distributions in the high-error and low-error slices. The geo view even handles Uber’s H3 hexagons natively, which is unsurprising given the source but useful if your failures cluster spatially.
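
To make the clustering step concrete, here is a minimal sketch of the idea in plain Python. It is not Manifold's code: it assumes a single binary classifier, uses log loss as the per-instance performance score, and runs a basic Lloyd's k-means on those scalar scores. The function names (log_loss_per_instance, kmeans_1d) and the sample data are made up for illustration.

```python
import math


def log_loss_per_instance(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for each instance (lower is better)."""
    losses = []
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on scalars; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic init: spread the starting centroids across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: the last two instances are confidently wrong.
y_true = [1, 1, 0, 0, 1, 0]
p_pred = [0.9, 0.8, 0.1, 0.2, 0.2, 0.9]

losses = log_loss_per_instance(y_true, p_pred)
centroids, labels = kmeans_1d(losses, k=2)
print(labels)
```

With two clusters, the confidently wrong instances end up in their own segment, which is the kind of grouping the Performance Comparison View then lets you inspect. With several models, the same idea would apply to a vector of per-model scores per instance rather than a single scalar.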

Key highlights

  • Performance Comparison View: clusters data by error similarity across models, letting you spot which model wins on which slice
  • Feature Attribution View: histograms and heatmaps ranked by distribution divergence between segment groups
  • Geo Feature View: lat/lng and H3 hex ID support with heatmap and aggregate hexagon coloring
  • Component or demo app: embed via React or run locally with yarn start at localhost:8080
  • Recommended dataset size: 10,000–15,000 instances; larger sets can be randomly subsampled
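
The ranking behind the Feature Attribution View can be approximated in a few lines. The sketch below is an illustration under stated assumptions, not Manifold's implementation: it handles numeric features only, bins both segments on shared equal-width edges, and applies additive smoothing so empty bins do not produce infinite divergence. The helper names (histogram, kl_divergence, rank_features) and the feature names are hypothetical.

```python
import math


def histogram(values, edges):
    """Count values into bins given ascending edges; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_divergence(p_counts, q_counts, smoothing=1.0):
    """KL(P || Q) in nats between two histograms, with additive smoothing."""
    p_total = sum(p_counts) + smoothing * len(p_counts)
    q_total = sum(q_counts) + smoothing * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + smoothing) / p_total
        q = (qc + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


def rank_features(high_error_rows, low_error_rows, n_bins=5):
    """Return [(feature, divergence)] sorted from most to least divergent."""
    scores = []
    for name in high_error_rows[0]:
        a = [row[name] for row in high_error_rows]
        b = [row[name] for row in low_error_rows]
        lo, hi = min(a + b), max(a + b)
        if lo == hi:
            scores.append((name, 0.0))  # constant feature: nothing to compare
            continue
        width = (hi - lo) / n_bins
        edges = [lo + i * width for i in range(n_bins)] + [hi]
        scores.append((name, kl_divergence(histogram(a, edges), histogram(b, edges))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical segments: long trips dominate the high-error group.
high_error = [{"trip_km": 25, "hour": 8}, {"trip_km": 30, "hour": 14}, {"trip_km": 28, "hour": 20}]
low_error = [{"trip_km": 3, "hour": 9}, {"trip_km": 5, "hour": 13}, {"trip_km": 4, "hour": 21}]

ranked = rank_features(high_error, low_error)
print(ranked)
```

In this toy data, trip_km separates the two segments cleanly while hour does not, so trip_km ranks first. KL divergence is asymmetric, so swapping the two segments can change the scores; a symmetric measure such as Jensen-Shannon divergence would be a reasonable alternative.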

Caveats

  • The README contains typos (“supoorted” in the geo features section, “intrisic” in the opacity description): small, but a hint that maintenance attention may be uneven
  • Project is “stable and being incubated for long-term support”—not abandoned, but not actively evolving either
  • Data format is rigid: x, yPred, and yTrue must align perfectly by instance order
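
Because alignment is by position, it is worth checking lengths before loading anything, and subsampling by index so all three arrays stay in step. The sketch below is a hypothetical pre-processing helper, not part of Manifold: it assumes predictions are kept as one list per model in a dict (y_pred_by_model), which is an assumption about layout made for illustration, and it uses the lower end of the recommended size as the default cap.

```python
import random


def check_alignment(x, y_true, y_pred_by_model):
    """Raise ValueError if any array disagrees with x on instance count; return the count."""
    n = len(x)
    if len(y_true) != n:
        raise ValueError(f"yTrue has {len(y_true)} rows, x has {n}")
    for model, preds in y_pred_by_model.items():
        if len(preds) != n:
            raise ValueError(f"yPred for {model!r} has {len(preds)} rows, x has {n}")
    return n


def subsample_aligned(x, y_true, y_pred_by_model, max_instances=10_000, seed=0):
    """Randomly keep at most max_instances rows, using the same indices for every array."""
    n = check_alignment(x, y_true, y_pred_by_model)
    if n <= max_instances:
        return x, y_true, y_pred_by_model
    # Sample indices once, then apply them everywhere so rows stay paired.
    keep = sorted(random.Random(seed).sample(range(n), max_instances))

    def pick(seq):
        return [seq[i] for i in keep]

    return pick(x), pick(y_true), {m: pick(p) for m, p in y_pred_by_model.items()}


# Hypothetical data: 100 instances, one model, subsampled to 10.
x = [{"id": i} for i in range(100)]
y_true = list(range(100))
y_pred_by_model = {"model_a": [i * 2 for i in range(100)]}

x_s, y_s, p_s = subsample_aligned(x, y_true, y_pred_by_model, max_instances=10)
print(len(x_s), len(y_s), len(p_s["model_a"]))
```

A length check only catches missing or extra rows. It cannot detect arrays that have the right size but were shuffled independently, so carrying a stable instance ID through your pipeline is still the safer habit.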

Verdict

Worth a look if you’re debugging production models and tired of aggregate metrics hiding slice-specific failures. Skip it if you need real-time monitoring or automated remediation—this is exploratory analysis, not ops.
