PAIR-code/what-if-tool

Google's interactive debugger for ML models that lets you poke the black box

A visual tool for inspecting, editing, and stress-testing trained classification and regression models without writing code.

Velocity (7d): +0.4 ★/day · Trend: steady

What it does

The What-If Tool (WIT) is a visual interface for interrogating trained ML models. Load a dataset, run inference on thousands of examples, then slice, color, and rearrange the results to see where your model stumbles. You can manually edit individual data points (change someone’s age in a census record, flip a feature) and immediately re-run them through the model to watch the prediction shift. It also surfaces fairness metrics across dataset subsets.
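To make "fairness metrics across dataset subsets" concrete, here is a minimal sketch of the kind of per-slice comparison the tool draws for you. It is not WIT's code: the `slice_metrics` helper, the column names, and the toy rows are all hypothetical, and WIT offers more metrics than the two shown.

```python
from collections import defaultdict


def slice_metrics(rows, slice_key, label_key="label", pred_key="prediction"):
    """Group rows by one feature; compute accuracy and false positive rate per group."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "fp": 0, "negatives": 0})
    for row in rows:
        c = counts[row[slice_key]]
        c["n"] += 1
        c["correct"] += int(row[label_key] == row[pred_key])
        if row[label_key] == 0:
            # Only true negatives can produce false positives.
            c["negatives"] += 1
            c["fp"] += int(row[pred_key] == 1)
    return {
        group: {
            "n": c["n"],
            "accuracy": c["correct"] / c["n"],
            "false_positive_rate": (c["fp"] / c["negatives"]) if c["negatives"] else None,
        }
        for group, c in counts.items()
    }


# Hypothetical toy data: binary labels and binary predictions.
rows = [
    {"sex": "F", "label": 1, "prediction": 1},
    {"sex": "F", "label": 0, "prediction": 1},
    {"sex": "F", "label": 0, "prediction": 0},
    {"sex": "M", "label": 1, "prediction": 1},
    {"sex": "M", "label": 0, "prediction": 0},
    {"sex": "M", "label": 1, "prediction": 0},
]
metrics = slice_metrics(rows, "sex")
for group, m in sorted(metrics.items()):
    print(group, m)
```

Both groups here have the same accuracy but different false positive rates, which is the sort of gap an aggregate score hides and a sliced view exposes.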

The interesting bit

The tool treats “what if I just changed this one thing?” as a first-class interaction. No notebooks, no scripts: click a data point, edit a value, hit go. This turns model debugging from a batch job into something closer to a spreadsheet with a live brain attached. It also integrates attribution values (via vanilla gradients in supported demos) so you can see which features drove a specific prediction.
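For readers who think in code, the click-edit-rerun loop boils down to roughly the following. This is a sketch under stated assumptions, not how WIT is implemented: `predict_income_over_50k` is a hand-written stand-in rather than a trained model, and `what_if` is a hypothetical helper name.

```python
import math


def predict_income_over_50k(example):
    """Hypothetical stand-in model: a hand-written logistic score, not a trained model."""
    z = 0.08 * (example["age"] - 40) + 0.03 * (example["hours_per_week"] - 40)
    return 1.0 / (1.0 + math.exp(-z))


def what_if(example, predict_fn, **edits):
    """Copy the example, apply the edits, and return (edited, before, after, delta)."""
    unknown = set(edits) - set(example)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    edited = {**example, **edits}  # the original record is left untouched
    before = predict_fn(example)
    after = predict_fn(edited)
    return edited, before, after, after - before


record = {"age": 40, "hours_per_week": 40}
edited, before, after, delta = what_if(record, predict_income_over_50k, age=55)
print(f"before={before:.3f} after={after:.3f} delta={delta:+.3f}")
```

The point of the UI is that this comparison happens on a click, for any example you pick out of a plot, without writing the helper at all.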

Key highlights

  • Runs inside TensorBoard, Jupyter, or Colab—pick your workflow
  • Works with TensorFlow Estimators, Google AI Platform Prediction models, or any custom prediction function you wrap
  • Visualizes datasets with Facets Dive: scatter plots, confusion matrices, histograms, and small multiples faceted by any feature
  • Handles tens of thousands of examples, including image data (shows thumbnails for encoded image features)
  • Can analyze CSV files directly with no model server at all—just point it at a file with prediction columns
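The "custom prediction function you wrap" option amounts to an adapter: a function that takes a batch of examples and returns one output per example (for classification, a list of class scores). The sketch below shows that shape in plain Python. `ThresholdModel`, `make_predict_fn`, and the dict-based example format are assumptions for illustration; in notebook mode the function would be registered through `WitConfigBuilder.set_custom_predict_fn`, and the examples WIT passes in may be `tf.Example` protos rather than dicts, so check the project's docs for the exact input format.

```python
class ThresholdModel:
    """Hypothetical model with its own API: scores one feature vector at a time."""

    def score(self, features):
        # Probability of the positive class, clipped to [0, 1].
        return min(1.0, max(0.0, 0.5 + 0.01 * (features[0] - 40)))


def make_predict_fn(model, feature_names):
    """Adapt `model` to a batch function: list of example dicts -> list of [p_neg, p_pos]."""

    def predict_fn(examples):
        outputs = []
        for example in examples:
            # Pull features out in a fixed order so the model sees a stable layout.
            features = [example[name] for name in feature_names]
            p_pos = model.score(features)
            outputs.append([1.0 - p_pos, p_pos])
        return outputs

    return predict_fn


predict_fn = make_predict_fn(ThresholdModel(), ["age"])
scores = predict_fn([{"age": 40}, {"age": 60}, {"age": 200}])
print(scores)
```

Keeping the adapter separate from the model means the same wrapper can sit in front of a local object, a remote endpoint, or anything else that can score a batch.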

Caveats

  • TensorBoard mode talks to TensorFlow Serving over gRPC, not REST; with the standard Docker image that means port 8500 (8501 is the REST port)
  • CSV-only mode disables the editing/re-inference loop since there’s no live model to query
  • The “absolutely no code required” claim applies to the UI; setting up your model to serve predictions still takes plumbing

Verdict

Worth a look if you need to explain model behavior to stakeholders, audit for fairness, or debug why your classifier flips on edge cases. Skip it if you’re already happy with programmatic model analysis or your pipeline doesn’t expose predictions through one of the supported interfaces.
