mozilla/bugbug

Mozilla's bug triage, outsourced to a Python script

A machine learning platform that reads Bugzilla tickets so humans don't have to sort them by hand.

565 stars · Python · Domain Apps · Data Tooling
Velocity (7d): +0.2 ★ / day · Trend: steady

What it does

Bugbug trains classifiers on Mozilla’s bug and commit data to automate the tedious parts of software engineering: assigning bugs to the right developer, detecting regressions, picking which tests to run, and filtering spam. It plugs into Bugzilla and mozilla-central, with some GitHub issue support. Models cover everything from “is this actually a bug?” to “will this patch get backed out?”

The interesting bit

The project treats bug history as a replayable dataset — bug_snapshot.py lets you reconstruct a ticket’s state at any point in time, which matters when you’re training models that need to avoid peeking at the future. The “testselect” classifier also gets at a real infrastructure problem: running fewer tests without missing the one that breaks.
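To make the "no peeking at the future" point concrete, here is a minimal sketch of rolling a ticket back to an earlier date by undoing its recorded changes. It is not the code in bug_snapshot.py: the `rollback` function, the flat bug dictionary, and the `when`/`field`/`removed`/`added` history entries are hypothetical shapes chosen for illustration.

```python
from datetime import datetime


def rollback(bug, history, as_of):
    """Return a copy of `bug` as it looked at `as_of`.

    Hypothetical data model: each history entry records one field change,
    with the value it replaced ("removed") and the value it set ("added").
    """
    snapshot = dict(bug)  # never mutate the current state
    # Walk the changes newest-first and undo every change made after the cutoff.
    for change in sorted(history, key=lambda c: c["when"], reverse=True):
        if change["when"] <= as_of:
            break  # everything older had already happened at `as_of`
        snapshot[change["field"]] = change["removed"]
    return snapshot


# Illustrative data only, not taken from Bugzilla.
current = {
    "id": 1,
    "status": "RESOLVED",
    "priority": "P1",
    "assignee": "dev@example.com",
}
history = [
    {"when": datetime(2024, 1, 5), "field": "priority",
     "removed": "--", "added": "P1"},
    {"when": datetime(2024, 1, 9), "field": "assignee",
     "removed": "nobody@example.com", "added": "dev@example.com"},
    {"when": datetime(2024, 2, 1), "field": "status",
     "removed": "NEW", "added": "RESOLVED"},
]

at_filing = rollback(current, history, datetime(2024, 1, 1))
mid_january = rollback(current, history, datetime(2024, 1, 6))
```

A model that predicts, say, the assignee should be trained on something like `at_filing`, where the field still holds its pre-triage value, rather than on `current`, where the answer is already written into the ticket.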

Key highlights

  • 18 built-in classifiers, from assignee suggestion to uplift approval
  • ~93% accuracy on the defect-vs-feature classifier (2,110 bugs, per README)
  • Keras models wrapped to fit scikit-learn pipelines via bugbug/nn.py
  • Training hooks into Mozilla’s Taskcluster CI with a PR keyword: Train on Taskcluster: <model>
  • Requires Python 3.12+, uses uv for dependency sync

Caveats

  • Hard-wired for Mozilla’s toolchain: Bugzilla, Mercurial, mozilla-central. GitHub support exists but looks secondary.
  • Repository mining takes 7+ hours; the README suggests adding limit=1024 just to test changes.
  • A libgit2 v1.0.0 dependency is flagged as “might be required,” and that version is packaged only in Debian experimental.

Verdict

Worth studying if you run triage or CI for a large project with messy historical data. Skip it if you’re looking for a drop-in solution for a small GitHub repo — the Mozilla-specific assumptions run deep, and the project says so itself.
