JayYip/m3tl

BERT, but make it do five jobs at once

A wrapper around Hugging Face transformers that tries to make multi-task learning as easy as single-task, with pluggable strategies for sampling, loss combination, and gradient surgery.

544 stars · Jupyter Notebook · ML Frameworks · Language Models
Velocity (7d): +0.2 ★ / day · Trend: steady

What it does

M3TL is a convenience layer over Hugging Face transformers for multi-modal, multi-task learning (MTL). It exposes four programmable modules: problem sampling, loss combination, gradient surgery, and the post-transformer model architecture. Together they let you stack multiple NLP tasks (classification, NER, sequence tagging, masked LM, etc.) on a single shared backbone. The pitch: write MTL models with roughly the effort of a single-task model.
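To make "problem sampling" concrete, here is a minimal sketch of one common strategy: picking which task supplies the next batch with probability proportional to dataset size, flattened by a temperature. This is an illustration only, not M3TL's API; the function names, task names, and dataset sizes are all hypothetical.

```python
import random


def task_sampling_probs(dataset_sizes, temperature=1.0):
    """Per-task probabilities proportional to size ** (1 / temperature).

    temperature=1.0 samples proportionally to dataset size; larger values
    flatten the distribution so small tasks are visited more often.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {name: size ** (1.0 / temperature)
              for name, size in dataset_sizes.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


def sample_task(probs, rng):
    """Draw the name of the task that supplies the next training batch."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical tasks and dataset sizes, chosen only for illustration.
sizes = {"classification": 10000, "ner": 1000, "masked_lm": 100}
rng = random.Random(0)

proportional = task_sampling_probs(sizes, temperature=1.0)
flattened = task_sampling_probs(sizes, temperature=4.0)
schedule = [sample_task(flattened, rng) for _ in range(8)]
```

With a temperature of 1.0 the largest task dominates the schedule; raising the temperature trades some of that dominance for more frequent updates on the small tasks.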

The interesting bit

The project doesn’t just wire tasks together; it treats MTL’s gnarly coordination problems as first-class, swappable components. Gradient surgery in particular is the kind of thing that usually lives in research code and never gets reused.
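For readers who have not met gradient surgery, the sketch below shows the core idea in the style of PCGrad: when two task gradients point in conflicting directions (negative dot product), project one onto the normal plane of the other before summing. Whether M3TL implements exactly this variant is an assumption; the function names are hypothetical, and plain Python lists stand in for flattened parameter gradients.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads, rng=None):
    """Combine per-task gradients, removing pairwise conflicts first.

    task_grads: list of gradient vectors, one per task (hypothetical input).
    Returns the sum of the conflict-projected gradients.
    """
    rng = rng or random.Random(0)
    projected = []
    for i, grad in enumerate(task_grads):
        g = list(grad)
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)  # visit the other tasks in random order
        for j in others:
            other = task_grads[j]
            overlap = dot(g, other)
            norm_sq = dot(other, other)
            if overlap < 0 and norm_sq > 0:
                # Subtract the component of g that points against `other`.
                scale = overlap / norm_sq
                g = [x - scale * y for x, y in zip(g, other)]
        projected.append(g)
    # Sum the adjusted gradients coordinate by coordinate.
    return [sum(column) for column in zip(*projected)]


conflicting = pcgrad([[1.0, 0.0], [-1.0, 1.0]])
compatible = pcgrad([[1.0, 0.0], [0.0, 1.0]])
```

Gradients that do not conflict pass through unchanged, so the procedure reduces to an ordinary sum when tasks agree.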

Key highlights

  • Built-in problem types: classification, multi-label, sequence labeling, masked LM, regression, contrastive learning, and more
  • Pluggable strategies for which tasks to sample, how to combine losses, and how to avoid gradient conflicts
  • Post-transformer model module is user-programmable
  • Advertises bundled “SOTA MTL algorithms”
  • Multi-modal support extends beyond text
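As a rough picture of what a "pluggable" loss-combination strategy could look like, the sketch below treats each strategy as a function from per-task losses to one scalar, so the training step never needs to know which one is in use. Everything here is hypothetical and not taken from M3TL: the names, the task keys, and the simplified uncertainty-style weighting (which in practice would use learned log-variances and tensor losses rather than fixed floats).

```python
import math
from typing import Callable, Dict

# A combiner maps {task name: loss value} to a single scalar loss.
LossCombiner = Callable[[Dict[str, float]], float]


def weighted_sum(weights: Dict[str, float]) -> LossCombiner:
    """Fixed per-task weights; tasks without an entry get weight 1.0."""
    def combine(losses: Dict[str, float]) -> float:
        return sum(weights.get(name, 1.0) * value
                   for name, value in losses.items())
    return combine


def uncertainty_weighted(log_vars: Dict[str, float]) -> LossCombiner:
    """Simplified uncertainty-style weighting: exp(-s) * loss + s per task.

    log_vars holds one log-variance s per task; here they are fixed
    numbers, whereas a real trainer would learn them.
    """
    def combine(losses: Dict[str, float]) -> float:
        return sum(math.exp(-log_vars[name]) * value + log_vars[name]
                   for name, value in losses.items())
    return combine


def total_loss(losses: Dict[str, float], combiner: LossCombiner) -> float:
    """The training step only sees the combiner interface."""
    return combiner(losses)


# Hypothetical per-task losses from one training step.
losses = {"classification": 0.9, "ner": 2.0}
plain = total_loss(losses, weighted_sum({}))
down_weighted = total_loss(losses, weighted_sum({"ner": 0.5}))
uncertain = total_loss(
    losses, uncertainty_weighted({"classification": 0.0, "ner": 0.0}))
```

Swapping strategies then means passing a different combiner, which is the kind of separation the highlights above describe.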

Caveats

  • README is heavy on promises and light on implementation details; “tutorials” are referenced but not linked or summarized
  • No benchmarks, citations, or comparisons against TencentNLP/PyText (the projects it criticizes as “naive”)
  • “SOTA MTL algorithms” are asserted, not listed

Verdict

Worth a look if you’re already in the Hugging Face ecosystem and need to bolt multiple NLP tasks onto one model without hand-rolling the coordination logic. Skip if you need rigorous comparisons or documentation before trusting a training pipeline.
