One model, ten NLP tasks, no special casing
Salesforce's decaNLP forces everything from translation to sentiment analysis into the same question-answering mold, then trains a single network to handle the lot.

What it does
decaNLP is a multitask benchmark and training framework that unifies ten disparate NLP tasks (among them SQuAD question answering, IWSLT translation, CNN/DailyMail summarization, MNLI natural language inference, and SST sentiment analysis) by recasting each one as question answering. The bundled MQAN model learns all ten jointly, with no task-specific parameters or modules. If you would rather cherry-pick, the framework also supports single-task training, transfer learning, and zero-shot evaluation.
The interesting bit
The recasting trick is the core insight: instead of hand-crafting architectures for translation, parsing, dialogue, and so on, every example becomes a question, a context, and an answer, so every task reduces to “answer this question about this context.” That lets one attention-based sequence-to-sequence model handle the full decathlon. The README notes the framework can be adapted to other multitask approaches, so the QA framing is more starting point than straitjacket.
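To make the recasting concrete, here is a minimal sketch of how three differently shaped tasks could collapse into a single (question, context, answer) record. The `QAExample` class and the converter functions are hypothetical, and the question strings are illustrative rather than the exact prompts decaNLP ships with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    """One decaNLP-style triple: every task becomes (question, context) -> answer."""
    question: str
    context: str
    answer: str


# Hypothetical converters. The question wording is illustrative only,
# not copied from the decaNLP data loaders.
def sentiment_to_qa(review: str, label: str) -> QAExample:
    # Classification: the label options sit in the question, so the answer is one word.
    return QAExample(
        question="Is this review negative or positive?",
        context=review,
        answer=label,
    )


def translation_to_qa(source: str, target: str) -> QAExample:
    # Generation: the whole target sentence is the "answer".
    return QAExample(
        question="What is the translation from English to German?",
        context=source,
        answer=target,
    )


def nli_to_qa(premise: str, hypothesis: str, label: str) -> QAExample:
    # NLI: the hypothesis moves into the question; the premise becomes the context.
    return QAExample(
        question=f"Hypothesis: {hypothesis} Entailment, neutral, or contradiction?",
        context=premise,
        answer=label,
    )


# Usage: three tasks, one input format, so one model can consume the mixed batch.
batch = [
    sentiment_to_qa("A gorgeous, heartfelt film.", "positive"),
    translation_to_qa("Thank you.", "Danke."),
    nli_to_qa("A man is playing a guitar.", "A person makes music.", "entailment"),
]
for ex in batch:
    print(f"Q: {ex.question} | C: {ex.context} -> A: {ex.answer}")
```

Because the task identity lives in the question text rather than in the architecture, nothing in the model needs to branch on which task a batch came from.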
Key highlights
- Ten tasks in one model: QA, MT, summarization, NLI, sentiment, SRL, relation extraction, dialogue, semantic parsing, commonsense reasoning
- Joint training via round-robin sampling; optional jump-start pretraining on a subset of tasks
- Pretrained MQAN checkpoints available, including one that reportedly set a new state-of-the-art on WikiSQL
- Docker images provided for both CUDA and CPU setups
- TensorBoard logging built in; checkpoint resume supported
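The round-robin and jump-start options can be pictured as a task schedule that decides which task supplies the next batch. The sketch below is a simplified illustration under that assumption, not the repo's actual training loop; `task_schedule` and the task labels are hypothetical names, and the real sampler may order or weight tasks differently.

```python
from itertools import cycle, islice
from typing import Iterator, Sequence


def task_schedule(
    all_tasks: Sequence[str],
    jump_start_tasks: Sequence[str],
    jump_start_steps: int,
) -> Iterator[str]:
    """Yield the task to draw the next batch from, one name per training step.

    Phase 1 (jump start): round-robin over a subset of tasks for a fixed
    number of steps. Phase 2: round-robin over every task indefinitely.
    """
    # Phase 1: only the jump-start subset (empty subset or 0 steps skips this phase).
    yield from islice(cycle(jump_start_tasks), jump_start_steps)
    # Phase 2: cycle through the full task list forever.
    yield from cycle(all_tasks)


# Usage with illustrative task labels: pretrain on one task for 3 steps,
# then rotate through all five.
tasks = ["squad", "iwslt", "cnn_dailymail", "multinli", "sst"]
schedule = task_schedule(tasks, jump_start_tasks=["squad"], jump_start_steps=3)
print(list(islice(schedule, 8)))
# ['squad', 'squad', 'squad', 'squad', 'iwslt', 'cnn_dailymail', 'multinli', 'sst']
```

Keeping the schedule as a generator separates *which* task trains next from *how* each task's data is loaded, so each task can keep its own data iterator.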
Caveats
- Multi-GPU training is marked “WIP”; single GPU only for now
- Original paper used PyTorch 0.3; the repo has since moved to 0.4, so exact replication requires checking out an older commit and using a legacy Docker image
- On a single Volta GPU, full decaNLP training takes ~3 days to reach approximate results and ~7 days to fully converge
- First run downloads and caches all datasets; the README warns this “might take a while”
- Validation is slow, especially ROUGE computation
Verdict
Worth a look if you’re researching multitask NLP or want a unified benchmark to stress-test generalization. Skip it if you need production-ready multi-GPU training or if your problem maps cleanly to a single well-solved task.