microsoft/DeBERTa
Microsoft's implementation of DeBERTa, a decoding-enhanced BERT model with disentangled attention and ELECTRA-style pre-training.

Velocity (7d): +1.0 ★/day · Trend: steady
This repository provides the official implementation of DeBERTa and DeBERTa V3, transformer-based language models that improve on BERT through disentangled attention, which represents each token's content and relative position with separate vectors. It includes code for pre-training, continuous training, and fine-tuning on downstream NLP tasks, including the SuperGLUE benchmark. Models ranging from 22M to 1.5B parameters are available via the Hugging Face model hub.
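To make the term "disentangled attention" concrete, here is a minimal pure-Python sketch of how the attention score is assembled in the DeBERTa paper's formulation: a content-to-content term, a content-to-position term, and a position-to-content term, scaled by 1/sqrt(3d). It is illustrative only and is not taken from this repository's code; the function names (`relative_bucket`, `disentangled_attention`, `random_vectors`) and the toy sizes are assumptions, and the inputs stand in for already-projected query/key vectors.

```python
import math
import random


def relative_bucket(i, j, k):
    """Map the offset i - j to a bucket in [0, 2k - 1], clamping distant offsets."""
    return min(max(i - j + k, 0), 2 * k - 1)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(qc, kc, qr, kr, k):
    """Attention weights from content (qc, kc) and relative-position (qr, kr) vectors.

    qc, kc: n vectors of size d, one per token (hypothetical, already projected).
    qr, kr: 2k vectors of size d, one per relative-position bucket.
    """
    n, d = len(qc), len(qc[0])
    scale = 1.0 / math.sqrt(3 * d)  # three score terms, hence 3d instead of d
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(qc[i], kc[j])                          # content-to-content
            c2p = dot(qc[i], kr[relative_bucket(i, j, k)])   # content-to-position
            p2c = dot(kc[j], qr[relative_bucket(j, i, k)])   # position-to-content
            row.append((c2c + c2p + p2c) * scale)
        weights.append(softmax(row))
    return weights


def random_vectors(count, dim, rng):
    """Toy stand-in for learned projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
n, d, k = 5, 8, 2  # toy sizes: 5 tokens, 8-dim vectors, max relative distance 2
attn = disentangled_attention(
    random_vectors(n, d, rng),
    random_vectors(n, d, rng),
    random_vectors(2 * k, d, rng),
    random_vectors(2 * k, d, rng),
    k,
)
```

Each row of `attn` is a probability distribution over the tokens a given position attends to. The sketch omits multiple heads, value projection, masking, and the enhanced mask decoder, all of which a full model would need.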