Beomi/KcBERT
Pretrained BERT model and WordPiece tokenizer trained on Korean comments, with associated datasets for fine-tuning Korean NLP tasks.
★ 491 stars · Language Models

Velocity (7d): +0.2 ★/day · Trend: steady
KcBERT provides a pretrained BERT model and WordPiece tokenizer for Korean, both trained on Korean comment data. The repository also publishes the underlying datasets (the v2022.3Q release is 45GB with 340M records) and supports fine-tuning on downstream Korean NLP tasks via Colab notebooks. The project later led to KcELECTRA, a newer model reported to perform better on most tasks.
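
As a rough illustration of what fine-tuning setup might look like outside the Colab notebooks, here is a minimal Python sketch using the Hugging Face `transformers` library. The Hub model id `beomi/kcbert-base`, the helper names `load_kcbert_classifier` and `encode_comments`, and the `max_length` default are assumptions made for this example and are not taken from the listing above; check the repository for the checkpoints and settings it actually recommends.

```python
"""Sketch: prepare a KcBERT checkpoint for sequence classification."""

# Assumption: this Hub id is not stated in the listing; verify it in the repo.
DEFAULT_MODEL_ID = "beomi/kcbert-base"


def load_kcbert_classifier(model_id: str = DEFAULT_MODEL_ID, num_labels: int = 2):
    """Return (tokenizer, model) with a fresh classification head.

    Imports are deferred so that defining this function needs neither
    transformers nor network access; calling it needs both.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model


def encode_comments(tokenizer, comments, max_length: int = 128):
    """Tokenize a batch of raw comment strings into padded PyTorch tensors."""
    return tokenizer(
        list(comments),
        padding=True,          # pad to the longest comment in the batch
        truncation=True,       # cut comments longer than max_length
        max_length=max_length,
        return_tensors="pt",
    )
```

Under these assumptions, `tokenizer, model = load_kcbert_classifier()` followed by `batch = encode_comments(tokenizer, comments)` yields inputs that can be passed as `model(**batch)` inside an ordinary PyTorch training loop. The classification head is randomly initialized, so its outputs are not meaningful until the model has been fine-tuned on labeled data.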