
Beomi/KcBERT

Pretrained BERT model and WordPiece tokenizer trained on Korean comments, with associated datasets for fine-tuning Korean NLP tasks.

491 stars · Language Models
Velocity (7 days): +0.2 ★/day · Trend: steady

KcBERT provides a Korean-specific pretrained BERT model and a WordPiece tokenizer trained on Korean comment data. The repository also includes released datasets (the v2022.3Q release is 45 GB with 340M records) and supports fine-tuning for downstream Korean NLP tasks through Colab notebooks. The project later led to KcELECTRA, a newer model with improved performance on most tasks.
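A WordPiece tokenizer splits each whitespace-separated word into subword pieces by greedy longest-match-first lookup against its vocabulary, marking word-internal pieces with a `##` prefix. The following is a minimal Python sketch of that algorithm, assuming a tiny hand-written vocabulary. `VOCAB`, `wordpiece_tokenize`, and `tokenize` are hypothetical names for illustration only; they are not KcBERT's actual vocabulary or code, and the sketch skips the text normalization and punctuation splitting that a full BERT tokenizer performs before this step.

```python
# Hypothetical toy vocabulary (NOT KcBERT's real vocab): a few Korean
# stems plus "##"-prefixed continuation pieces for common endings.
VOCAB = {"[UNK]", "댓글", "##을", "##는", "좋아", "##요"}


def wordpiece_tokenize(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one word using greedy longest-match-first WordPiece."""
    if len(word) > max_chars:
        return [unk_token]  # overly long words are mapped straight to [UNK]
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # word-internal pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens


def tokenize(text, vocab):
    """Whitespace-split the text, then WordPiece each word."""
    return [tok for word in text.split() for tok in wordpiece_tokenize(word, vocab)]


if __name__ == "__main__":
    print(tokenize("댓글을 좋아요", VOCAB))  # ['댓글', '##을', '좋아', '##요']
    print(wordpiece_tokenize("뉴스", VOCAB))  # ['[UNK]']
```

Falling back to `[UNK]` for the entire word, rather than for only the unmatched tail, mirrors the original BERT WordPiece behavior; a vocabulary trained on a large comment corpus makes such fallbacks rare in practice.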
