Is turkish-bert open source?

Yes — stefan-it/turkish-bert is an open-source project tracked on heatdrop.

What language is turkish-bert written in?

stefan-it/turkish-bert is primarily written in Python.

How popular is turkish-bert?

stefan-it/turkish-bert has 578 stars on GitHub.

Where can I find turkish-bert?

stefan-it/turkish-bert is on GitHub at https://github.com/stefan-it/turkish-bert.

stefan-it/turkish-bert

Turkish NLP's model zoo, built by committee

Community-sourced data, a crowdsourced name, and more transformer variants than you can shake a kebab at.

★ 578 stars · Python · Language Models

What it does

BERTurk is a family of pre-trained Turkish language models—BERT, DistilBERT, ELECTRA, ConvBERT, and T5 variants—trained on filtered web corpora, Wikipedia, and community-contributed datasets. Everything is hosted on Hugging Face and benchmarked on standard Turkish NLP tasks.
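Since the checkpoints are hosted on Hugging Face, loading one usually takes a few lines with the transformers library. The sketch below is a minimal illustration, not the project's official usage. The hub IDs in ASSUMED_CHECKPOINTS are not listed on this page and are assumptions here, and the helper names pick_checkpoint, load_encoder, and embed are hypothetical. Check the repository README for the authoritative model names.

```python
# Hub IDs below are assumptions: they do not appear on this page.
# Verify them against the repository README before relying on them.
ASSUMED_CHECKPOINTS = {
    ("32k", True): "dbmdz/bert-base-turkish-cased",
    ("32k", False): "dbmdz/bert-base-turkish-uncased",
    ("128k", True): "dbmdz/bert-base-turkish-128k-cased",
    ("128k", False): "dbmdz/bert-base-turkish-128k-uncased",
}


def pick_checkpoint(vocab: str = "32k", cased: bool = True) -> str:
    """Map a vocab size ("32k" or "128k") and casing to an assumed hub ID."""
    try:
        return ASSUMED_CHECKPOINTS[(vocab, cased)]
    except KeyError:
        raise ValueError(f"no assumed checkpoint for vocab={vocab!r}, cased={cased!r}") from None


def load_encoder(model_id: str):
    """Download (or load from cache) the tokenizer and encoder for a hub ID."""
    # Imported lazily so the lookup helper above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return contextual token embeddings with shape (1, seq_len, hidden_size)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.last_hidden_state
```

A typical call would be tokenizer, model = load_encoder(pick_checkpoint("128k")) followed by embed("Merhaba dünya", tokenizer, model). The first call downloads the weights, so it needs network access.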

The interesting bit

The project is genuinely community-driven: the training data, the “BERTurk” name, and even the logo (by Merve Noyan) came from the Turkish NLP community rather than a single lab. The README also doubles as a changelog stretching back to 2020, which makes the evolution oddly transparent—you can watch the model zoo grow from a single BERT checkpoint to a 1.42B-parameter T5 variant trained on FineWeb2.

Key highlights

13 model variants with training corpus sizes from 7GB (distilled) to 262GB (BERT5urk)
Two vocab sizes for BERT models: standard 32k and expanded 128k
ELECTRA and ConvBERT models trained on both the original 35GB corpus and the larger mC4 (242GB)
BERT5urk uses the UL2 objective in T5X for 2M steps on a v3-32 TPU pod
Evaluation tables report concrete numbers: PoS tagging accuracy falls in the 93-95% range across variants
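For readers unfamiliar with the PoS metric, tagging accuracy is conventionally the share of tokens whose predicted tag matches the gold tag. The sketch below shows that usual definition. Whether the project's evaluation scripts compute it exactly this way is an assumption, the function name token_accuracy is hypothetical, and the tags are toy data rather than results from the README.

```python
from typing import Sequence


def token_accuracy(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence needs one predicted tag per gold tag")
        total += len(gold_sent)
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
    if total == 0:
        raise ValueError("cannot compute accuracy over zero tokens")
    return correct / total


# Toy example (made-up tags): 4 of 5 tokens match.
gold_tags = [["NOUN", "VERB"], ["PRON", "NOUN", "VERB"]]
pred_tags = [["NOUN", "VERB"], ["PRON", "ADJ", "VERB"]]
print(token_accuracy(gold_tags, pred_tags))  # 0.8
```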

Caveats

The NER and sentiment results were cut off in the README excerpt reviewed here, so downstream performance beyond PoS tagging is unclear
No explicit comparison to non-community Turkish models (e.g., from major cloud providers)

Verdict

Worth bookmarking if you work on Turkish NLP and want battle-tested, openly documented baselines. Skip it if you need multilingual coverage: the models are Turkish-only by design.
