Is CodeBERT open source?

Yes — microsoft/CodeBERT is open source, released under the MIT license.

What language is CodeBERT written in?

microsoft/CodeBERT is primarily written in Python.

How popular is CodeBERT?

microsoft/CodeBERT has 2.8k stars on GitHub.

Where can I find CodeBERT?

microsoft/CodeBERT is on GitHub at https://github.com/microsoft/CodeBERT.

← all repositories

microsoft/CodeBERT

Microsoft's six-model code-AI family, ready to pip install

A monorepo of pretrained transformers for code understanding, generation, review, and even execution traces.

★2.8k stars Python Language Models Coding Assistants

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does This repository bundles six related models—CodeBERT, GraphCodeBERT, UniXcoder, CodeReviewer, CodeExecutor, and LongCoder—into a single pip-installable lineage. Each is a pretrained transformer for code, loadable via Hugging Face’s standard transformers API exactly like RoBERTa. The base CodeBERT model was trained on natural-language-to-code pairs across six languages (Python, Java, JavaScript, PHP, Ruby, Go).

The interesting bit The family keeps branching into weirder capabilities: GraphCodeBERT injects data-flow structure, CodeReviewer digests actual code-review conversations, and CodeExecutor tries to predict execution traces rather than just syntax. It’s less a single tool than a research group’s longitudinal study of what you can pretrain a transformer to understand about code.

Key highlights

Drop-in Hugging Face compatibility: AutoModel.from_pretrained("microsoft/codebert-base")
Two variants for different jobs: standard CodeBERT for embeddings, CodeBERT-MLM for masked-token prediction
GraphCodeBERT adds data-flow edges for structure-aware tasks (clone detection, code translation)
LongCoder targets long-range code completion with sparse attention
Each model has its own subfolder with task-specific reproduction code

Caveats

The README is mostly a directory of paper links; actual tutorials and downstream task code live in per-model subfolders
Some subfolders use “will provide” phrasing, suggesting not everything is fully documented yet

Verdict Worth bookmarking if you’re doing empirical research on code intelligence or need a solid pretrained embedding baseline. Skip it if you want a single polished product—this is a lab’s paper-reproduction archive, not a unified framework.

Frequently asked

What is microsoft/CodeBERT?: A monorepo of pretrained transformers for code understanding, generation, review, and even execution traces.
Is CodeBERT open source?: Yes — microsoft/CodeBERT is open source, released under the MIT license.
What language is CodeBERT written in?: microsoft/CodeBERT is primarily written in Python.
How popular is CodeBERT?: microsoft/CodeBERT has 2.8k stars on GitHub.
Where can I find CodeBERT?: microsoft/CodeBERT is on GitHub at https://github.com/microsoft/CodeBERT.