CodedotAl/gpt-code-clippy

Open-source Copilot clone admits: most of our models score zero

A community effort to replicate GitHub Copilot that publishes its training recipes, its failures, and its honest confusion about which model to use.

★3.3k stars · Python · Coding Assistants · Language Models · Data Tooling

What it does

GPT-Code-Clippy fine-tunes GPT-2 and GPT-Neo on scraped GitHub code to generate code completions. It ships a VS Code extension, a HuggingFace demo, and a 159GB deduplicated training dataset built from SEART GitHub Search plus The Pile. The project is explicitly framed as an open-source answer to GitHub Copilot.

The interesting bit

The README’s candor is the feature. The authors publish HumanEval results showing their fine-tuned models scoring 0.00% on pass@1 through pass@10, note that “None improve on the standard GPT-Neo 125M model except for APPs specific models,” and leave TODOs asking which model is recommended and how to train properly. This is less a product than a public lab notebook.
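
For readers new to the metric: pass@k on HumanEval is usually computed with the unbiased estimator from the Codex paper (Chen et al., 2021). You sample n completions per problem, count the c that pass the unit tests, and estimate the probability that at least one of k randomly drawn samples passes. The sketch below implements that formula in plain Python; whether CodeClippy’s own evaluation harness used this exact code is an assumption, and `pass_at_k` and `mean_pass_at_k` are illustrative names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: completions sampled, c: completions that passed the tests, k: draws (k <= n).
    """
    if n - c < k:
        # Every size-k subset must include at least one passing completion.
        return 1.0
    # 1 minus the probability that all k drawn completions fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A problem where no sample passes (c = 0) scores 0.0 for every k, so a 0.00% row at pass@10 means no sampled completion passed the unit tests on any problem.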

Key highlights

Dataset deduplicated by regex matching on alphanumeric “variables,” with source code and a datasheet available
Training hyperparameters fully documented: AdamW with GPT-3-style cosine decay for CodeClippy, Adafactor for 1.3B APPS fine-tuning “in part determined by hardware limitations”
VS Code extension exists but relies on the HuggingFace Inference API
Multiple model variants on HuggingFace Hub, including 125M and 1.3B parameter sizes
Active issue tracking a data bug where wrong filenames may have corrupted language filtering
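
One plausible reading of regex deduplication on alphanumeric “variables” is to extract every run of letters, digits, and underscores from a file, hash that token sequence, and keep only the first file seen per hash. The sketch below shows that idea; it is an assumption, not the project’s code, and the regex, the SHA-256 fingerprint, and the `fingerprint` and `dedupe` helpers are hypothetical.

```python
import hashlib
import re

# Assumption: a "variable" is any maximal run of ASCII letters, digits, or underscores.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def fingerprint(source: str) -> str:
    """Hash the sequence of alphanumeric tokens, ignoring whitespace and punctuation."""
    tokens = TOKEN_RE.findall(source)
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def dedupe(files: dict[str, str]) -> dict[str, str]:
    """Keep the first file (in iteration order) for each distinct fingerprint."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for path, source in files.items():
        fp = fingerprint(source)
        if fp not in seen:
            seen.add(fp)
            kept[path] = source
    return kept
```

Under this scheme, `x = foo(1, 2)` and `x=foo( 1,2 )` collapse to one file, while renaming `x` to `y` produces a new fingerprint. It is exact matching on the token sequence, not near-duplicate detection.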
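
“GPT-3-style cosine decay” usually means a linear warmup to a peak learning rate followed by a cosine decay down to 10% of that peak, which is the schedule the GPT-3 paper describes (it counts warmup and decay in tokens). The minimal sketch below counts optimizer steps instead; that parameterization and the argument names are assumptions, and CodeClippy’s actual peak rate, warmup length, and floor live in its documented hyperparameters rather than here.

```python
import math


def gpt3_style_lr(step: int, peak_lr: float, warmup_steps: int,
                  decay_steps: int, min_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_ratio * peak_lr.

    decay_steps is counted from the end of warmup; afterwards the rate
    stays at the floor.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, decay_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 to 0
    floor = min_ratio * peak_lr
    return floor + (peak_lr - floor) * cosine
```

To pair this with PyTorch’s `torch.optim.AdamW`, one option is `torch.optim.lr_scheduler.LambdaLR` with the optimizer’s learning rate set to `peak_lr` and a lambda returning `gpt3_style_lr(step, ...) / peak_lr`, since `LambdaLR` expects a multiplier of the base rate.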

Caveats

HumanEval results show base GPT-Neo outperforming every CodeClippy variant except the APPS-specific models; several models score literally zero
A known dataset bug means file extensions used for language filtering may be wrong, with unknown impact on training data quality
README contains multiple TODOs and no clear guidance on which model or training path to follow
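
The filename bug is easier to reason about with a concrete filter in mind. Assuming language filtering keys on file extensions, as the caveat indicates, the minimal sketch below uses a hypothetical, abbreviated extension table to show both failure modes. A Python file stored under the wrong name is silently dropped, and a non-Python file whose recorded name ends in `.py` is kept as Python. The table and the `language_of` and `filter_by_language` helpers are illustrative only.

```python
from pathlib import PurePosixPath
from typing import Iterable, Iterator, Optional

# Hypothetical, abbreviated table; not the project's actual extension list.
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".java": "Java", ".go": "Go"}


def language_of(filename: str) -> Optional[str]:
    """Guess a file's language purely from its extension (None if unknown)."""
    return EXT_TO_LANG.get(PurePosixPath(filename).suffix.lower())


def filter_by_language(records: Iterable[tuple[str, str]],
                       wanted: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (filename, source) pairs whose extension maps to a wanted language."""
    for filename, source in records:
        if language_of(filename) in wanted:
            yield filename, source


records = [
    ("utils.py", "def f(): pass"),     # correct name: kept
    ("README.md", "def g(): pass"),    # Python code under a wrong name: dropped
    ("app.py", "function h() {}"),     # JavaScript under a .py name: kept as Python
]
kept = [name for name, _ in filter_by_language(records, {"Python"})]
```

Because the filter never looks at the file contents, a wrong filename changes the training mix without raising any error, which is why the impact of the bug is hard to estimate.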

Verdict

Worth following if you’re researching open-source code generation or want to see how a community project documents its stumbles in real time. Skip if you need a working Copilot replacement today.

Frequently asked

What is CodedotAl/gpt-code-clippy?

A community effort to replicate GitHub Copilot that publishes its training recipes, its failures, and its honest confusion about which model to use.

Is gpt-code-clippy open source?

Yes, CodedotAl/gpt-code-clippy is open source, released under the Apache-2.0 license.

What language is gpt-code-clippy written in?

CodedotAl/gpt-code-clippy is primarily written in Python.

How popular is gpt-code-clippy?

CodedotAl/gpt-code-clippy has 3.3k stars on GitHub.

Where can I find gpt-code-clippy?

CodedotAl/gpt-code-clippy is on GitHub at https://github.com/CodedotAl/gpt-code-clippy.