yoeo/guesslang

VS Code's auto-detection secret is a 90%-accurate TensorFlow model

A small Python tool that guesses which of 54 languages a code snippet is written in, with a model that now ships inside VS Code.

897 stars · Python · Domain Apps · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Guesslang takes a blob of source code and names the programming language it is written in. It supports 54 languages, from Assembly to YAML, and its README claims over 90% accuracy. You can pipe text to the CLI, call it from Python, or ask for per-language probability scores when the top guess is uncertain.

The interesting bit

The model isn’t just a regex party. It’s a TensorFlow neural network trained on actual code, and Microsoft baked it into VS Code’s automatic language detection. That’s a lot of trust for a side project with under 900 GitHub stars.

Key highlights

  • 54 supported languages, including oddballs like COBOL, DM, and Verilog
  • CLI with --probabilities flag for ranked guesses
  • Python API via Guess().language_name(source)
  • Powers VS Code’s paste-to-detect feature (since v1.60)
  • Training pipeline exists as separate guesslangtools repo
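
The Python API bullet above can be sketched roughly as follows. This assumes the `guesslang` package and its TensorFlow dependency are installed. `Guess().language_name` is the call named in the list; the `detect_language` wrapper and its `guesser` parameter are hypothetical names added here so the function can be exercised without loading the model.

```python
from typing import Callable, Optional


def detect_language(
    source: str,
    guesser: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return a language name for `source`, or None if there is nothing to guess.

    `guesser` is a hypothetical injection point: any callable that maps
    source text to a language name. When omitted, guesslang is used.
    """
    if not source.strip():
        return None  # empty input: skip the model entirely
    if guesser is None:
        # Imported lazily so TensorFlow only loads when it is actually needed.
        from guesslang import Guess
        guesser = Guess().language_name
    return guesser(source)
```

Calling `detect_language(snippet)` with no second argument goes through guesslang itself. Passing a stub as `guesser` keeps unit tests fast and independent of the model.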

Caveats

  • Requires Python 3.7+ and TensorFlow, which means Visual C++ redistributables on Windows
  • The 90% accuracy claim lacks published benchmark details in the README
  • No clear indication of model size or inference speed

Verdict

Worth a look if you’re building anything that needs to syntax-highlight unknown code or route snippets to the right parser. Skip it if you need guaranteed accuracy for security-critical classification: this is educated guessing, not certainty.
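
One way to act on the "educated guessing" caveat is to accept a guess only when its score clears a threshold, and fall back to plain text otherwise. This is a minimal sketch that assumes the scores arrive as (language, probability) pairs; that shape is an assumption about guesslang's probability output, not something confirmed above. The names `pick_language` and `min_confidence`, and the 0.5 default, are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def pick_language(
    scores: Iterable[Tuple[str, float]],
    min_confidence: float = 0.5,  # hypothetical default; tune on your own data
) -> Optional[str]:
    """Return the top-scoring language, or None if it is below the threshold."""
    best = max(scores, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_confidence:
        return None  # caller falls back to plain text or asks the user
    return best[0]
```

A parser router can treat `None` as "don't guess" and skip highlighting, instead of committing to a low-confidence label.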
