VS Code's auto-detection secret is a 90%-accurate TensorFlow model
A small Python tool that guesses which of 54 languages a code snippet is written in, now running inside millions of editors.

What it does
Guesslang takes a blob of source code and names the programming language. It supports 54 languages—from Assembly to YAML—and claims over 90% accuracy. You can pipe text to a CLI, call it from Python, or ask for probability scores when it’s uncertain.
The interesting bit
The model isn’t just a regex party. It’s a TensorFlow neural network trained on actual code, and Microsoft baked it into VS Code’s automatic language detection. That’s a lot of trust for a side project with under 900 GitHub stars.
Key highlights
- 54 supported languages, including oddballs like COBOL, DM, and Verilog
- CLI with a --probabilities flag for ranked guesses
- Python API via Guess().language_name(source)
- Powers VS Code’s paste-to-detect feature (since v1.60)
- Training pipeline exists as a separate guesslangtools repo
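For a feel of the Python side, here is a minimal sketch. It assumes the guesslang package is installed and that Guess exposes language_name() and probabilities(), with the latter returning (language, probability) pairs. The detect and top_guesses helpers are hypothetical names made up for this example; they are not part of guesslang.

```python
from typing import List, Optional, Tuple

# Assumed shape of Guess().probabilities(): (language, probability) pairs.
Scores = List[Tuple[str, float]]


def detect(source: str) -> Tuple[Optional[str], Scores]:
    """Ask guesslang for its best guess plus the full ranking."""
    # Lazy import: guesslang loads TensorFlow, so only pay for it when needed.
    from guesslang import Guess

    guess = Guess()
    # language_name() may return None when the model cannot decide.
    return guess.language_name(source), guess.probabilities(source)


def top_guesses(scores: Scores, limit: int = 3) -> List[str]:
    """Format the highest-scoring guesses as 'Language 12.3%' strings."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [f"{name} {prob:.1%}" for name, prob in ranked[:limit]]


# Typical use (requires guesslang and TensorFlow to be installed):
#   best, scores = detect("package main\n\nfunc main() {}\n")
#   print(best, top_guesses(scores))
```

top_guesses works on any list of pairs, so you can try the formatting without loading the model.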
Caveats
- Requires Python 3.7+ and TensorFlow; on Windows, TensorFlow in turn needs the Microsoft Visual C++ redistributable
- The 90% accuracy claim lacks published benchmark details in the README
- No clear indication of model size or inference speed
Verdict
Worth a look if you’re building anything that needs to syntax-highlight unknown code or route snippets to the right parser. Skip it if you need guaranteed accuracy for security-critical classification—this is educated guessing, not certainty.
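If you do route snippets on a guess, it helps to treat a low score as "don't know". The sketch below is one way to do that, not anything guesslang ships. The choose_parser function, the 0.6 cutoff, and the "JSON" label are assumptions for illustration, and the scores are assumed to be (language, probability) pairs like those from the probability output.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Scores = List[Tuple[str, float]]
Parser = Callable[[str], object]


def choose_parser(
    scores: Scores,
    parsers: Dict[str, Parser],
    min_confidence: float = 0.6,  # hypothetical cutoff; tune it on your own data
) -> Optional[Parser]:
    """Return the parser for the top-ranked language, or None when unsure."""
    if not scores:
        return None
    language, probability = max(scores, key=lambda pair: pair[1])
    if probability < min_confidence:
        # Too close to call: let the caller fall back to plain text.
        return None
    # None also covers languages we have no parser for.
    return parsers.get(language)


# Hypothetical routing table keyed by the language names the detector reports.
PARSERS: Dict[str, Parser] = {"JSON": json.loads}
```

A caller would run the detector, pass the scores to choose_parser, and fall back to plain-text handling whenever it returns None.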