ynqa/wego

Word2Vec and friends, written in Go from scratch

A Go-native toolkit for training word embeddings when you don't want to leave the gopher ecosystem.

Velocity (7d): +0.1 ★/day · Trend: steady

What it does

wego implements three classic word embedding algorithms—Word2Vec, GloVe, and LexVec—directly in Go, with no Python bindings or C extensions required. You feed it a space-separated text corpus, it trains vectors, and you get a text file of word-to-vector mappings plus CLI tools to query them.

The interesting bit

The project leans into Go’s strengths rather than fighting them: training uses HogWild!-style asynchronous updates, in which concurrent workers write to shared parameters without locks. That removes the locking overhead, at the cost of results that vary between runs. There’s also a REPL console for doing vector arithmetic (King − Man + Woman ≈ Queen) without leaving the terminal.
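The vector arithmetic can be sketched in a few lines of Go. This is a minimal illustration on hand-made 2D vectors, not wego's console implementation: `cosine` and `analogy` are hypothetical helpers, and ranking candidates by cosine similarity is an assumption about how such queries are usually scored.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// analogy returns the word whose vector is closest to (a - b + c),
// excluding the three query words. It returns "" if a query word is missing.
func analogy(vecs map[string][]float64, a, b, c string) string {
	va, okA := vecs[a]
	vb, okB := vecs[b]
	vc, okC := vecs[c]
	if !okA || !okB || !okC {
		return ""
	}
	target := make([]float64, len(va))
	for i := range target {
		target[i] = va[i] - vb[i] + vc[i]
	}
	best, bestSim := "", math.Inf(-1)
	for word, v := range vecs {
		if word == a || word == b || word == c {
			continue
		}
		if sim := cosine(target, v); sim > bestSim {
			best, bestSim = word, sim
		}
	}
	return best
}

func main() {
	// Toy vectors: axis 0 loosely means "royalty", axis 1 "femaleness".
	vecs := map[string][]float64{
		"king":  {0.9, 0.1},
		"queen": {0.9, 0.9},
		"man":   {0.1, 0.1},
		"woman": {0.1, 0.9},
		"apple": {-0.5, 0.2},
	}
	fmt.Println(analogy(vecs, "king", "man", "woman"))
}
```

Excluding the query words from the candidates matters in practice, since the nearest vector to the target is often one of the inputs.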

Key highlights

  • Three models: Word2Vec (CBOW and Skip-gram), GloVe, and LexVec
  • CLI for training, querying nearest neighbors, and interactive vector math
  • Go SDK with functional options for hyperparameters
  • Outputs standard text format compatible with other embedding tools
  • Inspired by chewxy’s “Data Science in Go” talk
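For readers unfamiliar with the functional options pattern mentioned above, here is a generic sketch of how it typically looks in Go. Every name and default in it (`Options`, `WithDim`, `WithWindow`, `WithIter`, `NewOptions`) is hypothetical and chosen for illustration; wego's actual SDK identifiers may differ.

```go
package main

import "fmt"

// Options holds training hyperparameters. The fields and their defaults
// are hypothetical, not taken from wego.
type Options struct {
	Dim    int // vector dimensionality
	Window int // context window size
	Iter   int // number of training passes
}

// Option is a function that overrides one field of Options.
type Option func(*Options)

// WithDim sets the vector dimensionality.
func WithDim(d int) Option { return func(o *Options) { o.Dim = d } }

// WithWindow sets the context window size.
func WithWindow(w int) Option { return func(o *Options) { o.Window = w } }

// WithIter sets the number of training passes.
func WithIter(n int) Option { return func(o *Options) { o.Iter = n } }

// NewOptions starts from defaults and applies the overrides in order,
// so later options win over earlier ones.
func NewOptions(opts ...Option) Options {
	o := Options{Dim: 10, Window: 5, Iter: 15}
	for _, apply := range opts {
		apply(&o)
	}
	return o
}

func main() {
	o := NewOptions(WithDim(100), WithWindow(8))
	fmt.Printf("%+v\n", o)
}
```

The appeal of the pattern is that callers name only the hyperparameters they want to change, and new options can be added later without breaking existing call sites.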

Caveats

  • Training is nondeterministic by design (HogWild! algorithm): because concurrent updates race, a fixed seed alone does not guarantee identical vectors, so compare results across multiple runs
  • Input format is strict: space-separated tokens only, no sentence boundaries or preprocessing built in
  • 506 stars suggest a niche audience; the ecosystem is less mature than Python’s gensim or spaCy
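Since preprocessing is left to the user, a small normalization pass before training is likely needed. The sketch below shows one way to turn raw text into single-space-separated lowercase tokens with the standard library; `normalize` is a hypothetical helper, and its rules (drop everything that is not a letter or digit) are a deliberately crude assumption, not a recommendation from the project.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize lowercases text, replaces every rune that is not a letter or
// digit with a space, and collapses runs of whitespace (including newlines)
// so the result is a single line of space-separated tokens.
func normalize(text string) string {
	mapped := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return ' '
	}, text)
	return strings.Join(strings.Fields(mapped), " ")
}

func main() {
	fmt.Println(normalize("The King's crown,\nthe Queen's throne."))
}
```

Note that this splits contractions and possessives ("King's" becomes "king s") and discards sentence boundaries entirely; a real pipeline would probably want a proper tokenizer.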

Verdict

Worth a look if you’re building Go-native NLP pipelines and want to avoid Python interop. Skip it if you need production-grade preprocessing, deterministic training, or the broader model zoo that Python ecosystems provide.
