Is word2vec open source?

Yes — dav/word2vec is open source, released under the Apache-2.0 license.

What language is word2vec written in?

dav/word2vec is primarily written in C.

How popular is word2vec?

dav/word2vec has 1.7k stars on GitHub.

Where can I find word2vec?

dav/word2vec is on GitHub at https://github.com/dav/word2vec.

← all repositories

dav/word2vec

Google Code's word2vec, rescued for Macs and mortals

A GitHub mirror of the original word2vec C implementation that applies community patches so it compiles on modern systems.

★1.7k stars C ML Frameworks Language Models RAG · Search

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

This is the original word2vec C implementation—CBOW and Skip-gram architectures for turning words into dense vectors—lifted from Google Code and patched to build on Mac OS X. It trains on text corpora and outputs word embeddings you can use for similarity queries or downstream NLP. The repo includes demo scripts that download a 100MB sample corpus and walk you through training.

The interesting bit

The value here isn’t novelty; it’s archaeology with maintenance. The original Google Code project went dormant, and this fork exists specifically to “apply and track community patches”—starting with Mac compilation fixes and a memory patch. It’s a preservation effort for a foundational tool that predates the Python reimplementations everyone uses now.

Key highlights

Original C implementation of CBOW and Skip-gram with hierarchical softmax and negative sampling
Patched makefile and source for Mac OS X compilation (the main reason this fork exists)
Memory leak patch applied from the original Google Code issue tracker
Includes demo-word.sh for quick end-to-end testing
Supports threaded training and binary/text output formats

Caveats

The compute-accuracy utility has a known segfault; the README flags this explicitly
Project file layout was altered from the original, which may confuse anyone comparing against old documentation
This is a 2013-era codebase; modern users likely want gensim, fastText, or torch/nn.Embedding instead

Verdict

Grab this if you need the original C implementation to compile on a Mac, or if you’re doing historical research on word embedding implementations. Skip it if you just need word vectors in production—use a modern library with active maintenance and no known segfaults.

Frequently asked

What is dav/word2vec?: A GitHub mirror of the original word2vec C implementation that applies community patches so it compiles on modern systems.
Is word2vec open source?: Yes — dav/word2vec is open source, released under the Apache-2.0 license.
What language is word2vec written in?: dav/word2vec is primarily written in C.
How popular is word2vec?: dav/word2vec has 1.7k stars on GitHub.
Where can I find word2vec?: dav/word2vec is on GitHub at https://github.com/dav/word2vec.