Is fastText_multilingual open source?

Yes — babylonhealth/fastText_multilingual is open source, released under the BSD-3-Clause license.

What language is fastText_multilingual written in?

babylonhealth/fastText_multilingual is primarily written in Jupyter Notebook.

How popular is fastText_multilingual?

babylonhealth/fastText_multilingual has 1.2k stars on GitHub.

Where can I find fastText_multilingual?

babylonhealth/fastText_multilingual is on GitHub at https://github.com/babylonhealth/fastText_multilingual.

← all repositories

babylonhealth/fastText_multilingual

Make 78 languages share one vector space, no neural training required

Pre-computed linear transforms let you compare "chat" and "кот" directly in fastText space.

★1.2k stars Jupyter Notebook Language Models

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

Facebook’s fastText word vectors are monolingual: “chat” and “кот” (French and Russian for “cat”) sit in unrelated vector spaces, with cosine similarity near zero. This repo ships 78 pre-computed alignment matrices that rotate each language’s vectors into English space via simple linear transforms. Apply a matrix, and cross-lingual nearest neighbors suddenly work.

The interesting bit

The trick is an orthogonal transformation learned from SVD on a small bilingual dictionary—just 5,000 word pairs translated via Google Translate API. Because every language aligns to English (English gets the identity matrix), the system generalizes to language pairs it never saw paired directly. French-to-Russian translation works even though the training data was French-English and English-Russian.

Key highlights

78 pre-computed alignment matrices for fastText’s original 89-language set
Monolingual similarity relationships preserved exactly—no distortion within a language
Precision@1 hits 0.73 for French, 0.72 for Spanish, 0.60 for Russian; drops toward 0.06 for low-resource languages like Cebuano
Includes align_your_own.ipynb to learn custom matrices for new language pairs
Based on ICLR 2017 paper: Offline bilingual word vectors, orthogonal transformations and the inverted softmax

Caveats

Babylon Health no longer maintains the repo; paper authors are the contact for issues
Google Translate API coverage limits language selection (11 of Facebook’s original languages excluded)
Performance degrades sharply for non-European and low-resource languages

Verdict

Worth a look if you need quick cross-lingual word similarity without training a multilingual model from scratch. Skip it if you need sentence-level embeddings or state-of-the-art performance on distant language pairs—this is word-level, 2017-era technology.

Frequently asked

What is babylonhealth/fastText_multilingual?: Pre-computed linear transforms let you compare "chat" and "кот" directly in fastText space.
Is fastText_multilingual open source?: Yes — babylonhealth/fastText_multilingual is open source, released under the BSD-3-Clause license.
What language is fastText_multilingual written in?: babylonhealth/fastText_multilingual is primarily written in Jupyter Notebook.
How popular is fastText_multilingual?: babylonhealth/fastText_multilingual has 1.2k stars on GitHub.
Where can I find fastText_multilingual?: babylonhealth/fastText_multilingual is on GitHub at https://github.com/babylonhealth/fastText_multilingual.