Is fugashi open source?

Yes — polm/fugashi is open source, released under the MIT license.

What language is fugashi written in?

GitHub lists C++ as the primary language of polm/fugashi. The library itself is a Cython wrapper around MeCab, which is written in C++.

How popular is fugashi?

polm/fugashi has 533 stars on GitHub.

Where can I find fugashi?

polm/fugashi is on GitHub at https://github.com/polm/fugashi.

polm/fugashi

Japanese tokenization that doesn't make you compile MeCab

A Cython wrapper that turns a finicky C++ tokenizer into a pip-installable Python library with sensible defaults.

★ 533 stars · C++ · Data Tooling

What it does

fugashi wraps MeCab, the venerable C++ Japanese morphological analyzer, in a Cython layer so you can pip install it on most platforms without touching a compiler. It ships wheels for Linux, macOS (Intel), and 64-bit Windows, and exposes UniDic’s rich feature data (lemmas, part-of-speech tags, and more) as named tuples directly in Python.

The interesting bit

The author wrote this after finding that existing MeCab Python bindings were “hard to use and lack English documentation.” The fix: expose MeCab’s output as Python objects with .feature.lemma and .pos attributes, and provide two curated dictionary packages: unidic-lite, a small 2013 version of UniDic for quick starts, and unidic, the full modern UniDic (about 770MB) for serious work.
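The attribute access described above can be sketched roughly as follows. This is a minimal sketch, assuming fugashi and a UniDic dictionary are installed (for example via pip install 'fugashi[unidic-lite]'). The helper names token_rows and tokenize are hypothetical and not part of fugashi; only Tagger and the .surface, .feature.lemma, and .pos attributes come from the library.

```python
def token_rows(words):
    """Collect (surface, lemma, pos) from fugashi-style word objects.

    token_rows is a hypothetical helper for this sketch, not a fugashi API.
    """
    return [(w.surface, w.feature.lemma, w.pos) for w in words]


def tokenize(text):
    """Tokenize Japanese text with fugashi's UniDic-aware Tagger."""
    # Imported lazily so this module still loads where fugashi is absent.
    from fugashi import Tagger

    tagger = Tagger()  # assumes unidic-lite or unidic is installed
    # Calling the tagger returns a list of word objects.
    return token_rows(tagger(text))
```

Once a dictionary is installed, tokenize("麩菓子は、麩を主材料とした日本の菓子。") should return one tuple per token. The exact lemmas and tags depend on which UniDic version is in use.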

Key highlights

pip install fugashi[unidic-lite] gets you tokenizing in seconds
Tagger assumes UniDic; GenericTagger works with arbitrary dictionaries via field-number access
create_feature_wrapper() lets you build named-tuple interfaces for custom dictionaries
Published at NLP-OSS 2020 with a proper academic citation
Interactive Streamlit demo at fugashi.streamlit.app
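The GenericTagger and create_feature_wrapper() highlights might look something like the sketch below. It assumes a non-UniDic MeCab dictionary is already configured on the system. The field names alpha, beta, and gamma are placeholders for illustration and must be replaced with the columns your dictionary actually emits; features_by_number and make_custom_tagger are hypothetical helper names, not fugashi functions.

```python
FIELD_NAMES = "alpha beta gamma"  # placeholder names, not a real dictionary's fields


def features_by_number(words, index=0):
    """Pair each surface form with one feature column, addressed by number.

    With GenericTagger's default wrapper, word.feature is a plain tuple.
    """
    return [(w.surface, w.feature[index]) for w in words]


def make_custom_tagger():
    """Build a GenericTagger whose features are reachable by name."""
    # Imported lazily so this module still loads where fugashi is absent.
    from fugashi import GenericTagger, create_feature_wrapper

    CustomFeatures = create_feature_wrapper("CustomFeatures", FIELD_NAMES)
    # With this wrapper, word.feature.alpha replaces word.feature[0].
    return GenericTagger(wrapper=CustomFeatures)
```

Number-based access works with any dictionary but is harder to read; the named wrapper trades a little setup for clearer code.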

Caveats

No wheels for musl/Alpine Linux, PowerPC, or 32-bit Windows (build from source required)
Full UniDic needs a separate python -m unidic download step and ~770MB disk
Apple Silicon users: status unclear from README (Intel wheels mentioned explicitly)

Verdict

Worth a look if you’re doing Japanese NLP in Python and want MeCab’s accuracy without its deployment headaches. If you refuse to install any C++ dependency at all, the README nudges you toward SudachiPy; if you need Korean tokenization, it points to pymecab-ko instead.
