Is jieba-php open source?

Yes — fukuball/jieba-php is open source, released under the MIT license.

What language is jieba-php written in?

fukuball/jieba-php is primarily written in PHP.

How popular is jieba-php?

fukuball/jieba-php has 1.4k stars on GitHub.

Where can I find jieba-php?

fukuball/jieba-php is on GitHub at https://github.com/fukuball/jieba-php.

fukuball/jieba-php

PHP's answer to Chinese text segmentation

A PHP port of the popular jieba library that slices Chinese text into meaningful words without calling an LLM API.

★ 1.4k stars · PHP · Data Tooling · Other AI

What it does

jieba-php segments Chinese text into words, a surprisingly hard problem since Chinese doesn’t put spaces between them. It offers three modes: precise (the default, which aims for the single most accurate split), full (emits every dictionary word it finds in the sentence, overlaps included), and search-engine (starts from the precise result and further splits long words for better recall). It also handles keyword extraction via TF-IDF, part-of-speech tagging, and custom dictionaries.
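
To make the difference between the modes concrete, here is a minimal sketch of full-mode enumeration and search-engine re-splitting. It is written in Python with a made-up toy dictionary so that it runs without the library; it is not jieba-php's code or API, and the names DICT, full_mode, and search_mode are hypothetical. The precise segmentation passed to search_mode is assumed to be given.

```python
# Toy dictionary; the entries are chosen for illustration only.
DICT = {"中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"}


def full_mode(sentence, max_len=5):
    """Emit every multi-character dictionary word found at any position."""
    words = []
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 2, last_end + 1):
            if sentence[start:end] in DICT:
                words.append(sentence[start:end])
    return words


def search_mode(precise_words):
    """Add shorter dictionary words found inside each long precise-mode word."""
    out = []
    for word in precise_words:
        for n in (2, 3):
            if len(word) > n:
                for i in range(len(word) - n + 1):
                    gram = word[i:i + n]
                    if gram in DICT:
                        out.append(gram)
        out.append(word)  # the original word is always kept
    return out


print(full_mode("中国科学院计算所"))
print(search_mode(["中国科学院", "计算所"]))
```

In this toy case both modes surface the same seven words, but they get there differently: full mode scans the raw sentence, while search mode only looks inside words that a precise pass has already chosen.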

The interesting bit

The README is admirably honest: it admits that LLMs now segment better, but this library runs locally, cheaply, and fast. Under the hood it uses a trie to build a directed acyclic graph (DAG) of possible word paths through the sentence, then dynamic programming to find the highest-probability split. For words missing from the dictionary, it falls back to a hidden Markov model (HMM) with Viterbi decoding: classic NLP machinery that doesn’t need a GPU.
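
The DAG-plus-dynamic-programming step can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: it is in Python, a plain dict lookup stands in for the trie, the word frequencies are invented, and the HMM fallback for unknown words is left out. FREQ, build_dag, and best_path are hypothetical names.

```python
import math

# Toy word frequencies; the numbers are invented for illustration.
FREQ = {"北京": 50, "大学": 40, "北京大学": 30, "大学生": 20,
        "生": 10, "北": 5, "京": 5, "大": 5, "学": 5}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the inclusive end indices of dictionary words."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, len(sentence)) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # unknown character: treat it as a one-char word
    return dag


def best_path(sentence):
    """Pick the split with the highest product of word probabilities."""
    dag = build_dag(sentence)
    n = len(sentence)
    # route[i] = (best log-probability from i to the end, end index of first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):  # fill the table right to left
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1) / TOTAL) + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:  # walk the chosen edges left to right
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(best_path("北京大学生"))
```

With these toy frequencies, the split 北京 / 大学生 wins over 北京大学 / 生 and 北京 / 大学 / 生 because its probability product is the largest. Working in log space avoids underflow on long sentences.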

Key highlights

Three segmentation modes for different trade-offs between precision and recall
Supports traditional Chinese (switch dictionary to “big” mode)
CJK support: Chinese, Japanese, Korean text processing
Custom dictionaries with word frequency and part-of-speech tags
TF-IDF keyword extraction with stop-word filtering
Memory management and caching optimizations (critical given the ini_set('memory_limit', '1024M') in examples)
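
The TF-IDF keyword extraction with stop-word filtering listed above can be sketched as follows. This is a minimal Python illustration, not jieba-php's code: the IDF table, the fallback IDF for unlisted words, and the stop-word list are all made up, and the input is assumed to be already segmented. extract_tags here is a hypothetical helper defined in the block itself.

```python
from collections import Counter

# Hypothetical IDF table and stop words; real values would come from a corpus.
IDF = {"分词": 6.0, "中文": 4.5, "搜索": 5.0}
DEFAULT_IDF = 4.0  # assumed fallback for words missing from the table
STOP_WORDS = {"的", "是", "和"}


def extract_tags(words, top_k=2):
    """Rank segmented words by term frequency times inverse document frequency."""
    kept = [w for w in words if w not in STOP_WORDS]
    counts = Counter(kept)
    total = len(kept)
    scores = {w: (c / total) * IDF.get(w, DEFAULT_IDF) for w, c in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


words = ["中文", "分词", "是", "搜索", "的", "基础", "分词"]
print(extract_tags(words))
```

Stop words are dropped before counting, so they affect neither the term frequencies nor the ranking.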

Caveats

Requires substantial memory: examples show 600M–1024M limits, suggesting the dictionary is loaded entirely into RAM
Manual installation path is tedious (multiple require_once statements); Composer is strongly preferred
README notes it originated as a translation of the Python jieba, though it now maintains its own branch

Verdict

Worth a look if you’re building search, indexing, or analytics in PHP and need Chinese segmentation without API dependencies. Skip it if you’re already running Python infrastructure or need state-of-the-art accuracy — the authors themselves suggest LLMs for that.
