Is php-text-analysis open source?

Yes — yooper/php-text-analysis is open source, released under the MIT license.

What language is php-text-analysis written in?

yooper/php-text-analysis is primarily written in PHP.

How popular is php-text-analysis?

yooper/php-text-analysis has 533 stars on GitHub.

Where can I find php-text-analysis?

yooper/php-text-analysis is on GitHub at https://github.com/yooper/php-text-analysis.

yooper/php-text-analysis

NLP for the PHP holdouts: tokenize without leaving your stack

A PHP-native library that brings text analysis, sentiment scoring, and document classification to codebases that can't justify a Python microservice.

★ 533 stars · PHP · Language Models · Data Tooling

What it does

php-text-analysis is a PHP library for common NLP and information-retrieval tasks: tokenization, stemming, frequency analysis, n-grams, sentiment scoring with VADER, keyword extraction with RAKE, and naive Bayes document classification. It exposes most operations through plain helper functions (tokenize(), stem(), vader(), naive_bayes()) so you can get from raw text to results in a few lines.
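
To make "raw text to results in a few lines" concrete, here is a minimal, dependency-free sketch of two of the tasks listed above: tokenization and frequency analysis. It does not use or reproduce the library's code. sketch_tokenize() and sketch_freq_dist() are hypothetical names invented for this illustration, and the regex-based word splitting is an assumption, not the behavior of the library's tokenizers.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lowercase the text and split it into word tokens.
 * A simplification for illustration, not the library's tokenizer.
 */
function sketch_tokenize(string $text): array
{
    // strtolower() only lowercases ASCII; mb_strtolower() would cover
    // other scripts if the mbstring extension is available.
    $text = strtolower($text);

    // Match runs of Unicode letters and digits; punctuation is dropped.
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches);

    return $matches[0];
}

/**
 * Hypothetical helper: count how often each token occurs,
 * most frequent first.
 */
function sketch_freq_dist(array $tokens): array
{
    $counts = array_count_values($tokens);
    arsort($counts); // sort by count, descending, keeping token keys

    return $counts;
}

$tokens = sketch_tokenize('The cat sat on the mat. The mat was flat.');
$freq   = sketch_freq_dist($tokens);

// Show the two most frequent tokens with their counts.
print_r(array_slice($freq, 0, 2, true));
```

In this sample the most frequent token is "the" (3 occurrences), followed by "mat" (2). The library's own helpers cover the same ground with more careful tokenization rules, so treat this only as a picture of the data flow.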

The interesting bit

The library wraps well-known algorithms (the Porter stemmer, the Penn Treebank tokenizer, VADER sentiment scoring) in a single Composer package with a consistent PHP API. That’s the value: not algorithmic novelty, but keeping text-processing logic inside a PHP monolith instead of ferrying data to a Python service.

Key highlights

Tokenizers are swappable: the default is GeneralTokenizer, but PennTreeBankTokenizer and others can be passed by class name
normalize_tokens() accepts custom callbacks or string function names (e.g., mb_strtolower)
Built-in RAKE keyword extraction and VADER sentiment analysis, both invoked through one-line helpers
Naive Bayes classifier with a simple train()/predict() interface; movie-review example in the unit tests
N-gram generation defaults to bigrams but supports custom lengths and delimiters
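
Two of the highlights above, callback-based normalization and configurable n-grams, are easy to picture with a small dependency-free sketch. The functions below are hypothetical stand-ins written for this page (the sketch_ prefix marks them as such). Their signatures and the single-space default delimiter are assumptions and may differ from the library's own normalize_tokens() and n-gram helper.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in: apply a callable to every token.
 * The callable may be a closure or a function name given as a string.
 */
function sketch_normalize_tokens(array $tokens, callable $normalizer): array
{
    return array_map($normalizer, $tokens);
}

/**
 * Hypothetical stand-in: slide a window of $n tokens across the list
 * and join each window with $delimiter. Defaults to bigrams.
 */
function sketch_ngrams(array $tokens, int $n = 2, string $delimiter = ' '): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1');
    }

    $tokens = array_values($tokens); // ensure 0-based, gap-free keys
    $total  = count($tokens);
    $ngrams = [];

    // Stop once the window would run past the end of the list.
    for ($i = 0; $i + $n <= $total; $i++) {
        $ngrams[] = implode($delimiter, array_slice($tokens, $i, $n));
    }

    return $ngrams;
}

$tokens = ['Natural', 'Language', 'Processing', 'in', 'PHP'];

// Normalizer given as a string function name.
$lower = sketch_normalize_tokens($tokens, 'strtolower');

// Normalizer given as a closure.
$upper = sketch_normalize_tokens($tokens, fn (string $t): string => strtoupper($t));

$bigrams  = sketch_ngrams($lower);         // default length 2, space delimiter
$trigrams = sketch_ngrams($lower, 3, '|'); // custom length and delimiter

print_r($bigrams);
print_r($trigrams);
```

Five tokens yield four bigrams and three trigrams here, and a list shorter than the window yields an empty array. Accepting any callable is what lets a plain function name such as mb_strtolower and a custom closure be used interchangeably.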

Caveats

Documentation is split across an unfinished book repo and a wiki; the README itself is mostly a function reference
Some tokenizers “require parameters to be set upon instantiation”; the README notes this but doesn’t say which tokenizers or which parameters
No benchmarks, accuracy metrics, or corpus-size guidance are provided

Verdict

Worth a look if you’re maintaining a PHP application that needs light NLP and can’t absorb the operational cost of a second runtime. If you’re starting fresh or doing heavy text processing, Python’s ecosystem is still the pragmatic choice.
