Deprecated but instructive: a Python bridge to Java NLP
A thin Python wrapper around Stanford's Java CoreNLP server, now officially deprecated in favor of Stanza.

What it does
This package launches Stanford CoreNLP’s Java server in the background and talks to it over HTTP. You get tokenization, POS tagging, named entities, dependency parsing, and two pattern-matching mini-languages (TokensRegex for token sequences, Semgrex for dependency graphs) without writing Java. There’s also a base class for exposing your own Python annotators to CoreNLP’s pipeline, though the README admits this relies on unreleased internal code.
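To make the HTTP exchange concrete, here is a rough standard-library sketch that builds (but does not send) the kind of request a client POSTs to a running CoreNLP server. It is not the package's own code: the helper name build_annotate_request is hypothetical, and the localhost:9000 address assumes the server's default port.

```python
import json
import urllib.parse
import urllib.request


def build_annotate_request(text, annotators, endpoint="http://localhost:9000"):
    """Build (without sending) a POST request for a CoreNLP server.

    Hypothetical helper for illustration; not part of the wrapper package.
    """
    # The server reads pipeline settings from a JSON "properties" query parameter.
    properties = {"annotators": ",".join(annotators), "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    # The raw text to annotate travels in the request body.
    return urllib.request.Request(
        f"{endpoint}/?{query}",
        data=text.encode("utf-8"),
        method="POST",
    )


if __name__ == "__main__":
    req = build_annotate_request(
        "Chris wrote a simple sentence.", ["tokenize", "ssplit", "pos"]
    )
    print(req.get_method(), req.full_url)
    # With a server running, the JSON response could be read with:
    #   with urllib.request.urlopen(req) as resp:
    #       doc = json.load(resp)
```

The wrapper hides this plumbing behind a client object; the sketch only shows why no Java needs to be written on the calling side.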
The interesting bit
The bidirectional design is unusual: not only can Python call CoreNLP, but Python code can masquerade as a CoreNLP annotator via a lightweight service. That’s a neat inversion for shops with custom NER or tokenization already written in Python.
Key highlights
- Requires downloading the Java CoreNLP release and setting $CORENLP_HOME
- Ships with a CLI tool (annotate) that pairs naturally with jq for JSON wrangling
- Supports tokensregex and semgrex queries directly from Python
- Installable via pip install stanford-corenlp
- Officially deprecated; the maintainers point to Stanza instead
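The $CORENLP_HOME requirement exists because something has to find the CoreNLP jars and start the Java server. A minimal sketch of that step, assuming the launch command from the CoreNLP server documentation, might look like the following. The helper name build_server_command and its defaults are hypothetical, and the wrapper's real launch logic may differ.

```python
import os


def build_server_command(corenlp_home, port=9000, memory="4g", timeout_ms=15000):
    """Return the argv list for starting a CoreNLP server (does not run it).

    Hypothetical helper for illustration; not the wrapper's actual code.
    """
    # Put every jar in the unpacked CoreNLP release on the Java classpath.
    classpath = os.path.join(corenlp_home, "*")
    return [
        "java", f"-mx{memory}",
        "-cp", classpath,
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


if __name__ == "__main__":
    # Placeholder path (an assumption) is used when the variable is unset.
    home = os.environ.get("CORENLP_HOME", "/path/to/corenlp")
    print(" ".join(build_server_command(home)))
    # To start the server in the background, this list could be passed
    # to subprocess.Popen.
```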
Caveats
- The annotation-service feature (custom Python annotators plugged into CoreNLP) depends on experimental Stanford internals that “are not yet available for public use”
- The project is deprecated, so expect no maintenance
Verdict
Worth a quick read if you’re maintaining legacy CoreNLP integrations or curious how to bridge Python and Java NLP pipelines. Everyone else should skip directly to Stanza, which the authors themselves recommend.