apache/opennlp

A Java NLP toolkit that predates the transformer craze

Apache OpenNLP keeps classic ML approaches alive for tokenization, NER, and parsing without requiring a GPU farm.

1.6k stars Java Other AI
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Apache OpenNLP is a Java library for fundamental NLP tasks: tokenization, sentence splitting, POS tagging, named-entity recognition, chunking, parsing, and language detection. It exposes both a Java API and a CLI, and provides downloadable pre-built demo models for testing.

The interesting bit

The 3.x release line modularizes the monolithic toolkit into fine-grained Maven artifacts (opennlp-runtime, opennlp-ml-maxent, opennlp-dl, etc.) so you import only what you need. Core *ME classes are finally thread-safe as of 3.0.0, eliminating the old dance of pooling model instances per thread. There’s also an ONNX adapter (opennlp-dl) if you want to bridge into modern neural models without abandoning the ecosystem.
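
For readers who never had to do it, here is a minimal sketch of what pooling an instance per thread looks like in plain Java. It deliberately uses no OpenNLP classes: StatefulSplitter is a hypothetical stand-in for any component that keeps mutable state between calls, and the method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadPoolingSketch {

    // Hypothetical stand-in (NOT an OpenNLP class) for a component that is
    // not thread-safe: it reuses one internal buffer across calls, so two
    // threads sharing an instance could corrupt each other's results.
    static final class StatefulSplitter {
        private final List<String> buffer = new ArrayList<>();

        String[] split(String text) {
            buffer.clear();
            for (String piece : text.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    buffer.add(piece);
                }
            }
            return buffer.toArray(new String[0]);
        }
    }

    // The workaround: each worker thread lazily gets its own instance.
    private static final ThreadLocal<StatefulSplitter> PER_THREAD =
            ThreadLocal.withInitial(StatefulSplitter::new);

    public static int countTokens(String text) {
        return PER_THREAD.get().split(text).length;
    }

    // Submits `tasks` concurrent jobs and sums their token counts.
    public static int countConcurrently(String text, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                Callable<Integer> job = () -> countTokens(text);
                results.add(pool.submit(job));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 5 tokens per call, 100 calls
        System.out.println(countConcurrently("OpenNLP splits text into tokens", 100));
    }
}
```

With a component that is itself thread-safe, the ThreadLocal wrapper goes away and a single shared instance can be handed to every worker, which is the simplification described above.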

Key highlights

  • Supports Maximum Entropy, Perceptron, Naive Bayes, and SVM classifiers out of the box
  • Modular architecture in 3.x reduces dependency bloat; API remains backward-compatible with 2.x
  • Thread-safe core classes as of 3.0.0
  • ONNX runtime integration with optional GPU module (opennlp-dl-gpu)
  • Bundled stopword lists for 11 languages
  • Plays nice with Apache Flink, NiFi, and Spark pipelines

Caveats

  • Requires JDK 21+ for 3.x (2.x branch stays on JDK 17 for maintenance)
  • Demo models are explicitly labeled for testing only; you must train your own for production use
  • Package namespace will migrate from opennlp to org.apache.opennlp in a future release (possibly 4.x), so expect some import churn eventually

Verdict

Worth a look if you’re building JVM-based text pipelines and want battle-tested, lightweight NLP primitives without dragging in PyTorch or Hugging Face. Skip it if you need state-of-the-art transformer models natively; the ONNX bridge helps, but that’s not the project’s center of gravity.
