mit-nlp/MITIE

MITIE: A no-strings-attached NER toolkit from MIT

Free, commercial-friendly named entity extraction and relation detection with pre-trained models for English, Spanish, and German.

3k stars · C++ · Other · AI · 7-day velocity: +0.7 ★/day (trend: steady)

What it does

MITIE extracts named entities (people, places, organizations) and detects binary relations between them from raw text. It ships with pre-trained models for English, Spanish, and German, plus tools to train your own extractors. The core is C++, but bindings cover Python, R, Java, C, MATLAB, and a small ecosystem of third-party wrappers for OCaml, .NET, PHP, and Ruby.
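To make the output shape concrete, here is a minimal Python 3 sketch. It assumes, based on the repository's Python examples as we recall them, that the binding's `extract_entities` returns `(token_range, tag, score)` tuples over the token list produced by `tokenize`; check the bundled examples for exact signatures. The sketch uses hand-written tuples instead of calling the library, so it runs without a model file. The helper names `entity_spans` and `relation_candidates` are hypothetical, and pairing neighbouring entities in both orders is just one simple way to pick candidates for a binary relation detector, not necessarily how MITIE's own examples do it.

```python
from typing import List, Sequence, Tuple

# Assumed entity shape (not verified against the binding): (token_range, tag, score).
Entity = Tuple[range, str, float]


def entity_spans(tokens: Sequence[str], entities: Sequence[Entity]) -> List[Tuple[str, str, float]]:
    """Join each entity's token range back into a text span, keeping tag and score."""
    return [(" ".join(tokens[i] for i in rng), tag, score) for rng, tag, score in entities]


def relation_candidates(entities: Sequence[Entity]) -> List[Tuple[range, range]]:
    """Ordered pairs of neighbouring entities, in both directions.

    Binary relations such as "born in" are directional, so each pair is
    offered twice; a relation detector would then score every candidate.
    """
    pairs: List[Tuple[range, range]] = []
    for left, right in zip(entities, entities[1:]):
        pairs.append((left[0], right[0]))
        pairs.append((right[0], left[0]))
    return pairs


if __name__ == "__main__":
    tokens = "Barack Obama was born in Honolulu .".split()
    # Hand-written stand-in for extractor output; not produced by a real model.
    entities = [(range(0, 2), "PERSON", 1.2), (range(5, 6), "LOCATION", 0.9)]
    for text, tag, score in entity_spans(tokens, entities):
        print(f"{tag:<10} {score:5.2f}  {text}")
    print(relation_candidates(entities))
```

With the real library, the hand-written `entities` list would be replaced by the extractor's output, and each candidate pair would be passed to a relation detector for scoring.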

The interesting bit

The authors openly admit MITIE is "basically just a thin wrapper around dlib", a refreshing dose of honesty in academic software. The actual heavy lifting comes from dlib's machine learning toolkit, combined with distributional word embeddings and Structural SVMs. The value is in the packaging: pre-trained models built on CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword, ready to run from a command-line pipe or your language of choice.

Key highlights

  • Boost Software License: genuinely free, including commercial use
  • Pre-trained NER models for three languages; relation detection included
  • Python support spans 2.7 through 3.8+ and relies only on the standard library's ctypes
  • Command-line streaming tool (ner_stream) for quick text markup
  • CMake and make builds, with optional OpenBLAS acceleration
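The kind of inline markup that `ner_stream` produces can be approximated in a few lines of Python 3. This is a hypothetical `bracket_markup` helper that works on hand-written `(token_range, tag, score)` tuples; the `[TAG token ...]` bracket style is our assumption, and the real tool's exact output format may differ.

```python
from typing import Sequence, Tuple

# Assumed entity shape: (token_range, tag, score); the score is ignored here.
Entity = Tuple[range, str, float]


def bracket_markup(tokens: Sequence[str], entities: Sequence[Entity]) -> str:
    """Wrap each entity's tokens as [TAG tok tok]; other tokens pass through.

    Assumes entity ranges do not overlap, as flat NER output normally doesn't.
    """
    starts = {rng.start: (rng.stop, tag) for rng, tag, *_ in entities}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            stop, tag = starts[i]
            out.append("[" + tag + " " + " ".join(tokens[i:stop]) + "]")
            i = stop  # skip past the whole entity
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    toks = "Barack Obama was born in Honolulu .".split()
    ents = [(range(0, 2), "PERSON", 1.2), (range(5, 6), "LOCATION", 0.9)]
    print(bracket_markup(toks, ents))
```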

Caveats

  • Model downloads are manual and split by language — no single “install and go” package
  • Java bindings require SWIG, CMake, and careful 64-bit Windows handling
  • The README’s “state-of-the-art” claim links to a wiki evaluation page, not inline numbers

Verdict

Worth a look if you need a permissively licensed, self-hosted NER solution without the dependency bloat of modern neural toolkits. Skip it if you want state-of-the-art transformer-based accuracy or a batteries-included pip install.
