chakki-works/seqeval

Finally, a Python NER evaluator that doesn't require Perl

seqeval replaces the venerable conlleval script with native Python metrics for sequence labeling tasks.

1.2k stars Python LLMOps · EvalData Tooling

What it does

seqeval computes the standard classification metrics (accuracy, precision, recall, and F1) and a full classification report for sequence labeling tasks such as named-entity recognition and POS tagging. It accepts the list-of-lists format you probably already use for token-level labels: one inner list of tag strings per sentence.
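To make the input format concrete, here is a minimal pure-Python sketch of entity-level F1 over list-of-lists IOB2 tags. It is not seqeval's implementation; the helper names `extract_spans` and `entity_f1` are hypothetical, and it assumes well-formed IOB2 input where every entity starts with a B- tag.

```python
from typing import List, Set, Tuple

# (sentence index, entity type, start token, end token)
Span = Tuple[int, str, int, int]


def extract_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect IOB2 spans: a B- tag followed by I- tags of the same type."""
    spans = set()
    for s_idx, tags in enumerate(sentences):
        start, etype = None, None
        # A trailing "O" sentinel closes any span still open at sentence end.
        for i, tag in enumerate(list(tags) + ["O"]):
            prefix, _, label = tag.partition("-")
            if prefix == "I" and start is not None and label == etype:
                continue  # this token extends the open span
            if start is not None:
                spans.add((s_idx, etype, start, i - 1))
                start, etype = None, None
            if prefix == "B":
                start, etype = i, label
    return spans


def entity_f1(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only if the whole span matches."""
    true_spans, pred_spans = extract_spans(y_true), extract_spans(y_pred)
    correct = len(true_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(true_spans)
    return 2 * precision * recall / (precision + recall)


# One inner list of tag strings per sentence.
y_true = [["O", "O", "O", "B-MISC", "I-MISC", "I-MISC", "O"], ["B-PER", "I-PER", "O"]]
y_pred = [["O", "O", "B-MISC", "I-MISC", "I-MISC", "I-MISC", "O"], ["B-PER", "I-PER", "O"]]

print(entity_f1(y_true, y_pred))  # 0.5: the MISC span starts one token early
```

Nine of the ten tokens are tagged correctly here, yet only one of the two entities matches exactly, which is why span-level scores can sit well below token accuracy.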

The interesting bit

The library has two personalities. “Default” mode deliberately mimics the original Perl conlleval script, warts and all, so your numbers stay comparable to twenty years of published NER papers. “Strict” mode actually validates against tagging schemes (IOB2, IOBES, BILOU, etc.), catching invalid sequences that default mode would silently score as correct. The README’s minimal example is telling: a prediction starting with I-NP instead of B-NP scores a perfect 1.00 in default mode and 0.00 in strict mode.
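The difference between the two modes can be illustrated with a small pure-Python sketch. It is a simplified stand-in, not seqeval's or conlleval's actual logic: the function `spans` and its `strict` flag are hypothetical, and the only rule modelled is whether an I- tag with no preceding B- tag is allowed to open an entity.

```python
def spans(tags, strict):
    """Return (type, start, end) spans from one IOB2-tagged sentence.

    strict=False: an I- tag with no open span of its type starts a new span
                  (lenient, conlleval-like behaviour).
    strict=True:  only a B- tag can start a span; orphan I- tags are ignored.
    """
    found, start, etype = [], None, None
    # A trailing "O" sentinel closes any span still open at sentence end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and start is not None and label == etype:
            continue  # this token extends the open span
        if start is not None:
            found.append((etype, start, i - 1))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not strict):
            start, etype = i, label
    return found


gold = ["B-NP", "I-NP", "O"]
pred = ["I-NP", "I-NP", "O"]  # invalid IOB2: the chunk never opens with B-

print(spans(gold, strict=False), spans(pred, strict=False))  # identical spans
print(spans(gold, strict=True), spans(pred, strict=True))    # prediction yields no spans
```

Under the lenient rule the malformed prediction produces the same span as the gold sequence, so it is scored as a match. Under the strict rule it produces no spans at all, so nothing matches.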

Key highlights

  • Drop-in sklearn-style API: f1_score(y_true, y_pred) and classification_report()
  • Six tagging schemes supported, though IOBES and BILOU only work in strict mode
  • Self-described as “well-tested” against the original Perl conlleval
  • One-line install: pip install seqeval

Caveats

  • The README doesn’t specify how the “well-tested” claim was validated—no test coverage stats, no continuous integration badges
  • Strict mode requires you to pass both mode='strict' and a scheme argument; forget one and it presumably falls back to default behavior
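One way to guard against the second caveat is to build both arguments in a single place. The sketch below uses a hypothetical helper, `strict_kwargs`, that is not part of seqeval; it only assumes that the metric functions accept `mode` and `scheme` keyword arguments as the README shows.

```python
def strict_kwargs(scheme):
    """Build keyword arguments for strict scoring, refusing a missing scheme."""
    if scheme is None:
        raise ValueError("strict mode needs a tagging scheme, e.g. seqeval.scheme.IOB2")
    return {"mode": "strict", "scheme": scheme}


# Intended use (requires `pip install seqeval`):
#   from seqeval.metrics import f1_score
#   from seqeval.scheme import IOB2
#   f1_score(y_true, y_pred, **strict_kwargs(IOB2))
```

Routing every strict call through one helper means a forgotten scheme fails loudly instead of quietly producing default-mode numbers.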

Verdict

Worth installing for anyone training NER or chunking models in Python who needs conlleval-compatible numbers without spawning a Perl process. If your sequence task doesn’t use BIO-style tags, or you already have a working evaluation pipeline, this is just another dependency.
