A transformer that flattens Chinese word lattices without the combinatorial mess
Research code for FLAT, an ACL 2020 paper that rethinks how to feed Chinese word-segmentation ambiguities into a transformer without exploding the sequence length.

What it does
FLAT tackles Chinese Named Entity Recognition by encoding a “flat lattice” (the sentence’s characters plus every lexicon word that matches a span of them) as a single sequence with position-aware attention, rather than building an actual lattice graph. The repo contains two base variants, V0 without BERT and V1 with BERT, plus later memory-optimized versions (V2’s tensor.unique() deduplication and V3’s scalar position encoding). You point it at the OntoNotes, MSRA, Weibo, or Resume datasets after downloading the gigaword character/bigram embeddings and one of two word embedding sets.
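To make the flat-lattice idea concrete, here is a minimal Python sketch that is not taken from the repository. The function name build_flat_lattice, the toy lexicon, and the max_word_len cap are illustrative assumptions. It lists each character and each lexicon match together with the index of its first (head) and last (tail) character.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int, int]  # (text, head index, tail index), 0-based and inclusive


def build_flat_lattice(sentence: str, lexicon: Set[str], max_word_len: int = 4) -> List[Token]:
    """Hypothetical helper: flatten characters and lexicon matches into one sequence."""
    # Every character is a token whose head and tail coincide.
    tokens: List[Token] = [(ch, i, i) for i, ch in enumerate(sentence)]
    # Append every multi-character substring found in the lexicon.
    for start in range(len(sentence)):
        longest_end = min(len(sentence), start + max_word_len)
        for end in range(start + 2, longest_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                tokens.append((word, start, end - 1))
    return tokens


sentence = "重庆人和药店"
lexicon = {"重庆", "人和药店", "药店"}  # toy lexicon, for illustration only
lattice = build_flat_lattice(sentence, lexicon)
for text, head, tail in lattice:
    print(text, head, tail)
```

The sequence grows only by the number of matched words, which is why the lattice can be fed to a standard transformer as one flat input.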
The interesting bit
The cleverness is in the framing: instead of wrestling with graph neural networks over word lattices, FLAT treats lattice nodes as a flat sequence and uses relative position encoding to preserve span boundary information. The 2022 update then cuts memory dramatically: at 300-token sequences, usage drops from 8.4GB for the original to 1.3GB for Flat_scalar, which replaces the full relative position matrix with scalars.
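As a rough illustration of how relative positions can preserve span boundaries, the sketch below computes four head/tail offsets for every pair of spans, following the head-head, head-tail, tail-head, and tail-tail distances described in the paper. The function name relative_distances and the hard-coded spans are assumptions, and the code is not the repository's implementation. The final count of distinct offset tuples only hints at why deduplication can save memory; whether V2 deduplicates exactly these tuples is an assumption here.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (head, tail), 0-based and inclusive


def relative_distances(spans: List[Span]) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: four pairwise offset matrices between span boundaries."""
    n = len(spans)
    d = {name: [[0] * n for _ in range(n)] for name in ("hh", "ht", "th", "tt")}
    for i, (head_i, tail_i) in enumerate(spans):
        for j, (head_j, tail_j) in enumerate(spans):
            d["hh"][i][j] = head_i - head_j
            d["ht"][i][j] = head_i - tail_j
            d["th"][i][j] = tail_i - head_j
            d["tt"][i][j] = tail_i - tail_j
    return d


# Six single-character spans followed by three word spans (toy example).
spans: List[Span] = [(i, i) for i in range(6)] + [(0, 1), (2, 5), (4, 5)]
d = relative_distances(spans)

# Collect the distinct offset 4-tuples across all pairs.
n = len(spans)
unique_tuples = {
    (d["hh"][i][j], d["ht"][i][j], d["th"][i][j], d["tt"][i][j])
    for i in range(n)
    for j in range(n)
}
print(len(unique_tuples), "distinct offset tuples for", n * n, "pairs")
```

For the spans (2, 5) and (4, 5), the tail-tail offset is 0 and the head-head offset is -2, which tells the model that the second span sits inside the first and shares its right edge.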
Key highlights
- ACL 2020 paper implementation with reproducible scripts for four standard Chinese NER datasets
- Two base variants (V0 without BERT, V1 with BERT) plus later memory-optimized variants
- Explicit memory benchmarks showing a roughly 6× reduction (8.4GB to 1.3GB) with scalar encoding versus the original
- Built on FastNLP 0.5.0 (somewhat dated stack: Python 3.7, PyTorch 1.2)
- fitlog integration for experiment tracking, though it’s opt-in
Caveats
- Dependency versions are frozen in 2019; expect friction with modern PyTorch/CUDA
- Pretrained embeddings require manual download from Google Drive or Baidu Pan, then path configuration in paths.py
- README is bilingual but thin on architecture details; you’ll need the paper for actual understanding
Verdict
Worth a look if you’re doing Chinese NER research or need a baseline that handles word segmentation ambiguity without GNN complexity. Skip it if you want a maintained, pip-installable library; this is paper reproduction code with a dusty dependency stack.