meta-toolkit/meta

A C++ NLP toolkit that actually builds on Windows

MeTA bundles tokenization, search indexes, topic models, and CRFs into one compiled toolkit for researchers who'd rather fight algorithms than package managers.

Star velocity (7 days): +0.2 ★/day · Trend: steady

What it does

MeTA is a C++ data-sciences toolkit for text analysis. It covers tokenization (including parse-tree features), compressed inverted and forward indexes, ranking functions, topic models, classification, graph algorithms, language models, and CRF-based POS tagging. It also wraps liblinear and libsvm, supports UTF-8 for multilingual text, and ships multithreaded implementations of its algorithms.
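To make "compressed inverted index" concrete, here is a minimal sketch of the general technique: each term maps to a postings list that stores the gaps between ascending document IDs as variable-length bytes. This is not MeTA's API or its on-disk format; `ToyInvertedIndex`, `add_document`, and `lookup` are hypothetical names chosen for illustration, and whitespace splitting stands in for real tokenization.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: NOT MeTA's API. A toy inverted index whose postings
// lists hold gaps between ascending document IDs, encoded as varints.
class ToyInvertedIndex {
public:
    // Assumption: documents are added in ascending doc_id order.
    void add_document(std::uint32_t doc_id, const std::string& text) {
        std::istringstream in(text);
        std::string term;
        while (in >> term) {  // whitespace split stands in for a tokenizer
            Postings& p = postings_[term];
            if (p.count > 0 && p.last_id == doc_id) {
                continue;  // term already recorded for this document
            }
            append_varint(p.bytes, doc_id - p.last_id);  // store the gap
            p.last_id = doc_id;
            ++p.count;
        }
    }

    // Decode the postings list for a term back into document IDs.
    std::vector<std::uint32_t> lookup(const std::string& term) const {
        std::vector<std::uint32_t> ids;
        auto it = postings_.find(term);
        if (it == postings_.end()) {
            return ids;
        }
        const std::vector<std::uint8_t>& bytes = it->second.bytes;
        std::uint32_t current = 0;
        std::size_t pos = 0;
        while (pos < bytes.size()) {
            current += read_varint(bytes, pos);  // undo the gap encoding
            ids.push_back(current);
        }
        return ids;
    }

private:
    struct Postings {
        std::vector<std::uint8_t> bytes;  // varint-encoded gaps
        std::uint32_t last_id = 0;
        std::uint32_t count = 0;
    };

    // 7 payload bits per byte; the high bit marks "more bytes follow".
    static void append_varint(std::vector<std::uint8_t>& out, std::uint32_t v) {
        while (v >= 0x80) {
            out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
            v >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(v));
    }

    static std::uint32_t read_varint(const std::vector<std::uint8_t>& in,
                                     std::size_t& pos) {
        std::uint32_t v = 0;
        int shift = 0;
        while (true) {
            const std::uint8_t b = in[pos++];
            v |= static_cast<std::uint32_t>(b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return v;
    }

    std::map<std::string, Postings> postings_;
};
```

Gap encoding pays off because frequent terms appear in many nearby documents, so most gaps fit in a single byte. A production index would add term frequencies, positions, and a caching layer on top.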

The interesting bit

The build guides are the real documentation here: exhaustive, platform-specific instructions for macOS, five Ubuntu versions, Arch, Fedora, CentOS, and Windows (with AppVeyor covering Windows CI). Someone clearly suffered through compiler hell so you don’t have to.

Key highlights

  • Compressed indexes with pluggable caching strategies
  • CRF implementation for POS tagging and shallow parsing
  • UTF-8 support for non-English text analysis
  • Multithreaded algorithms throughout
  • Published ACL 2016 demo paper with official citation

Caveats

  • Last meaningful activity appears to be 2016; Travis CI and AppVeyor badges suggest legacy CI infrastructure
  • Requires jemalloc, ICU, and CMake 3.2+—not header-only or trivial to drop into existing projects

Verdict

Good fit if you’re doing reproducible NLP research in C++ and need a unified, citable toolkit. Skip it if you want Python bindings, GPU acceleration, or a project with active maintenance.
