NLPOptimize/flash-tokenizer
A high-performance C++ tokenizer implementation for LLM inference serving.

FlashTokenizer is a CPU-optimized tokenizer library written in C++ that implements BertTokenizer for LLM inference. It exposes Python bindings through pybind11 and reportedly tokenizes text 10x faster than HuggingFace’s BertTokenizerFast. It implements the WordPiece algorithm on top of trie-based data structures and is intended as a drop-in replacement for HuggingFace tokenizers in production inference serving.
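The description does not spell out FlashTokenizer's internal trie layout, so the following is only a minimal Python sketch of the standard greedy longest-match-first WordPiece lookup done with a character trie. It assumes a BERT-style vocabulary in which continuation pieces carry a `##` prefix and a single `[UNK]` token. The class names `TrieNode`, `Trie`, and `WordPieceTrie` and the toy vocabulary are hypothetical illustrations, not the library's API, and the sketch skips BERT's basic pre-tokenization (lowercasing, punctuation splitting) and any per-word length limit.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a character trie; token_id is set when a vocab piece ends here."""

    __slots__ = ("children", "token_id")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.token_id: Optional[int] = None


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, piece: str, token_id: int) -> None:
        node = self.root
        for ch in piece:
            node = node.children.setdefault(ch, TrieNode())
        node.token_id = token_id

    def longest_match(self, text: str, start: int) -> Tuple[Optional[int], int]:
        """Return (token_id, end) for the longest vocab piece equal to text[start:end].

        Returns (None, start) when no piece starting at `start` is in the trie.
        """
        node = self.root
        best_id, best_end = None, start
        i = start
        # Walk one character at a time; remember the last node that ends a piece.
        while i < len(text) and text[i] in node.children:
            node = node.children[text[i]]
            i += 1
            if node.token_id is not None:
                best_id, best_end = node.token_id, i
        return best_id, best_end


class WordPieceTrie:
    """Hypothetical WordPiece tokenizer for a single pre-split word."""

    def __init__(self, vocab: Dict[str, int], unk_token: str = "[UNK]") -> None:
        self.unk_id = vocab[unk_token]
        self.prefix_trie = Trie()  # pieces that may start a word
        self.suffix_trie = Trie()  # "##" continuation pieces, stored without "##"
        for piece, idx in vocab.items():
            if piece.startswith("##"):
                self.suffix_trie.insert(piece[2:], idx)
            else:
                self.prefix_trie.insert(piece, idx)

    def tokenize_word(self, word: str) -> List[int]:
        ids: List[int] = []
        start = 0
        while start < len(word):
            trie = self.prefix_trie if start == 0 else self.suffix_trie
            token_id, end = trie.longest_match(word, start)
            if token_id is None:
                # Standard WordPiece: if any position fails, the whole word is [UNK].
                return [self.unk_id]
            ids.append(token_id)
            start = end
        return ids


# Toy vocabulary (made up for illustration).
vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "aff": 4, "##a": 5}
tok = WordPieceTrie(vocab)
print(tok.tokenize_word("unaffable"))  # [1, 2, 3]
print(tok.tokenize_word("xyz"))        # [0]
```

The textbook WordPiece loop tries the whole remaining substring first and shrinks it by one character per dictionary lookup. Walking a trie finds the same longest match in one left-to-right pass over the characters, without building substrings to hash.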