
NLPOptimize/flash-tokenizer

A high-performance C++ tokenizer implementation for LLM inference serving.


FlashTokenizer is a CPU-optimized tokenizer library written in C++ that implements BertTokenizer for LLM inference. It exposes Python bindings through pybind11 and is reported to tokenize about 10x faster than HuggingFace’s BertTokenizerFast. The WordPiece algorithm is implemented on trie-based data structures, and the library is designed as a drop-in replacement for HuggingFace tokenizers in production inference.
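
To illustrate the trie-based WordPiece idea mentioned above, here is a minimal Python sketch of greedy longest-match-first subword splitting over a vocabulary trie. It is not FlashTokenizer's code or API: the names `TrieNode`, `build_trie`, and `wordpiece`, the toy vocabulary, and the `##` continuation prefix (the convention used by BERT vocabularies) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One trie node; token_id is set when a vocabulary entry ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.token_id: Optional[int] = None


def build_trie(vocab: Dict[str, int]) -> TrieNode:
    """Insert every vocabulary entry into a character trie."""
    root = TrieNode()
    for token, token_id in vocab.items():
        node = root
        for ch in token:
            node = node.children.setdefault(ch, TrieNode())
        node.token_id = token_id
    return root


def wordpiece(word: str, root: TrieNode, unk_id: int) -> List[int]:
    """Split one word greedily, always taking the longest matching piece."""
    ids: List[int] = []
    start = 0
    while start < len(word):
        node: Optional[TrieNode] = root
        if start > 0:
            # Continuation pieces are stored with a "##" prefix,
            # so descend into that subtree before matching characters.
            for ch in "##":
                node = node.children.get(ch)
                if node is None:
                    return [unk_id]
        best_id, best_end = None, start
        pos = start
        while pos < len(word):
            node = node.children.get(word[pos])
            if node is None:
                break
            pos += 1
            if node.token_id is not None:
                # Remember the longest vocabulary entry seen so far.
                best_id, best_end = node.token_id, pos
        if best_id is None:
            # No piece matches here: the whole word maps to the unknown token.
            return [unk_id]
        ids.append(best_id)
        start = best_end
    return ids


# Toy vocabulary (hypothetical, for illustration only).
vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "##a": 4}
trie = build_trie(vocab)
print(wordpiece("unaffable", trie, unk_id=vocab["[UNK]"]))
```

Each piece is found in a single walk down the trie, so no candidate substrings are built or hashed along the way. A production tokenizer would also handle text normalization, whitespace and punctuation splitting, and special tokens, all of which this sketch omits.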
