Is tiktoken open source?

Yes — openai/tiktoken is open source, released under the MIT license.

What language is tiktoken written in?

openai/tiktoken is primarily written in Python, with its performance-critical BPE core implemented in Rust.

How popular is tiktoken?

openai/tiktoken has 18.8k stars on GitHub, and its star growth is currently accelerating (about +8 stars per day over the past 7 days).

Where can I find tiktoken?

openai/tiktoken is on GitHub at https://github.com/openai/tiktoken.

openai/tiktoken

A faster way to count the tokens that eat your API budget

tiktoken gives you OpenAI’s byte-pair encoder locally so you can count tokens exactly instead of guessing and racking up API charges.

★ 18.8k stars · Python · Other AI

Velocity (7d): +8.0 ★/day · Trend: ↗ accelerating

What it does

tiktoken is a library that implements the exact byte-pair encoding (BPE) schemes used by OpenAI models such as GPT-4o. It turns strings into sequences of integer tokens and back again, the same lossless translation the API performs before billing you. If you need to know whether a prompt fits in the context window or how much a completion will cost, this is the reference implementation.
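
The pre-flight check described above can be sketched as follows, assuming the tiktoken package is installed. `tiktoken.encoding_for_model` and `encode` are tiktoken calls; the helper names (`load_encoder`, `count_tokens`, `fits_context`, `estimate_cost_usd`) and every window size or price passed to them are illustrative assumptions, not values from the project.

```python
from __future__ import annotations


def load_encoder(model: str = "gpt-4o"):
    """Return the tiktoken encoding for an API model name.

    Imported lazily so the helpers below work with any object that
    exposes an encode(text) -> list[int] method.
    """
    import tiktoken  # third-party: pip install tiktoken

    return tiktoken.encoding_for_model(model)


def count_tokens(text: str, enc) -> int:
    """Number of tokens `enc` produces for `text`."""
    return len(enc.encode(text))


def fits_context(text: str, enc, context_window: int,
                 reserved_for_output: int = 0) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return count_tokens(text, enc) + reserved_for_output <= context_window


def estimate_cost_usd(n_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of n_tokens at a caller-supplied price (hypothetical input;
    look up the current price for your model)."""
    return n_tokens * usd_per_million_tokens / 1_000_000
```

With tiktoken installed, a call such as `fits_context(prompt, load_encoder("gpt-4o"), context_window=128_000, reserved_for_output=1_000)` would answer the question locally; both numbers are placeholders. Chat-formatted requests may add a few tokens of per-message overhead, so treat a raw text count as a close lower bound.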

The interesting bit

The README claims a 3–6× speedup over a comparable open-source tokenizer, measured on 1 GB of text, which matters when you are preprocessing large datasets. It also ships a tiktoken._educational submodule that lets you visualize how BPE merges bytes into subwords like “ing”, a rare case of a production tool including a built-in lesson.
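
To make the merge idea concrete, here is a toy byte-pair trainer using only the standard library. It is not the code in tiktoken._educational and it skips the regex pre-splitting that real encodings apply; the function names (`train_bpe`, `merge_pair`, `apply_merges`) are hypothetical. It only illustrates how repeatedly fusing the most frequent adjacent pair yields a subword like "ing".

```python
from __future__ import annotations

from collections import Counter


def merge_pair(parts: list[bytes], left: bytes, right: bytes) -> list[bytes]:
    """Replace every adjacent (left, right) occurrence with left + right."""
    out = []
    i = 0
    while i < len(parts):
        if i + 1 < len(parts) and parts[i] == left and parts[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(parts[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> list[tuple[bytes, bytes]]:
    """Learn up to num_merges merges, most frequent adjacent pair first."""
    parts = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(parts, parts[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so stop early
            break
        merges.append((left, right))
        parts = merge_pair(parts, left, right)
    return merges


def apply_merges(text: str, merges: list[tuple[bytes, bytes]]) -> list[bytes]:
    """Tokenize text by replaying the learned merges in order."""
    parts = [bytes([b]) for b in text.encode("utf-8")]
    for left, right in merges:
        parts = merge_pair(parts, left, right)
    return parts


merges = train_bpe("ing ing ing", num_merges=2)
print(merges)                        # [(b'i', b'n'), (b'in', b'g')]
print(apply_merges("ring", merges))  # [b'r', b'ing']
```

Because the pieces are plain byte strings, joining them always reproduces the original UTF-8 bytes, which is the same property that makes the real encoder lossless.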

Key highlights

Claims to be 3–6× faster than comparable open-source tokenizers on bulk text (per the project’s own benchmark on GPT-2 tokenization).
Supports model-specific encodings like o200k_base and cl100k_base, and can map directly from an API model name such as gpt-4o.
Reversible and lossless: you can encode arbitrary text, even unseen text, and decode back to the exact original string.
Includes an educational submodule for training and visualizing BPE merge procedures.
Extensible via custom Encoding objects or a tiktoken_ext namespace-package plugin system.
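
The reversibility claim in the highlights above is easy to spot-check yourself. The sketch below assumes tiktoken is installed for `load_encoding` (which wraps the real `tiktoken.get_encoding` call); the sample strings and the helper name `roundtrip_failures` are illustrative choices, and the checker works with any object that has `encode` and `decode` methods.

```python
from __future__ import annotations

# Illustrative inputs: accents, non-Latin script, emoji, whitespace, empty string.
SAMPLES = [
    "hello world",
    "naïve café",
    "日本語のテキスト",
    "emoji 🎉🚀",
    "tabs\tand\nnewlines",
    "",
]


def load_encoding(name: str = "o200k_base"):
    """Load a named tiktoken encoding such as o200k_base or cl100k_base."""
    import tiktoken  # third-party: pip install tiktoken

    return tiktoken.get_encoding(name)


def roundtrip_failures(enc, samples: list[str] = SAMPLES) -> list[str]:
    """Return the samples that do NOT survive encode followed by decode."""
    return [s for s in samples if enc.decode(enc.encode(s)) != s]
```

With tiktoken available, `roundtrip_failures(load_encoding("cl100k_base"))` should return an empty list if the encoding is lossless on these inputs.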

Caveats

The performance comparison is self-reported against specific older versions of transformers and tokenizers, so your mileage may vary.
The extension mechanism relies on namespace packages and explicitly warns against editable installs, which adds friction.
The README's extension example accesses private attributes (_pat_str, _mergeable_ranks), with a note that production code should “load the arguments directly” instead, which suggests the public API for customization is still somewhat implicit.
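
For reference, the custom-Encoding route mentioned in the caveats might look like the sketch below. It assumes tiktoken is installed and mirrors the README pattern of passing `name`, `pat_str`, `mergeable_ranks`, and `special_tokens` to `tiktoken.Encoding`, including the private-attribute access the README itself warns about. The helper names and the choice to start numbering at `n_vocab` are assumptions made for illustration, not project guidance.

```python
from __future__ import annotations


def allocate_special_tokens(existing: dict[str, int], new_names: list[str],
                            first_free_id: int) -> dict[str, int]:
    """Return `existing` plus `new_names`, using unused ids from first_free_id up."""
    used_ids = set(existing.values())
    merged = dict(existing)
    next_id = first_free_id
    for name in new_names:
        if name in merged:
            raise ValueError(f"special token already defined: {name!r}")
        while next_id in used_ids:  # skip ids that are already taken
            next_id += 1
        merged[name] = next_id
        used_ids.add(next_id)
        next_id += 1
    return merged


def extend_encoding(base, new_name: str, new_special_names: list[str]):
    """Build a new tiktoken.Encoding that adds special tokens to `base`.

    Reads private attributes of `base`, as the README example does; for
    production use, load these arguments directly instead.
    """
    import tiktoken  # third-party: pip install tiktoken

    special = allocate_special_tokens(
        base._special_tokens, new_special_names, first_free_id=base.n_vocab
    )
    return tiktoken.Encoding(
        name=new_name,  # use a new name so the changed behaviour is obvious
        pat_str=base._pat_str,
        mergeable_ranks=base._mergeable_ranks,
        special_tokens=special,
    )
```

A call such as `extend_encoding(tiktoken.get_encoding("cl100k_base"), "cl100k_custom", ["<|my_marker|>"])` would then produce the extended encoding; the encoding name and the token string here are made up.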

Verdict

Essential if you are building prompt-management, pricing, or context-window logic around OpenAI models and need exact token counts offline. If you are using a different model family or do not care about precise pre-flight token accounting, you can safely ignore it.
