
FoundationVision/UniTok

A unified visual tokenizer that converts images into discrete tokens for use in autoregressive generation and multimodal understanding models.

UniTok
Velocity (7d): +1.1 ★/day · Trend: steady

UniTok is a unified visual tokenizer designed for both visual generation and understanding tasks. It provides discrete tokenization of images compatible with autoregressive generative models like LlamaGen and multimodal understanding models like LLaVA. The tokenizer supports unified multimodal LLMs including Chameleon and Liquid, enabling both image generation and comprehension within the same framework. It was published at NeurIPS 2025 as a Spotlight paper.
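To make "discrete tokenization" concrete, here is a minimal sketch of generic nearest-neighbor vector quantization, the basic idea behind turning continuous image features into integer token IDs. It is not UniTok's API or architecture: the function names (`quantize`, `dequantize`), the four-entry codebook, and the toy patch features are all hypothetical and chosen only for illustration.

```python
# Generic vector-quantization sketch. NOT UniTok's actual API or method:
# all names and data below are hypothetical illustrations.
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(token_ids: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look up the codebook vector for each token ID."""
    return [codebook[i] for i in token_ids]


# Hypothetical toy data: a 4-entry codebook and three 2-D "patch features".
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patch_features = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

token_ids = quantize(patch_features, codebook)
print(token_ids)  # [0, 3, 2]
```

In this framing, an autoregressive generator would predict the sequence of integer IDs, while an understanding model would consume the vectors that `dequantize` looks up. A real tokenizer learns both the encoder that produces the features and the codebook itself.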
