bytedance/Dolphin

ByteDance's 3B-parameter VLM that reads documents like a human does

A single model that classifies document type, analyzes layout, then parses elements in parallel—no pipeline of separate tools required.

★ 9k stars · Python · Computer Vision · Data Tooling

What it does

Dolphin-v2 turns document images and PDFs into structured output: JSON, Markdown, or individual parsed elements like tables, formulas, and code blocks. It handles both clean digital documents and messy photographed pages through a single vision-language model, rather than chaining together separate OCR, layout, and parsing tools.
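To make the "structured output" idea concrete, here is a minimal sketch of how a list of parsed elements could be serialized to JSON or flattened to Markdown. The element schema (`type`, `reading_order`, `content`) and the function names are assumptions made for illustration; they are not Dolphin's actual output format or API.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def sort_by_reading_order(elements):
    """Return elements sorted by their (hypothetical) reading_order field."""
    return sorted(elements, key=lambda e: e["reading_order"])


def to_json(elements):
    """Serialize parsed elements to a JSON string, in reading order."""
    return json.dumps(sort_by_reading_order(elements), ensure_ascii=False, indent=2)


def to_markdown(elements):
    """Flatten parsed elements to Markdown, one block per element."""
    parts = []
    for element in sort_by_reading_order(elements):
        content = element["content"]
        if element["type"] == "formula":
            parts.append("$$\n" + content + "\n$$")
        elif element["type"] == "code":
            parts.append(FENCE + "\n" + content + "\n" + FENCE)
        else:
            # Text and tables are assumed to arrive already formatted.
            parts.append(content)
    return "\n\n".join(parts)


if __name__ == "__main__":
    sample = [
        {"type": "formula", "reading_order": 2, "content": "E = mc^2"},
        {"type": "text", "reading_order": 1, "content": "Intro"},
    ]
    print(to_markdown(sample))
```

Keeping a per-element list as the intermediate form means the same parse can feed either output without re-running the model.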

The interesting bit

The two-stage architecture is the practical hook. Stage 1 classifies the document type and predicts reading order; Stage 2 then picks its strategy based on that classification: holistic parsing for photographed documents, parallel element-wise parsing for digital ones. “Heterogeneous anchor prompting” essentially means giving different element types (tables, formulas, text) different prompt templates, which the README claims improves accuracy without bloating the architecture.
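The dispatch described above can be sketched in a few lines. This is an illustration only: the prompt strings, the `Element` class, the document-type labels, and the function names are all hypothetical, and the real prompts and control flow in Dolphin may differ.

```python
from dataclasses import dataclass

# Hypothetical prompt templates, one per element type (not Dolphin's real prompts).
ANCHOR_PROMPTS = {
    "table": "Parse the table in the image.",
    "formula": "Read the formula in the image.",
    "code": "Read the code in the image.",
    "text": "Read the text in the image.",
}
HOLISTIC_PROMPT = "Parse the whole page in reading order."  # also hypothetical


@dataclass
class Element:
    kind: str           # element type predicted in stage 1
    reading_order: int  # position predicted in stage 1


def choose_strategy(doc_type: str) -> str:
    """Stage 2 strategy: holistic for photographed pages, element-wise otherwise."""
    return "holistic" if doc_type == "photographed" else "element_wise"


def build_prompts(doc_type: str, elements: list) -> list:
    """Return the prompts stage 2 would issue, in reading order."""
    if choose_strategy(doc_type) == "holistic":
        return [HOLISTIC_PROMPT]
    ordered = sorted(elements, key=lambda e: e.reading_order)
    # Unknown element kinds fall back to the plain-text prompt.
    return [ANCHOR_PROMPTS.get(e.kind, ANCHOR_PROMPTS["text"]) for e in ordered]


if __name__ == "__main__":
    page = [Element("formula", 2), Element("table", 1)]
    for prompt in build_prompts("digital", page):
        print(prompt)
```

Because each element-wise prompt is independent of the others, the prompts for a digital page can be issued together, which is what makes the parallel decoding possible.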

Key highlights

Single VLM handles classification, layout analysis, and parsing—no external OCR or table extractors
3B parameters for v2 (up from 0.3B in v1.5), with benchmarks showing improvement across text edit distance, formula CDM, and table TEDS scores on OmniDocBench
Parallel element decoding with configurable batch size for throughput tuning
Supports vLLM and TensorRT-LLM for accelerated inference
Hugging Face Transformers integration for standard model loading
Multi-page PDF parsing available since June 2025
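The batch-size knob for parallel element decoding can be pictured as simple chunking: cropped elements are grouped into fixed-size batches and each batch goes to the model in one call. The sketch below assumes a caller-supplied `decode_batch` function standing in for the model; that function, `batched`, and `decode_in_batches` are hypothetical names, not part of Dolphin's API.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def decode_in_batches(
    crops: Sequence,
    batch_size: int,
    decode_batch: Callable[[List], List],
) -> List:
    """Decode all crops batch by batch, preserving the input order."""
    results: List = []
    for batch in batched(crops, batch_size):
        results.extend(decode_batch(batch))
    return results


if __name__ == "__main__":
    # Stand-in for the model: upper-cases strings instead of parsing images.
    def fake_decode_batch(batch):
        return [item.upper() for item in batch]

    print(decode_in_batches(["a", "b", "c"], batch_size=2, decode_batch=fake_decode_batch))
```

Larger batches generally mean fewer model calls at the cost of more memory per call, so the right value depends on the hardware at hand.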

Caveats

The demo link (http://115.190.42.15:8888/dolphin/) is HTTP and may be unreliable or region-blocked
“Call for Bad Cases” notice suggests the model still has visible failure modes the authors are cataloging
Changelog entries are dated in YYYY.MM.DD form (e.g., “2025.12.12”), and some appeared to be future-dated when this summary was written; treat the release timeline as unconfirmed and check it against the repository's commit history

Verdict

Worth evaluating if you’re currently maintaining a fragile pipeline of Tesseract + layout model + table extractor. Skip if you need guaranteed deterministic output or work in a domain with strict formatting requirements the model hasn’t seen—those “bad cases” are explicitly being collected for a reason.

Frequently asked

What is bytedance/Dolphin? A single model that classifies document type, analyzes layout, then parses elements in parallel, with no pipeline of separate tools required.
Is Dolphin open source? Yes. bytedance/Dolphin is an open-source project tracked on heatdrop.
What language is Dolphin written in? bytedance/Dolphin is primarily written in Python.
How popular is Dolphin? bytedance/Dolphin has 9k stars on GitHub.
Where can I find Dolphin? bytedance/Dolphin is on GitHub at https://github.com/bytedance/Dolphin.