kijai/ComfyUI-Florence2

Teaching ComfyUI to read your receipts

This repo wraps Microsoft’s Florence-2 vision model into ComfyUI nodes so you can caption images, detect objects, and ask questions about documents inside your workflow.

★1.7k stars Python Image · Video · Audio

What it does

ComfyUI-Florence2 plugs Microsoft’s Florence-2 vision foundation model into ComfyUI’s node graph. You can caption images, detect objects, segment scenes, and query documents by connecting nodes instead of writing inference scripts. A built-in loader node can also download official and community finetuned checkpoints directly from HuggingFace into the expected model path.

The interesting bit

The DocVQA node is the unusual angle here: it lets you point the model at a scanned receipt or form, ask a plain-English question like “What is the total amount?”, and get back an answer derived from the visual layout. That turns a generative-art tool into a surprisingly practical document parser.
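To make the question-asking step concrete, here is a minimal sketch of how a DocVQA prompt might be assembled and how the raw decoded output might be tidied up. It assumes the model expects a task tag followed by the question and that `<DocVQA>` is that tag; the tag and both helper names are illustrative assumptions, not something confirmed by the repo.

```python
import re

# Assumption: the DocVQA-capable checkpoint expects this tag in front of the
# question. The tag and the helper names below are illustrative only.
DOCVQA_TASK = "<DocVQA>"


def build_docvqa_prompt(question: str) -> str:
    """Collapse stray whitespace and prefix the question with the task tag."""
    question = " ".join(question.split())
    if not question:
        raise ValueError("question must not be empty")
    return f"{DOCVQA_TASK}{question}"


def clean_answer(decoded: str) -> str:
    """Remove the task tag and common special tokens from decoded output."""
    text = decoded.replace(DOCVQA_TASK, "")
    text = re.sub(r"</?s>|<pad>", "", text)
    return text.strip()


prompt = build_docvqa_prompt("  What is the   total amount? ")
print(prompt)
print(clean_answer("</s><s>$23.50</s><pad>"))
```

Inside ComfyUI the node takes care of this for you; the sketch only shows roughly what the text side of the exchange looks like.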

Key highlights

Supports official Florence-2 variants (base, large, fine-tuned) and community finetunes, including captioners optimized for Stable Diffusion 3 and Flux.
DocVQA node extracts information from text-heavy images using natural-language prompts.
DownloadAndLoadFlorence2Model node auto-pulls weights from HuggingFace to ComfyUI/models/LLM.
Requires the transformers library, version 4.38.0 or newer.
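The download path and the version requirement can be sketched in a few lines. This is not the node's actual code: it assumes each checkpoint gets its own subfolder named after the last part of the repo id, and the helper names are made up for illustration. `snapshot_download` is the real `huggingface_hub` function, imported lazily so the path and version helpers work without it.

```python
import re
from pathlib import Path

MIN_TRANSFORMERS = (4, 38, 0)  # minimum version stated in the highlights above


def version_tuple(version: str) -> tuple:
    """Turn '4.41.0.dev0' into (4, 41, 0), ignoring suffixes like 'rc1'."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def transformers_is_new_enough(installed: str) -> bool:
    return version_tuple(installed) >= MIN_TRANSFORMERS


def local_model_dir(comfy_root: str, repo_id: str) -> Path:
    # Assumption: one subfolder per checkpoint under ComfyUI/models/LLM.
    return Path(comfy_root) / "models" / "LLM" / repo_id.rsplit("/", 1)[-1]


def download_checkpoint(comfy_root: str, repo_id: str) -> Path:
    """Fetch a checkpoint from the Hub into the expected folder (needs network)."""
    from huggingface_hub import snapshot_download  # imported only when used

    target = local_model_dir(comfy_root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


print(local_model_dir("ComfyUI", "microsoft/Florence-2-base"))
print(transformers_is_new_enough("4.37.2"))
```

To check a real install, pass `importlib.metadata.version("transformers")` to `transformers_is_new_enough`.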

Caveats

The README explicitly warns that DocVQA accuracy hinges on image quality and question complexity.
The project is largely glue code between HuggingFace’s Florence-2 implementation and ComfyUI; don’t expect a novel model architecture here.

Verdict

Handy if you want vision-language tasks inside ComfyUI’s visual workflow. If you’re comfortable with Python scripts or APIs, running Florence-2 directly through HuggingFace is probably cleaner.
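For comparison, the script route looks roughly like the sketch below. It follows the usage pattern published on the Florence-2 model card (task tag in, post-processed result out), but treat it as a starting point under assumptions rather than a tested recipe: the model id, the `TASK_TAGS` mapping name, the generation settings, and the image path are all choices made here for illustration, and it needs `torch`, `transformers`, and `Pillow` installed.

```python
# Task tags as listed on the Florence-2 model card; the dict name is ours.
TASK_TAGS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
}


def run_florence2(image_path, task="caption", model_id="microsoft/Florence-2-base"):
    """Run one Florence-2 task on one image and return the parsed result."""
    # Heavy imports live inside the function so the module loads without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    tag = TASK_TAGS[task]
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True
    ).to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=tag, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=256,
            num_beams=3,
            do_sample=False,
        )

    # Keep special tokens: the post-processor uses them to parse the output.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=tag, image_size=(image.width, image.height)
    )
```

Calling `run_florence2("receipt.png", task="ocr")` with a hypothetical local file would download the weights on first use and return a dictionary keyed by the task tag.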

Frequently asked

What is kijai/ComfyUI-Florence2?
It wraps Microsoft’s Florence-2 vision model into ComfyUI nodes so you can caption images, detect objects, and ask questions about documents inside your workflow.

Is ComfyUI-Florence2 open source?
Yes. kijai/ComfyUI-Florence2 is open source, released under the MIT license.

What language is ComfyUI-Florence2 written in?
kijai/ComfyUI-Florence2 is primarily written in Python.

How popular is ComfyUI-Florence2?
kijai/ComfyUI-Florence2 has 1.7k stars on GitHub.

Where can I find ComfyUI-Florence2?
kijai/ComfyUI-Florence2 is on GitHub at https://github.com/kijai/ComfyUI-Florence2.