kha-white/manga-ocr
A specialized OCR model using Vision Encoder Decoder transformers to recognize Japanese text in manga images.

This repository provides an optical character recognition system specifically optimized for Japanese manga. It uses a custom end-to-end model based on Hugging Face Transformers’ Vision Encoder Decoder architecture. The system handles manga-specific challenges including vertical and horizontal text orientation, furigana annotations, text overlaid on images, diverse font styles, and low-quality images. Unlike typical OCR tools, it processes multi-line text bubbles in a single forward pass without requiring line-by-line splitting.
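To make "a single forward pass without line-by-line splitting" concrete, here is a toy sketch of encoder-decoder inference in plain Python. The whole bubble is encoded once, and the decoder then emits the transcription one token at a time, conditioned on the encoder output and on the tokens it has already produced. Reading order across lines or columns is therefore left to the decoder rather than to a separate segmentation stage. Everything below is hypothetical: `greedy_decode`, the stand-in encoder and decoder step, the token ids, and the tiny vocabulary are all illustrative, and greedy argmax decoding is an assumption made for simplicity. This is not manga-ocr's own code. In the package itself, inference is exposed through a `MangaOcr` object that is called on an image path or a PIL image.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical special-token ids for this toy example (not the real tokenizer's ids).
BOS, EOS = 0, 1


def greedy_decode(
    encoder: Callable[[Any], Any],
    decoder_step: Callable[[Any, List[int]], Sequence[float]],
    image: Any,
    max_len: int = 300,  # arbitrary cap on output length for this sketch
) -> List[int]:
    """Encode the full image once, then generate tokens autoregressively.

    Each step sees the encoder output plus every token emitted so far, so no
    line-segmentation step is needed before decoding.
    """
    memory = encoder(image)  # one encoder pass over the whole text bubble
    tokens = [BOS]
    for _ in range(max_len):
        scores = decoder_step(memory, tokens)
        # Greedy choice: take the highest-scoring token id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == EOS:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS token


# Toy vocabulary: ids 0 and 1 are the special tokens above.
VOCAB = ["<s>", "</s>", "こ", "ん", "に", "ち", "は"]


def toy_encoder(image: Any) -> List[int]:
    # Stand-in: pretend the "image" already is the reading-order character ids
    # of a whole bubble. A real vision encoder would produce feature vectors.
    return list(image)


def toy_decoder_step(memory: List[int], tokens: List[int]) -> List[float]:
    # Stand-in: score 1.0 for the next character in memory, or EOS when done.
    pos = len(tokens) - 1
    target = memory[pos] if pos < len(memory) else EOS
    return [1.0 if i == target else 0.0 for i in range(len(VOCAB))]


if __name__ == "__main__":
    ids = greedy_decode(toy_encoder, toy_decoder_step, [2, 3, 4, 5, 6])
    print("".join(VOCAB[i] for i in ids))
```

The design point the sketch illustrates is that the loop never asks where one line ends and the next begins: the encoder sees the entire bubble, and the decoder's output sequence alone determines the final text.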