baudm/parseq
A deep learning model for scene text recognition using permuted autoregressive sequence modeling with vision transformers.

Velocity · 7d
+0.4
★ / day
Trend
→steady
star history
This repository implements PARSeq, a scene text recognition method published at ECCV 2022. It uses a vision transformer encoder-decoder architecture with permuted autoregressive sequence modeling to recognize text in natural scene images. The model has been adopted by major OCR libraries PaddleOCR and docTR, and includes support for PyTorch 2.0 and Lightning 2.0.