Parsing "一千零一夜" without losing your mind
A Python library that converts between Chinese numerals and Arabic digits, including the messy real-world stuff like dates, fractions, and temperatures.

What it does
cn2an converts Chinese numerals to Arabic digits and back. It handles plain numbers, mixed formats like “1百23”, formal uppercase financial characters (壹佰贰拾叁), and even RMB-style output with “元整” appended. The supported range is 10⁻¹⁶ to 10¹⁶, that is, from one ten-quadrillionth up to ten quadrillion.
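To make the two directions concrete, here is a minimal sketch built on the cn2an.cn2an and cn2an.an2cn calls shown in the project's README. The guarded import and the convert_examples helper are my own additions (not part of the library), so the snippet still loads when the package is missing; exact behavior may vary between versions.

```python
try:
    import cn2an  # third-party package: pip install cn2an
except ImportError:  # keep the snippet importable without the package
    cn2an = None


def convert_examples():
    """Return a few sample conversions, or None if cn2an is not installed.

    convert_examples is a hypothetical helper for this post, not a cn2an API.
    """
    if cn2an is None:
        return None
    return {
        # Chinese numerals -> Arabic digits, using the strict mode
        "strict": cn2an.cn2an("一百二十三", "strict"),
        # Arabic digits -> lowercase, uppercase, and RMB-style Chinese
        "low": cn2an.an2cn("123", "low"),
        "up": cn2an.an2cn("123", "up"),
        "rmb": cn2an.an2cn("123", "rmb"),
    }


if __name__ == "__main__":
    print(convert_examples())
```

Per the README, the rmb style renders 123 as 壹佰贰拾叁元整, which is the “元整” suffix mentioned above.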
The interesting bit
The sentence-level transform() function is where it gets practical: it finds numbers buried in running text and converts them in context, handling dates (二零零一年三月四日 → 2001年3月4日), fractions, percentages, and Celsius temperatures. There’s also a “direct” mode for literal digit-by-digit translation when you don’t want smart interpretation, which is useful for phone numbers or codes that look like dates but aren’t.
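For a feel of what "converting in context" involves, here is a toy sketch of the date case using only the standard library. It is not how cn2an's transform() is implemented; the names transform_dates, DATE, and small_number are hypothetical, and it only handles complete 年/月/日 dates.

```python
import re

# Map each Chinese digit character to its integer value.
DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
DIGITS["〇"] = 0  # alternative zero often used in years

# Year: 2-4 digits read one by one. Month: 1-12. Day: 1-39 (no calendar check).
DATE = re.compile(
    r"([零〇一二三四五六七八九]{2,4})年"
    r"(十[一二]?|[一二三四五六七八九])月"
    r"([一二三]?十[一二三四五六七八九]?|[一二三四五六七八九])日"
)


def small_number(text: str) -> int:
    """Convert a numeral below 100, such as 三, 十二, or 二十五."""
    if "十" in text:
        tens, _, ones = text.partition("十")
        return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[ones] if ones else 0)
    return DIGITS[text]


def transform_dates(sentence: str) -> str:
    """Rewrite Chinese-numeral dates in a sentence, leaving other text alone."""
    def replace(match: re.Match) -> str:
        year = "".join(str(DIGITS[ch]) for ch in match.group(1))
        month = small_number(match.group(2))
        day = small_number(match.group(3))
        return f"{year}年{month}月{day}日"

    return DATE.sub(replace, sentence)


if __name__ == "__main__":
    print(transform_dates("他生于二零零一年三月四日。"))  # 他生于2001年3月4日。
```

Note how the year is read digit by digit while the month and day are read as unit-based numbers. That split is what makes the date case different from plain number conversion, and it is a reasonable guess at why sentence-level handling is hard to get perfect.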
Key highlights
- Four input modes: strict (proper numerals only), normal (casual 一二三), smart (mixed 1百23), and direct (literal string translation)
- Bidirectional: cn2an and an2cn, with low, up, and rmb output styles
- HTTP API for non-Python consumers (Java, Go, JS)
- Performance: ~29k conversions/sec for Chinese→Arabic, ~67k for the reverse (v0.5.1, max-length test data on a 2.3 GHz dual-core i5)
- Python 3.7+; tested on Ubuntu, Windows, macOS with 3.7, 3.9, 3.11
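The difference between strict and normal parsing is easier to see in code. The following is a simplified sketch of the idea, not cn2an's implementation: parse, parse_units, and the tables are hypothetical names, it only covers integers below 10,000, and it leaves out the smart and direct behaviors entirely.

```python
DIGITS = dict(zip("零一二三四五六七八九", range(10)))
DIGITS.update(zip("壹贰叁肆伍陆柒捌玖", range(1, 10)))  # formal uppercase digits
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000, "拾": 10, "佰": 100, "仟": 1000}


def parse_units(text: str) -> int:
    """Parse unit-based numerals below 10,000, e.g. 一千零一 or 壹佰贰拾叁."""
    total, digit = 0, 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]  # remember the digit until a unit follows
        elif ch in SMALL_UNITS:
            total += (digit or 1) * SMALL_UNITS[ch]  # a bare 十 means 1 * 10
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + digit  # add a trailing ones digit, if any


def parse(text: str, mode: str = "strict") -> int:
    """Parse a numeral; only the normal mode accepts bare digit runs like 一二三."""
    unitless = len(text) > 1 and all(ch in DIGITS for ch in text)
    if unitless:
        if mode == "strict":
            raise ValueError(f"{text!r} is a bare digit sequence; strict mode rejects it")
        return int("".join(str(DIGITS[ch]) for ch in text))
    return parse_units(text)


if __name__ == "__main__":
    print(parse("一千零一"))         # 1001
    print(parse("壹佰贰拾叁"))       # 123
    print(parse("一二三", "normal"))  # 123
```

The point of keeping strict as a separate mode is that 一二三 is ambiguous in the wild: it could be the number 123 or just three digits in a row, so rejecting it by default is the safer choice.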
Caveats
- Sentence transformation is marked “experimental” and “may cause unexpected conversions” — the README’s warning, not mine
- No Python 2 support; you’d need to fork and patch
Verdict
Worth a look if you’re building Chinese ASR pipelines, financial document processors, or any NLP system that needs to normalize the glorious chaos of Chinese number writing. Skip it if your inputs are already clean Arabic digits or if you need guaranteed-perfect sentence-level extraction today.