Ailln/cn2an

Parsing "一千零一夜" without losing your mind

A Python library that converts between Chinese numerals and Arabic digits, including the messy real-world stuff like dates, fractions, and temperatures.

760 stars · Python · Other · AI
Star velocity (7 days): +0.3 ★/day · Trend: steady

What it does

cn2an converts Chinese numerals to Arabic digits and back. It handles plain numbers, mixed formats like “1百23”, formal uppercase financial characters (壹佰贰拾叁), and even RMB-style output with “元整” appended. The supported range is 10⁻¹⁶ to 10¹⁶, roughly a tenth of a quadrillionth up to ten quadrillion.
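To make the conversion concrete, here is a minimal pure-Python sketch of how positional Chinese numerals can be read into an integer. It is an illustration of the general technique, not cn2an's actual implementation; the names `parse_cn_integer`, `DIGITS`, and `SMALL_UNITS` are made up for this example, and it only covers non-negative integers written with 十/百/千/万/亿 (lowercase or financial uppercase).

```python
# Illustrative sketch only: NOT how cn2an is implemented internally.
# Lowercase digits 零..九 map to 0..9; financial uppercase 壹..玖 map to 1..9.
DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
DIGITS.update({ch: i for i, ch in enumerate("壹贰叁肆伍陆柒捌玖", start=1)})

# Units that scale a single digit inside one four-digit section.
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000, "拾": 10, "佰": 100, "仟": 1000}


def parse_cn_integer(text: str) -> int:
    """Parse a non-negative Chinese integer such as 一千零一 (hypothetical helper)."""
    total = 0    # value already closed off by 万 or 亿
    section = 0  # value of the current section (below 10,000)
    digit = 0    # most recent digit, waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit with no digit in front of it (as in 十二) counts as one unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10**4
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything read so far, so 一万亿 becomes 10**12.
            total = (total + section + digit) * 10**8
            section = digit = 0
        else:
            raise ValueError(f"not a numeral character: {ch!r}")
    return total + section + digit


print(parse_cn_integer("一千零一"))    # 1001
print(parse_cn_integer("壹佰贰拾叁"))  # 123
```

Feeding it the full title string 一千零一夜 raises `ValueError` on 夜, which is roughly the gap that sentence-level handling has to fill.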

The interesting bit

The sentence-level transform() function is where it gets practical: it finds numbers buried in running text and converts them in context, handling dates (二零零一年三月四日 → 2001年3月4日), fractions, percentages, and Celsius. There’s also a “direct” mode for literal digit-by-digit translation when you don’t want smart interpretation — useful for phone numbers or codes that look like dates but aren’t.
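The date case shows why context matters: the year is read digit by digit, while the month and day are read as values. A rough sketch of that idea using only the standard library follows. It is an assumption-laden illustration, not the library's code; `convert_dates`, `digits_direct`, `small_number`, and `DATE_RE` are hypothetical names, and it only recognises full 年/月/日 dates with a four-character year.

```python
import re

# Illustrative sketch only: NOT cn2an's transform() implementation.
DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
DIGITS["〇"] = 0  # alternative zero often used in years


def digits_direct(text: str) -> str:
    """Translate each character to one digit, e.g. 二零零一 -> '2001'."""
    return "".join(str(DIGITS[ch]) for ch in text)


def small_number(text: str) -> int:
    """Read a month or day value such as 三, 十, 十二 or 二十五."""
    if "十" not in text:
        return int(digits_direct(text))
    tens, _, ones = text.partition("十")
    return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[ones] if ones else 0)


# Four-character year, then month and day, each followed by its marker.
DATE_RE = re.compile(
    r"([零〇一二三四五六七八九]{4})年"
    r"([一二三四五六七八九十]{1,2})月"
    r"([一二三四五六七八九十]{1,3})日"
)


def convert_dates(sentence: str) -> str:
    """Rewrite Chinese-numeral dates inside running text, leaving the rest alone."""
    def repl(match: re.Match) -> str:
        year, month, day = match.groups()
        return f"{digits_direct(year)}年{small_number(month)}月{small_number(day)}日"

    return DATE_RE.sub(repl, sentence)


print(convert_dates("小王的生日是二零零一年三月四日"))  # 小王的生日是2001年3月4日
```

Anything the pattern does not match passes through unchanged, which is the conservative default you want when numbers sit inside ordinary prose.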

Key highlights

  • Four input modes: strict (proper numerals only), normal (casual 一二三), smart (mixed 1百23), and direct (literal string translation)
  • Bidirectional: cn2an (Chinese → Arabic) and an2cn (Arabic → Chinese), the latter with low, up, and rmb output styles
  • HTTP API for non-Python consumers (Java, Go, JS)
  • Performance: ~29k conversions/sec for Chinese→Arabic, ~67k for the reverse (v0.5.1, max-length test data on a 2.3 GHz dual-core i5)
  • Python 3.7+; tested on Ubuntu, Windows, macOS with 3.7, 3.9, 3.11

Caveats

  • Sentence transformation is marked “experimental” and “may cause unexpected conversions” — the README’s warning, not mine
  • No Python 2 support; you’d need to fork and patch

Verdict

Worth a look if you’re building Chinese ASR pipelines, financial document processors, or any NLP system that needs to normalize the glorious chaos of Chinese number writing. Skip it if your inputs are already clean Arabic digits or if you need guaranteed-perfect sentence-level extraction today.
