LLMs face their toughest test: passing a fortune-telling exam
A benchmark that grades AI models on Chinese astrology, because reasoning about career and marriage from birth charts is apparently a legitimate ML evaluation now.

What it does
MingLi-Bench runs multiple-choice tests from the annual Global Fortune Teller Competition (2022–2025) against LLMs via a tidy Python CLI. It covers Bazi (八字) and Ziwei Doushu (紫微斗数) across twelve life categories: career, health, marriage, wealth, and the rest of the human condition. Scoring is exact match against the ground-truth answer, with no partial credit for poetic ambiguity.
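To make "exact match" concrete, here is a minimal sketch of that kind of scoring. It is not the repo's code: the function name, the letter-based answer format, and the choice to count an unparseable response (`None`) as wrong are all assumptions for illustration.

```python
from typing import Optional, Sequence


def exact_match_accuracy(
    predictions: Sequence[Optional[str]],
    answers: Sequence[str],
) -> float:
    """Fraction of predictions that equal the ground-truth option letter.

    Hypothetical helper: a prediction of None (no answer could be parsed
    from the model's response) is simply scored as incorrect.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")

    correct = 0
    for predicted, expected in zip(predictions, answers):
        # Compare case-insensitively and ignore surrounding whitespace;
        # anything else (including a near miss) earns no credit.
        if predicted is not None and predicted.strip().upper() == expected.strip().upper():
            correct += 1
    return correct / len(answers)


# Two of four answers match, so the score is 0.5.
score = exact_match_accuracy(["A", "b", "C", None], ["A", "B", "D", "A"])
```

The all-or-nothing comparison is the point: a model that hedges between two options gets the same zero as one that picks the wrong letter outright.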
The interesting bit
The --astro flag is the clever isolation layer: it injects pre-computed astrological charts so you’re testing reasoning, not whether the model can correctly convert a lunar birth date into heavenly stems and earthly branches. The authors also recommend --cot so the model can talk itself through the chart before committing to an answer—essentially chain-of-thought for chi distribution.
Key highlights
- 160 normalized questions from an actual professional competition, not synthetic fluff
- Pre-computed charts via iztro separate chart derivation from interpretive reasoning
- CLI auto-routes through OpenRouter or native providers (OpenAI, Anthropic, Google, DeepSeek, Doubao/Volcengine)
- Filter by year, category, or sample size; shuffle options to catch position bias
- Outputs per-question JSON, summary text, and raw response files for post-mortem debugging
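Option shuffling is worth a quick illustration, since it only works if the answer key moves with the options. The sketch below shows the general technique, not the repo's implementation; `shuffle_options` and its arguments are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, Tuple


def shuffle_options(
    options: Sequence[str],
    answer_index: int,
    seed: Optional[int] = None,
) -> Tuple[List[str], int]:
    """Shuffle answer options and return the new index of the correct one.

    Hypothetical helper: a seeded random.Random keeps runs reproducible
    without touching the global random state.
    """
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)

    # order[new_position] is the original index now shown at new_position.
    shuffled = [options[i] for i in order]
    new_answer_index = order.index(answer_index)
    return shuffled, new_answer_index


options = ["promotion", "job loss", "career change", "no change"]
shuffled, new_index = shuffle_options(options, answer_index=2, seed=7)
# The correct option text is unchanged; only its position may differ.
```

If a model's accuracy drops noticeably once options are shuffled, that suggests it was leaning on answer position rather than on the chart.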
Caveats
- The README doesn’t publish any actual model scores or leaderboards, so you’ll be running your own comparisons blind
- 160 questions is a modest sample, and filtering by year shrinks it further, so small score gaps between models may be noise
- No mention of how human fortune tellers score on the same set, so “benchmark” is a generous framing
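For a rough sense of how much a 160-question sample limits what you can conclude, here is a back-of-the-envelope margin of error using the normal approximation to a binomial proportion. The helper name is hypothetical, and the 40-question figure assumes the set splits evenly across four years, which the README summary does not confirm.

```python
import math


def accuracy_margin(accuracy: float, n_questions: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an accuracy measured on n questions.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n). This is a
    rough guide only and gets less reliable for small n or extreme p.
    """
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


full_set = accuracy_margin(0.5, 160)  # roughly 0.077, i.e. about 8 points
one_year = accuracy_margin(0.5, 40)   # roughly 0.155 (assumed 40 questions per year)
```

In other words, two models a few points apart on the full set are hard to tell apart, and on a single year's questions the uncertainty roughly doubles.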
Verdict
Grab this if you’re building Chinese-cultural LLM evals or just want to watch GPT-4o reason about someone’s 灾劫 (calamity) cycle. Skip it if you need established, peer-reviewed benchmarks with published baselines—this is more niche tooling than settled science.