Is insuranceqa-corpus-zh open source?

Yes. chatopera/insuranceqa-corpus-zh is an open-source project tracked on heatdrop, although the corpus data itself requires a purchased license from the Chatopera store.

What language is insuranceqa-corpus-zh written in?

chatopera/insuranceqa-corpus-zh is primarily written in Python.

How popular is insuranceqa-corpus-zh?

chatopera/insuranceqa-corpus-zh has 1.1k stars on GitHub.

Where can I find insuranceqa-corpus-zh?

chatopera/insuranceqa-corpus-zh is on GitHub at https://github.com/chatopera/insuranceqa-corpus-zh.

chatopera/insuranceqa-corpus-zh

A Chinese insurance Q&A dataset that makes you buy a license

Real-world insurance questions and expert answers, translated and packaged for machine learning—but the data itself sits behind a store checkout.

★1.1k stars · Python · Topics: Data Tooling, Language Models, Chat Assistants

What it does

This is a Chinese question-answering corpus for the insurance domain, translated from the original English insuranceQA dataset. It pairs 12,889 training questions with 21,325 answers, adds validation and test splits, and provides 200 hard negative candidates per question for answer-selection tasks.
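
The answer-selection task can be sketched as a top-1 (precision@1) evaluation over each question's candidate pool. This is a minimal sketch, assuming a hypothetical `PoolRecord` layout and a toy word-overlap scorer. The corpus's real field names, and the metric its bundled baselines report, may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class PoolRecord:
    """Hypothetical answer-selection example; not the corpus's actual schema."""
    question: str
    positives: List[int]   # ids of correct answers
    candidates: List[int]  # positives mixed with negative answer ids


def precision_at_1(
    records: Sequence[PoolRecord],
    answers: Dict[int, str],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of questions whose highest-scoring candidate is a positive answer."""
    if not records:
        return 0.0
    hits = 0
    for rec in records:
        # max() keeps the first candidate on ties, so candidate order matters.
        best = max(rec.candidates, key=lambda aid: score(rec.question, answers[aid]))
        if best in rec.positives:
            hits += 1
    return hits / len(records)


def overlap(question: str, answer: str) -> float:
    """Toy scorer: count of shared whitespace-separated tokens."""
    return float(len(set(question.split()) & set(answer.split())))


answers = {
    0: "file a claim with your policy number",
    1: "premiums are paid monthly",
    2: "term life insurance expires after the term",
}
records = [
    PoolRecord("how do i file a claim", positives=[0], candidates=[0, 1, 2]),
    PoolRecord("when does term life insurance expire", positives=[2], candidates=[0, 1, 2]),
]
print(precision_at_1(records, answers, overlap))  # 1.0
```

Swapping `overlap` for a trained model's scoring function leaves the evaluation loop unchanged.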

The interesting bit

The project ships two flavors: raw translated Q&A text, and a “pool” version that’s already tokenized, stop-word-stripped, and labeled—ready to feed into models without the usual NLP janitorial work. The negatives are retrieval-based, so they’re plausible distractors rather than random noise.
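
A small sketch can illustrate the preprocessing step and why retrieval-based negatives are harder than random ones. It assumes whitespace-segmented text, a hypothetical stop-word list, and Jaccard token overlap as the retrieval score. The project's actual segmentation and retrieval method are not described here, so `preprocess` and `mine_hard_negatives` are illustrative stand-ins, not the package's API.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stop-word list; real Chinese text would also need word segmentation.
STOP_WORDS = {"the", "a", "is", "of", "to", "do", "i", "my"}


def preprocess(text: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Token-set overlap in [0, 1]; used here as a stand-in retrieval score."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mine_hard_negatives(
    question: str, positives: Set[int], answers: Dict[int, str], k: int
) -> List[int]:
    """Return ids of the k non-positive answers most similar to the question."""
    q_tokens = preprocess(question)
    scored = [
        (jaccard(q_tokens, preprocess(text)), aid)
        for aid, text in answers.items()
        if aid not in positives
    ]
    # Most similar first; ties broken by answer id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [aid for _, aid in scored[:k]]


answers = {
    0: "file a claim online with your policy number",
    1: "a claim is denied if the policy lapsed",
    2: "premiums are paid monthly",
    3: "claim forms are available online at our website",
}
print(mine_hard_negatives("How do I file a claim", {0}, answers, k=2))  # [1, 3]
```

Answers 1 and 3 are wrong but still mention claims, so they outrank the unrelated answer 2. A model therefore has to learn more than keyword matching to reject them.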

Key highlights

~27K expert-curated answers across train/valid/test splits
Each question carries 1–5 positive answers and 200 retrieval-built negatives
Dual-format release: raw translated corpus or preprocessed ML-ready data
Python package (insuranceqa_data) handles loading via simple API calls
Bundled baseline models: deep QA, CNN/TensorFlow, n-grams, word2vec
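
With 1–5 positives and 200 negatives per question, a labeled pool is commonly expanded in one of two ways. Pointwise training uses (question, answer, label) rows. Pairwise training uses (question, positive, negative) triplets for a ranking loss. The helpers below are hypothetical and not part of `insuranceqa_data`. They only show the shapes and counts involved.

```python
from typing import Dict, List, Sequence, Tuple


def pointwise_rows(
    question: str,
    positives: Sequence[int],
    negatives: Sequence[int],
    answers: Dict[int, str],
) -> List[Tuple[str, str, int]]:
    """Expand one question into (question, answer, label) rows: 1 = positive, 0 = negative."""
    rows = [(question, answers[aid], 1) for aid in positives]
    rows += [(question, answers[aid], 0) for aid in negatives]
    return rows


def pairwise_triplets(
    question: str,
    positives: Sequence[int],
    negatives: Sequence[int],
    answers: Dict[int, str],
) -> List[Tuple[str, str, str]]:
    """Pair every positive with every negative, e.g. for a margin (hinge) ranking loss."""
    return [(question, answers[p], answers[n]) for p in positives for n in negatives]


# Toy sizes matching the corpus description: 5 positives (the maximum), 200 negatives.
answers = {aid: f"answer {aid}" for aid in range(205)}
positives = list(range(5))
negatives = list(range(5, 205))
print(len(pointwise_rows("q", positives, negatives, answers)))     # 205
print(len(pairwise_triplets("q", positives, negatives, answers)))  # 1000
```

Per question, this gives 201–205 pointwise rows or 200–1,000 triplets, so pairwise training sets grow quickly as positives are added.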

Caveats

The actual corpus download requires purchasing a license from the Chatopera store; the PyPI package is just a downloader stub
Data is research-use-only with attribution requirements (Chunsong License + original paper citation)
README still lists Python 2.x as supported, which may signal stale maintenance

Verdict

Worth a look if you’re building Chinese insurance chatbots or benchmarking answer-selection models in a narrow domain. Skip it if you need open, frictionless data or a modern, actively maintained pipeline.

Frequently asked

What is chatopera/insuranceqa-corpus-zh?: Real-world insurance questions and expert answers, translated and packaged for machine learning—but the data itself sits behind a store checkout.