guhhhhaa/4675-scifi

A Chinese NLP corpus containing approximately 4,675 science fiction novels formatted as a training dataset.

Velocity (7d): +0.2 ★ / day
Trend: steady

This repository is a Chinese natural language processing corpus of science fiction novels, compiled by a former moderator of a Baidu Tieba sci-fi forum. It is explicitly intended as a corpus for training NLP models on Chinese science fiction text. The dataset includes works sourced from the forum as well as from additional sci-fi novel archives.
