jayleicn/animeGAN

When your GAN training is just a toy project, but the dataset is 143k anime faces

A learning exercise that accidentally produced a clean pipeline for scraping, cleaning, and generating anime portraits with DCGAN.

1.3k stars · Jupyter Notebook · Image · Video · Audio · ML Frameworks
Velocity (7d): +0.4 ★ / day · Trend: steady

What it does

AnimeGAN trains a DCGAN on ~143,000 cropped anime faces to generate new character portraits and interpolate between latent codes. It ships with a pretrained model, a Jupyter notebook for tinkering, and a full data pipeline from Danbooru tags to face crops.
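Latent interpolation is simple to picture in code. The sketch below is a minimal, framework-free illustration that assumes plain linear interpolation and a latent size of 100; neither detail is stated in this summary, and the function names are hypothetical rather than taken from the repository.

```python
import random


def lerp(z0, z1, t):
    """Linearly interpolate between two latent codes at fraction t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def interpolation_path(z0, z1, steps):
    """Return `steps` evenly spaced latent codes from z0 to z1, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]


rng = random.Random(0)
latent_dim = 100  # assumed size; the summary does not state the real one
z_start = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
z_end = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

# Each code along the path would be passed to the generator to render one frame.
path = interpolation_path(z_start, z_end, steps=8)
```

Feeding each entry of `path` to a trained generator should produce a sequence of portraits that morph from one face to the other.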

The interesting bit

The author is admirably honest: this is “a toy project to learn PyTorch and GANs.” Yet the README doubles as a field report on what actually stabilizes training: adding noise to the discriminator’s inputs and labels, and keeping the generator deeper than the discriminator. It also records a surprising result: binary noise drawn from {-1, 1} works, but the images look worse than those produced with Gaussian noise. The dataset construction guide is the hidden gem, covering multi-process scraping with gallery-dl, face detection via python-animeface, and a cleaned 115,085-image release.
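The noise tricks mentioned above can be sketched without any deep learning framework. This is an illustration only: the noise magnitudes (`max_noise`, `sigma`), the equal-probability assumption for the binary code, the clamping to [-1, 1], and all function names are assumptions, not values or code from the repository.

```python
import random


def gaussian_latent(dim, rng):
    """Sample a latent code from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def binary_latent(dim, rng):
    """Sample a latent code of independent -1/+1 entries (assumed p = 0.5)."""
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(dim)]


def noisy_labels(batch_size, real, rng, max_noise=0.1):
    """Soften discriminator targets: real labels near 1, fake labels near 0."""
    if real:
        return [1.0 - rng.uniform(0.0, max_noise) for _ in range(batch_size)]
    return [rng.uniform(0.0, max_noise) for _ in range(batch_size)]


def noisy_inputs(pixels, rng, sigma=0.05):
    """Add small Gaussian noise to pixels in [-1, 1], clamping the result."""
    return [min(1.0, max(-1.0, p + rng.gauss(0.0, sigma))) for p in pixels]
```

In a real training loop the same ideas would be applied to tensors: perturb each batch before it reaches the discriminator, and replace the hard 0/1 targets with the softened ones.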

Key highlights

  • Pretrained DCGAN model included; run inference in the bundled notebook
  • Full dataset pipeline documented: scrape by tag, detect faces, crop, clean
  • Curated dataset release (115k images, 126 tags) on GitHub Releases and BaiduYun
  • Author’s training notes read like a lab notebook: unverified but specific
  • Single-command training: python main.py --dataRoot path_to_dataset/
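The scrape-by-tag step can be approximated as one gallery-dl process per tag, run a few at a time. The sketch below is a hedged illustration, not the repository's script: the Danbooru URL pattern, the worker count, and the function names are assumptions, and it requires gallery-dl to be installed and on the PATH before `scrape_tags` is called.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote


def build_command(tag, out_dir):
    """Build a gallery-dl invocation for one Danbooru tag (URL pattern assumed)."""
    url = f"https://danbooru.donmai.us/posts?tags={quote(tag)}"
    return ["gallery-dl", "--destination", out_dir, url]


def scrape_tags(tags, out_dir, workers=4):
    """Run one gallery-dl process per tag, at most `workers` at a time.

    Returns a dict mapping each tag to the exit code of its process.
    """
    def run(tag):
        return subprocess.run(build_command(tag, out_dir), check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(tags, pool.map(run, tags)))
```

A call such as `scrape_tags(["blue_hair", "smile"], "danbooru_raw")` (hypothetical tags and folder) would download each tag in its own process. Threads are used here only to supervise the child processes, so the downloads themselves still run in parallel.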

Caveats

  • Code is “mostly borrowed” from the official PyTorch DCGAN example; heavy influence from chainer-DCGAN and IllustrationGAN
  • Dataset tags become semantically meaningless after face cropping (e.g., “uniform” may show no uniform)
  • Author explicitly notes: “I did not carefully verify them” regarding training tips
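The tag caveat is easier to see with numbers. The sketch below assumes a detector that returns a face box as (x, y, width, height) in pixels; the helper names and the margin parameter are hypothetical and do not reflect the python-animeface API.

```python
def face_crop_box(face, image_size, margin=0.0):
    """Expand a face box by `margin` (fraction of its size) and clamp to the image."""
    x, y, w, h = face
    img_w, img_h = image_size
    pad_w, pad_h = int(w * margin), int(h * margin)
    left = max(0, x - pad_w)
    top = max(0, y - pad_h)
    right = min(img_w, x + w + pad_w)
    bottom = min(img_h, y + h + pad_h)
    return left, top, right, bottom


def kept_fraction(box, image_size):
    """Return the share of the original image area that survives the crop."""
    left, top, right, bottom = box
    img_w, img_h = image_size
    return ((right - left) * (bottom - top)) / (img_w * img_h)


# Illustrative numbers only: a 200x200 face in an 800x1200 illustration.
box = face_crop_box((300, 100, 200, 200), (800, 1200), margin=0.25)
share = kept_fraction(box, (800, 1200))
```

With these illustrative numbers the crop keeps under a tenth of the original picture, which is why a tag describing clothing or background may no longer match anything visible in the face crop.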

Verdict

Grab this if you want a minimal, working DCGAN baseline with a ready-made niche dataset and honest notes on training instability. Skip it if you need architectural novelty or production-grade generation; the author would agree.
