
lucasjinreal/weibo_terminater

A Weibo web scraper that collects posts, comments, and followers to build NLP training corpora.

2.3k stars · Python · Data Tooling
Velocity (7d): +0.7 ★/day · Trend: steady

This repository provides a Python-based scraper for Weibo, a Chinese social media platform. It collects posts, comments, user followers, and other content. It is described as an NLP corpus preparation tool, intended to gather text data for natural language processing research. The tool uses Selenium with Firefox's geckodriver to navigate Weibo pages and extract content. Although the author has moved on to autonomous driving work, they still maintain the project for corpus collection.
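To make the Selenium-plus-Firefox approach concrete, here is a minimal sketch of how such a scraper might load a page and pull out text. It is not taken from the repository: the function names `clean_post_text` and `scrape_posts`, and the URL and CSS selector passed in, are all hypothetical, and a real Weibo page would likely also need login handling.

```python
import re
from typing import List


def clean_post_text(raw: str) -> str:
    """Collapse runs of whitespace so each post fits on one corpus line."""
    return re.sub(r"\s+", " ", raw).strip()


def scrape_posts(url: str, css_selector: str) -> List[str]:
    """Load `url` in Firefox and return the cleaned text of matching elements.

    Both arguments are assumptions supplied by the caller; this sketch does
    not know Weibo's actual page structure.
    """
    # Imported here so clean_post_text works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # needs Firefox and geckodriver available
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        texts = [clean_post_text(element.text) for element in elements]
        return [text for text in texts if text]  # drop empty entries
    finally:
        driver.quit()  # always close the browser, even on errors
```

A call might look like `scrape_posts("https://example.com/some-page", "div.post")`, where both the address and the selector are placeholders. Keeping the text cleanup separate from the browser code means the cleanup can be checked without launching Firefox.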
