flairNLP/fundus

A news crawler Python package from Humboldt University for extracting articles from online sources and the CC-NEWS corpus.

462 stars · Python · Data Tooling
Velocity (7d): +0.4 ★ / day · Trend: steady

Fundus is a static news crawler that extracts articles from live websites or from the CommonCrawl CC-NEWS dataset. It provides a Python interface for retrieving and parsing news content from the publishers it supports. The repository also includes a publisher coverage tracker and a disclaimer about filtering publishers based on AI training usage, which suggests it is intended to support NLP data collection workflows.
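
As a rough illustration of that Python interface, the sketch below crawls a couple of articles and prints a one-line preview of each. It assumes the `fundus` package is installed and that `Crawler`, `PublisherCollection.us`, `crawl(max_articles=...)`, `article.title`, and `article.plaintext` behave as in the project's README at the time of writing; check the current documentation before relying on them. The helper names `summarize` and `crawl_previews` are hypothetical and not part of Fundus.

```python
from typing import Iterator, Optional


def summarize(title: Optional[str], text: str, width: int = 60) -> str:
    """Build a one-line preview from a title and body text (hypothetical helper)."""
    head = title or "(untitled)"
    # Collapse all runs of whitespace, including newlines, into single spaces.
    flat = " ".join(text.split())
    snippet = flat if len(flat) <= width else flat[: width - 3] + "..."
    return f"{head}: {snippet}"


def crawl_previews(max_articles: int = 2) -> Iterator[str]:
    """Yield previews of freshly crawled articles (needs fundus and network access)."""
    # Imported lazily so summarize() stays usable without fundus installed.
    from fundus import Crawler, PublisherCollection

    # Assumption: PublisherCollection.us selects the supported US publishers.
    crawler = Crawler(PublisherCollection.us)
    for article in crawler.crawl(max_articles=max_articles):
        # Assumption: title and plaintext are attributes of a Fundus article.
        yield summarize(article.title, article.plaintext or "")
```

Iterating over `crawl_previews()` and printing each item would show the previews. Which articles appear depends on what the publishers are serving at crawl time.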
