flairNLP/fundus
A news crawler Python package from Humboldt University for extracting articles from online sources and the CC-NEWS corpus.

Velocity · 7d: +0.4 ★ / day
Trend: → steady
Fundus is a static news crawler that extracts articles either from live websites or from the CommonCrawl CC-NEWS dataset. It exposes a Python interface for retrieving and parsing news content from supported publishers. The project also maintains a publisher coverage tracker and a disclaimer about filtering publishers based on AI training usage, which suggests it is aimed at NLP data collection workflows.
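
As a rough illustration of that Python interface, the sketch below collects article titles from a crawler. The helper names `collect_titles` and `demo` are hypothetical and not part of Fundus. The `Crawler`, `PublisherCollection.us`, `crawl(max_articles=...)` and `article.title` names reflect the project's documented quickstart as best understood here, so treat them as assumptions and check them against the installed version.

```python
from typing import Any, List


def collect_titles(crawler: Any, limit: int = 2) -> List[str]:
    """Return the titles of up to `limit` articles yielded by `crawler`.

    Hypothetical helper: it only assumes the crawler has a
    `crawl(max_articles=...)` method yielding objects with a `title`.
    """
    titles: List[str] = []
    for article in crawler.crawl(max_articles=limit):
        titles.append(article.title)
    return titles


def demo() -> None:
    """Crawl two articles from US-based publishers and print their titles.

    Assumes `pip install fundus` and network access; the import is kept
    local so the helper above stays usable without the package.
    """
    from fundus import Crawler, PublisherCollection

    crawler = Crawler(PublisherCollection.us)
    for title in collect_titles(crawler, limit=2):
        print(title)
```

Calling `demo()` would fetch from live websites. For the CC-NEWS path, the project also documents a `CCNewsCrawler` class, which could presumably be passed to `collect_titles` in the same way.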