watercrawl/WaterCrawl
A web crawler that extracts and transforms web content into markdown format optimized for LLM consumption.

Velocity (7d): +3.4 ★/day · Trend: steady
WaterCrawl is a web scraping and crawling application built with Python, Django, Scrapy, and Celery that extracts content from websites and converts HTML into markdown. It specifically targets AI/LLM use cases by preparing web data in formats suitable for model training or RAG pipelines. The tool supports structured data extraction and includes Docker deployment options.
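To make the HTML-to-markdown step concrete, here is a minimal sketch using only Python's standard library. It is not WaterCrawl's actual pipeline (which the description says is built on Django, Scrapy, and Celery); the names `MarkdownSketch` and `html_to_markdown` are hypothetical, and only headings, paragraphs, list items, and links are handled.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical, minimal HTML-to-markdown converter (illustration only)."""

    def __init__(self):
        super().__init__()
        self.parts = []   # accumulated markdown fragments
        self.href = None  # target of the <a> tag currently open, if any
        self.skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in ("h1", "h2", "h3"):
            # "h2" becomes "## ", and so on
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip:
            # Collapse whitespace runs but keep single spaces between words
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Convert a small HTML fragment to markdown text."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Trim each line, then collapse runs of blank lines to a single one
    text = "\n".join(line.strip() for line in text.split("\n"))
    return re.sub(r"\n{3,}", "\n\n", text).strip()


sample = (
    "<h1>Title</h1>"
    "<p>Read the <a href='https://example.com'>docs</a>.</p>"
    "<ul><li>one</li><li>two</li></ul>"
    "<script>x=1</script>"
)
print(html_to_markdown(sample))
```

Dropping `<script>` and `<style>` content and collapsing whitespace keeps the output compact, which is the kind of cleanup that matters when the text is headed for an LLM context window or a RAG index.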