paulpierre/markdown-crawler
A multithreaded web crawler that converts websites into markdown files for use in LLM RAG pipelines and knowledge bases.

Velocity (7d): +0.5 ★/day · Trend: steady
This tool recursively crawls a website and writes one markdown file per page, preserving document structure such as tables and images. It parses HTML with BeautifulSoup, uses multiple threads to speed up crawling, and supports resumable sessions. The output is designed to be easily chunked and processed for retrieval-augmented generation (RAG) systems, LLM fine-tuning datasets, and agent knowledge bases.
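To illustrate what "easily chunked" might look like downstream, here is a minimal sketch that splits one generated markdown file into heading-delimited chunks. It is not part of markdown-crawler: the `chunk_markdown` function and its output shape are hypothetical, and it assumes the files use ATX-style (`#`) headings.

```python
import re
from typing import Dict, List

# Assumption: pages use ATX headings ("# Title" through "###### Title").
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def chunk_markdown(text: str) -> List[Dict[str, str]]:
    """Split markdown into one chunk per heading section (hypothetical helper)."""
    chunks: List[Dict[str, str]] = []
    title = ""            # heading of the section being collected
    body: List[str] = []  # lines collected for that section
    in_fence = False      # True while inside a fenced code block

    def flush() -> None:
        # Emit the current section unless it has neither a title nor any text.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside code fences are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk keeps its heading as `title`, which could be stored as metadata alongside the embedded text. Splitting on headings rather than fixed character counts is one design choice among several; it keeps tables and code blocks within a section intact, at the cost of uneven chunk sizes.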