egoist/sitefetch
A CLI tool and library that crawls websites and extracts readable text content for consumption by AI models.

Velocity (7d): +3.4 ★/day · Trend: steady
Sitefetch is a web scraping tool that recursively fetches an entire website and saves the content as plain text files. It uses Mozilla’s Readability library to extract clean article content from pages, supports path matching with glob patterns to target specific pages, and allows CSS selectors for precise content extraction. The extracted text can be fed directly into LLMs for tasks like RAG, analysis, or building context from web sources.
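To make the glob-based path matching concrete, here is a minimal TypeScript sketch of how a crawler could decide which discovered URLs to keep. It is not sitefetch's actual implementation or API: `globToRegExp` and `filterByPath` are hypothetical helpers written for illustration, and the matcher only understands `*` (anything except `/`) and `**` (anything, including `/`).

```typescript
// Hypothetical helper (not part of sitefetch): convert a simple glob to a RegExp.
// `**` matches any characters including `/`; `*` matches any characters except `/`.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*";
        i++; // skip the second `*`
      } else {
        pattern += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Hypothetical helper: keep only URLs whose pathname matches at least one glob.
function filterByPath(urls: string[], globs: string[]): string[] {
  const matchers = globs.map(globToRegExp);
  return urls.filter((u) => {
    const path = new URL(u).pathname;
    return matchers.some((re) => re.test(path));
  });
}

// Example usage with placeholder URLs.
const discovered = [
  "https://example.com/blog/post-1",
  "https://example.com/about",
  "https://example.com/docs/intro",
  "https://example.com/docs/api/fetch",
];
const kept = filterByPath(discovered, ["/blog/**", "/docs/*"]);
console.log(kept);
// prints ["https://example.com/blog/post-1", "https://example.com/docs/intro"]
```

Matching against the URL's pathname rather than the full URL keeps patterns short and independent of the host. A production crawler would more likely rely on an established glob library than a hand-rolled converter like this one.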