egoist/sitefetch

A CLI tool and library that crawls websites and extracts readable text content for consumption by AI models.

1.7k stars · TypeScript · Data Tooling
Velocity (7d): +3.4 ★/day · Trend: steady

Sitefetch is a web scraping tool that recursively fetches an entire website and saves the content as plain text files. It uses Mozilla’s Readability library to extract clean article content from pages, supports path matching with glob patterns to target specific pages, and allows CSS selectors for precise content extraction. The extracted text can be fed directly into LLMs for tasks like RAG, analysis, or building context from web sources.
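
To make the path-matching idea concrete, here is a minimal sketch of how a crawler might decide whether a URL falls under a glob such as `/blog/**`. It is an illustration only, not sitefetch's actual implementation: `globToRegExp` and `shouldFetch` are hypothetical helpers, and the rule that `*` stays within one path segment while `**` crosses segments is an assumption borrowed from common glob conventions.

```typescript
// Hypothetical helper: turn a path glob into an anchored regular expression.
// Assumption: "**" matches across "/" boundaries, "*" stays within one segment.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*";
        i++; // consume the second "*"
      } else {
        pattern += "[^/]*";
      }
    } else {
      // Escape characters that are special in regular expressions.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Hypothetical helper: a URL qualifies when no globs are given,
// or when its pathname matches at least one of them.
function shouldFetch(url: string, globs: string[]): boolean {
  const { pathname } = new URL(url);
  return globs.length === 0 || globs.some((g) => globToRegExp(g).test(pathname));
}

// Example usage with placeholder URLs.
console.log(shouldFetch("https://example.com/blog/2024/hello", ["/blog/**"]));
console.log(shouldFetch("https://example.com/about", ["/blog/**"]));
```

A real crawler would more likely rely on an established glob library than a hand-rolled converter; the sketch only shows the filtering step that keeps a recursive crawl confined to the targeted pages.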