
SkywalkerDarren/chatWeb

A system that crawls web pages and documents, extracts text, stores embeddings in a vector database, and uses GPT-3.5 to answer questions based on retrieved content.

913 stars · Python · RAG · Search · LLMOps · Eval
Velocity (7d): +0.8 ★/day · Trend: steady

ChatWeb crawls a webpage or extracts text from PDF, DOCX, and TXT files. It generates embeddings (and summaries) of the extracted text using the OpenAI embedding API and stores the vector-to-text mappings in a vector database (FAISS or pgvector). To answer a question, it retrieves the most similar text chunks via nearest-neighbor search and passes them to GPT-3.5. Query vectors are generated from keywords extracted from the question rather than from the raw question, which is intended to improve retrieval accuracy. Because only the relevant chunks are sent to the model, texts much larger than the model's token limit can still be queried.
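The retrieval flow described above can be sketched in a few lines. This is a minimal illustration and not ChatWeb's actual code: the function names (`chunk`, `extract_keywords`, `embed`, `retrieve`) and the stopword list are hypothetical, a toy term-frequency vector stands in for the embedding API, and a brute-force cosine scan stands in for FAISS or pgvector.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system might ask the LLM for keywords instead.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "does", "do",
             "of", "to", "in", "and", "it", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, max_words=40):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def extract_keywords(question):
    """Keep only the content words of the question."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def embed(tokens):
    """Toy embedding: a sparse term-frequency vector (assumption, not the real API)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Store (vector, text) pairs; stands in for the vector database."""
    return [(embed(tokenize(c)), c) for c in chunks]


def retrieve(index, question, k=2):
    """Embed the question's keywords and return the k most similar chunks."""
    query = embed(extract_keywords(question))
    ranked = sorted(index, key=lambda pair: cosine(query, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "FAISS is a library for nearest neighbor search over dense vectors.",
        "PDF and DOCX files are parsed into plain text.",
        "The crawler downloads a web page and strips the HTML.",
    ]
    index = build_index(docs)
    for hit in retrieve(index, "What is nearest neighbor search?", k=1):
        print(hit)
```

Only the chunks returned by `retrieve` would be placed in the prompt sent to the chat model, which is how the approach keeps the prompt within the token limit regardless of the size of the source text.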
