kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference
A guide for running quantized Llama 2 and open-source LLMs on CPU for local document question-and-answer applications.

Velocity (7d): +0.9 ★ / day · Trend: steady
This project demonstrates how to run quantized open-source LLMs locally on CPU for document Q&A, enabling private deployments without GPU costs. It combines LangChain for orchestration, FAISS for vector similarity search, sentence-transformers for embeddings, and C Transformers/GGML for efficient CPU inference of models like Llama 2. Users supply their own documents and query them; the answers are generated by the local LLM.
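The flow described above (embed document chunks, find the chunks most similar to a question, then hand them to the model as context) can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the bag-of-words `embed` function stands in for a sentence-transformers model, the linear scan in `retrieve` stands in for a FAISS index, and `build_prompt`, the sample chunks, and the prompt wording are all hypothetical. The final step, passing the prompt to a quantized Llama 2 model, is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (linear scan, not a FAISS index)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble a question-answering prompt from the retrieved context (hypothetical template)."""
    context = "\n".join(context_chunks)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical document chunks, as they might look after splitting a file.
chunks = [
    "The warranty period is two years from purchase.",
    "Returns are accepted within 30 days.",
    "Support is available by email on weekdays.",
]

query = "How long is the warranty period?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real deployment the prompt would then be sent to the locally loaded model, and the embedding and search steps would be handled by the libraries named above rather than by these toy functions.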