
kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference

A guide for running quantized Llama 2 and open-source LLMs on CPU for local document question-and-answer applications.

Velocity (7d): +0.9 ★ / day
Trend: steady

This project demonstrates how to run quantized open-source LLMs locally on CPU for document Q&A, enabling private deployments without GPU costs. It combines LangChain for orchestration, FAISS for vector similarity search, sentence-transformers for embeddings, and C Transformers/GGML for efficient CPU inference of models like Llama 2. Users provide documents and can query them to get answers generated by the local LLM.
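The retrieve-then-generate flow described above can be sketched in plain Python. This is only an illustration of the data flow, not the repository's code: `embed` is a toy bag-of-words stand-in for a sentence-transformers model, `ToyVectorStore` is a brute-force stand-in for a FAISS index, and `echo_llm` is a placeholder for the quantized model loaded through C Transformers. All names and the sample chunks here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Brute-force similarity search (assumption: stands in for a FAISS index)."""

    def __init__(self, chunks):
        # Embed every document chunk once, at indexing time.
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        # Rank all chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(context_chunks)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, store, llm, k=2):
    """Retrieve relevant chunks, then hand the assembled prompt to the LLM callable."""
    return llm(build_prompt(question, store.search(question, k=k)))


def echo_llm(prompt):
    """Placeholder for the local quantized model; it only reports the prompt size."""
    return f"[stub LLM received {len(prompt)} characters]"


# Hypothetical document chunks, for illustration only.
chunks = [
    "The warranty covers parts and labor for two years.",
    "Returns are accepted within 30 days of purchase.",
    "Shipping is free for orders over 50 dollars.",
]
store = ToyVectorStore(chunks)
top = store.search("How long is the warranty?", k=1)
print(top)
print(answer("How long is the warranty?", store, echo_llm))
```

In a real setup, each stand-in would be replaced by the corresponding library named above, while the shape of the pipeline (index the documents, retrieve the closest chunks, build a prompt, generate) would stay the same.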
