abetlen/llama-cpp-python
Python library providing high-level bindings to run quantized LLM inference via llama.cpp.

Velocity (7d): +8.8 ★/day
Trend: steady
This package wraps llama.cpp in Python, offering both low-level ctypes access and a high-level API for text completion. It ships an OpenAI-compatible web server, supports function calling, vision models, and multiple concurrent model sessions. Integration with LangChain and LlamaIndex enables use in RAG and agent pipelines.
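A minimal sketch of the high-level completion API described above, assuming the package is installed (`pip install llama-cpp-python`) and a GGUF model file is available locally. The helper names `first_completion_text` and `complete`, the model path, and the parameter values are illustrative assumptions, not part of the library.

```python
from typing import Any, Dict


def first_completion_text(result: Dict[str, Any]) -> str:
    """Return the generated text from an OpenAI-style completion response dict."""
    return result["choices"][0]["text"]


def complete(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Load a local model and run one text completion (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the package installed.
    from llama_cpp import Llama

    # n_ctx sets the context window size; 2048 is an arbitrary choice for this sketch.
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)

    # create_completion returns a dict shaped like an OpenAI completion response.
    result = llm.create_completion(prompt, max_tokens=max_tokens, stop=["\n"])
    return first_completion_text(result)


# Example call (the path is a placeholder; point it at a real GGUF file):
# print(complete("./models/model.gguf", "Q: Name the capital of France. A:"))
```

Loading the model is the expensive step, so a longer-lived program would likely create the `Llama` object once and reuse it across prompts rather than constructing it per call as this sketch does.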