← all repositories

abetlen/llama-cpp-python

Python library providing high-level bindings to run quantized LLM inference via llama.cpp.

llama-cpp-python
Velocity · 7d
+8.8
★ / day
Trend
steady
star history

This package wraps llama.cpp in Python, offering both low-level ctypes access and a high-level API for text completion. It ships an OpenAI-compatible web server, supports function calling, vision models, and multiple concurrent model sessions. Integration with LangChain and LlamaIndex enables use in RAG and agent pipelines.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.