containers/ramalama
A container-native tool for serving AI models for inference from any source.

Velocity (7d): +4.2 ★/day · Trend: steady
RamaLama simplifies local AI model serving by packaging models as OCI container images, which removes the need to configure the host. It supports multiple inference backends, including vLLM and llama.cpp, and automatically detects available hardware such as NVIDIA CUDA or Intel GPUs. Users interact with the tool through familiar container patterns via Podman.