containers/ramalama

A container-native tool for serving AI models for inference from any source.

2.9k stars · Python · Inference · Serving
Star velocity (7d): +4.2 ★/day · Trend: steady

RamaLama simplifies local AI model serving by packaging models as OCI container images, so the host needs no additional configuration. It supports multiple inference backends, including vLLM and llama.cpp, and automatically detects available hardware such as NVIDIA CUDA or Intel GPUs. Users work with the tool through familiar container workflows via Podman.
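
To make the idea of automatic hardware detection concrete, here is a minimal sketch of how a tool could pick an accelerator on Linux by reading PCI vendor IDs from sysfs. This is not RamaLama's actual detection logic: the function names (`read_gpu_vendors`, `pick_accelerator`), the labels (`cuda`, `intel-gpu`, `cpu`), and the preference order are all assumptions made for illustration. Only the vendor IDs themselves (`0x10de` for NVIDIA, `0x8086` for Intel) are standard values.

```python
from pathlib import Path

# Standard PCI vendor IDs mapped to illustrative (hypothetical) accelerator labels.
PCI_VENDORS = {"0x10de": "cuda", "0x8086": "intel-gpu"}


def read_gpu_vendors(drm_root="/sys/class/drm"):
    """Return the PCI vendor IDs of DRM devices under drm_root (Linux only)."""
    vendors = []
    for vendor_file in sorted(Path(drm_root).glob("card*/device/vendor")):
        try:
            vendors.append(vendor_file.read_text().strip().lower())
        except OSError:
            # Unreadable entry: skip it rather than fail detection.
            continue
    return vendors


def pick_accelerator(vendors):
    """Prefer NVIDIA, then Intel, otherwise fall back to CPU (assumed order)."""
    labels = {PCI_VENDORS[v] for v in vendors if v in PCI_VENDORS}
    for preferred in ("cuda", "intel-gpu"):
        if preferred in labels:
            return preferred
    return "cpu"


if __name__ == "__main__":
    print(pick_accelerator(read_gpu_vendors()))
```

A container-based tool could then use a label like this to choose which image to launch, which is one way the host itself could stay free of backend-specific setup.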
