b4rtaz/distributed-llama

A C++ distributed inference engine that clusters home devices together via tensor parallelism to accelerate LLM serving.

Velocity (7d): +3.2 ★ / day
Trend: steady

This project implements distributed LLM inference by connecting multiple devices into a cluster, leveraging tensor parallelism and high-speed synchronization over Ethernet to accelerate model serving. It supports running large models like Llama 3 and Qwen 3 across heterogeneous hardware including CPUs, GPUs via Vulkan, and even Raspberry Pi devices. The system provides a single-command setup for running popular quantized models across the distributed cluster.
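
To make the tensor-parallelism idea concrete, here is a minimal single-process sketch in Python. It is not code from distributed-llama (which is written in C++), and the function names `split_rows` and `tensor_parallel_matvec` are hypothetical. It assumes the simplest scheme, in which a weight matrix is split by rows so that each node computes a slice of the output vector and the slices are then gathered; the project's actual partitioning and network synchronization may differ.

```python
# Illustrative sketch only: simulates tensor parallelism in one process.
# All names here are hypothetical and not part of distributed-llama.

def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def split_rows(weight, n_nodes):
    """Split the weight matrix into n_nodes contiguous row shards."""
    if len(weight) % n_nodes != 0:
        raise ValueError("row count must divide evenly across nodes")
    per_node = len(weight) // n_nodes
    return [weight[i * per_node:(i + 1) * per_node] for i in range(n_nodes)]

def tensor_parallel_matvec(weight, x, n_nodes):
    """Compute weight @ x by giving each simulated node one row shard."""
    shards = split_rows(weight, n_nodes)
    # In a real cluster each shard would live on a different device and
    # these products would run concurrently.
    partials = [matvec(shard, x) for shard in shards]
    # Synchronization step: gather the partial outputs in node order.
    return [y for part in partials for y in part]

weight = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
result = tensor_parallel_matvec(weight, x, n_nodes=2)
print(result)
```

Each simulated node holds only its own shard of the weights, which is why this kind of split lets a model too large for one device fit across several. The gather step is where network speed matters, since every layer's output has to be exchanged before the next layer can run.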