bigscience-workshop/petals
A peer-to-peer platform that distributes large language model inference and fine-tuning across volunteer machines, BitTorrent-style.

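Petals lets you run very large LLMs (up to 405B parameters) from consumer hardware by distributing the model's layers across a network of volunteer machines. Its architecture is peer-to-peer in the style of BitTorrent: each participant serves a subset of the layers, and a request passes through a chain of peers that together cover the full model (pipeline parallelism), while tensor parallelism further splits the work for individual layers. Together these keep inference and fine-tuning fast enough for interactive use. Users can work with models such as Llama 3.1, Mixtral, and BLOOM directly from Python through Hugging Face-compatible APIs, or try them through a public chatbot interface.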
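
To make the layer-distribution idea concrete, here is a minimal toy sketch in plain Python. It does not use the Petals API and is not how Petals itself assigns or routes layers; the names `Peer`, `partition_layers`, and `pipeline_forward` are hypothetical, and each "layer" is just a function on a number. It only illustrates the general pattern: contiguous blocks of layers live on different peers, and a forward pass visits the peers in layer order.

```python
from dataclasses import dataclass
from typing import Callable, List

# A stand-in for a transformer layer: any function from hidden state to hidden state.
Layer = Callable[[float], float]


@dataclass
class Peer:
    """Hypothetical peer holding a contiguous block of layers [start, end)."""
    name: str
    start: int
    end: int
    layers: List[Layer]

    def forward(self, hidden: float) -> float:
        # Apply this peer's layers in order.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


def partition_layers(layers: List[Layer], peer_names: List[str]) -> List[Peer]:
    """Split layers into contiguous blocks, one per peer, sizes differing by at most one."""
    base, extra = divmod(len(layers), len(peer_names))
    peers, start = [], 0
    for i, name in enumerate(peer_names):
        size = base + (1 if i < extra else 0)
        peers.append(Peer(name, start, start + size, layers[start:start + size]))
        start += size
    return peers


def pipeline_forward(peers: List[Peer], hidden: float) -> float:
    """Pass the hidden state through the peers in layer order (pipeline parallelism)."""
    for peer in sorted(peers, key=lambda p: p.start):
        hidden = peer.forward(hidden)
    return hidden


# Toy "model": 8 layers, where layer i adds i + 1 to the hidden state.
layers = [(lambda x, i=i: x + i + 1) for i in range(8)]
peers = partition_layers(layers, ["peer_a", "peer_b", "peer_c"])
result = pipeline_forward(peers, 0.0)

for p in peers:
    print(f"{p.name} serves layers {p.start}..{p.end - 1}")
print("output:", result)
```

In this sketch the split is static and every peer is always available. A real volunteer network also has to handle peers joining and leaving, uneven hardware, and network latency, none of which is modeled here.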