Is petals open source?

Yes — bigscience-workshop/petals is open source, released under the MIT license.

What language is petals written in?

bigscience-workshop/petals is primarily written in Python.

How popular is petals?

bigscience-workshop/petals has 10.3k stars on GitHub.

Where can I find petals?

bigscience-workshop/petals is on GitHub at https://github.com/bigscience-workshop/petals.


bigscience-workshop/petals

LLM inference, BitTorrent-style: a few layers per volunteer GPU

Petals lets you run and fine-tune models like Llama 3.1 405B from a desktop by distributing layers across a public swarm of consumer GPUs.

★ 10.3k stars · Python · Inference · Serving · Language Models


What it does

Petals is a distributed inference and fine-tuning network for very large language models. Instead of loading an entire 70B or 405B model onto one machine, you load a small part of it locally and route the remaining layers through a pool of peers, each of which hosts a few layers on its own GPU. The client hides this fragmentation behind an AutoDistributedModelForCausalLM class that is used much like the corresponding Hugging Face Transformers model class.
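A minimal sketch of what client code can look like, following the usage pattern in the project's README. The helper name `generate_with_petals` and the default checkpoint name are illustrative assumptions. Actually running it requires `petals` and `transformers` to be installed and enough peers online serving the chosen model.

```python
def generate_with_petals(
    prompt: str,
    model_name: str = "meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed checkpoint name
    max_new_tokens: int = 5,
) -> str:
    """Generate a short continuation through a Petals swarm (hypothetical helper)."""
    # Imported lazily so this module still loads where petals is not installed.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Only a small part of the model is loaded locally; the transformer
    # blocks are executed by remote peers in the swarm.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])
```

Calling `generate_with_petals("A cat sat")` would then behave like an ordinary Transformers generation call, only slower and dependent on the swarm being healthy.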

The interesting bit

The project treats model weights like torrent pieces: your computer only caches the shards it serves, and the network stitches together a full forward pass across dozens of machines. You still get PyTorch-level access to hidden states, custom sampling, and fine-tuning methods, just over a WAN.
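The idea of stitching a forward pass across machines can be illustrated with a toy simulation. This is not Petals code: `Peer`, `make_blocks`, and `swarm_forward` are hypothetical names, and each "block" is a trivial arithmetic function standing in for a transformer layer. The point is only that chaining peers in block order reproduces the result of running every block on one machine.

```python
from dataclasses import dataclass
from typing import Callable, List

Block = Callable[[float], float]  # stand-in for one transformer block


@dataclass
class Peer:
    name: str
    start: int            # first block index served (inclusive)
    end: int              # one past the last block index served (exclusive)
    blocks: List[Block]   # only the blocks in [start, end) live on this peer

    def forward(self, hidden: float) -> float:
        # Run the hidden state through this peer's slice of the model.
        for block in self.blocks:
            hidden = block(hidden)
        return hidden


def make_blocks(n: int) -> List[Block]:
    # Block i maps h -> 2*h + i, so applying blocks out of order changes the result.
    return [lambda h, i=i: 2 * h + i for i in range(n)]


def local_forward(blocks: List[Block], hidden: float) -> float:
    # Reference: the whole model on one machine.
    for block in blocks:
        hidden = block(hidden)
    return hidden


def swarm_forward(peers: List[Peer], hidden: float) -> float:
    # The client passes the hidden state from peer to peer in block order.
    for peer in sorted(peers, key=lambda p: p.start):
        hidden = peer.forward(hidden)
    return hidden


blocks = make_blocks(6)
peers = [
    Peer("c", 4, 6, blocks[4:6]),
    Peer("a", 0, 2, blocks[0:2]),
    Peer("b", 2, 4, blocks[2:4]),
]
assert swarm_forward(peers, 1.0) == local_forward(blocks, 1.0)
```

In the real system the hidden states are tensors sent over the network, which is where the latency cost of the approach comes from.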

Key highlights

Supports distributed execution of Llama 3.1 (up to 405B), Mixtral 8x22B, Falcon 40B+, and BLOOM 176B
Claims inference and fine-tuning up to 10× faster than offloading, with single-batch speeds reaching ~6 tokens/sec for Llama 2 70B and ~4 tokens/sec for Falcon 180B (per the project’s paper)
Exposes full model internals—hidden states, custom paths, arbitrary fine-tuning—rather than hiding everything behind a black-box API
The network is community-run; you can also spin up a private swarm for sensitive data
Hosting a server does not let others execute custom code on your machine
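To put the quoted speeds in practical terms, here is a small back-of-the-envelope calculation. The token rates are the approximate figures listed above. The 200-token reply length is an assumption chosen for illustration, and the function name is hypothetical.

```python
def seconds_for_reply(n_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate n_tokens at a steady single-batch rate."""
    return n_tokens / tokens_per_sec


# Approximate rates quoted above; 200 tokens is an assumed reply length.
for label, rate in [("Llama 2 70B", 6.0), ("Falcon 180B", 4.0)]:
    print(f"{label}: about {seconds_for_reply(200, rate):.0f} s for 200 tokens")
```

At these rates a paragraph-length answer takes tens of seconds, which is workable for interactive experimentation but slow next to a dedicated serving cluster.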

Caveats

Your prompts travel through volunteer machines, so privacy depends on strangers’ hardware; the authors explicitly note this and recommend a private swarm for sensitive workloads
Throughput and availability are tied to whichever peers are currently online and serving the model layers you need
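The availability caveat can be made concrete with a toy coverage check: given which peers are online and which block ranges they serve, is there a chain that covers every block? This is a simplified sketch under assumed data structures, not the routing logic Petals actually uses. `find_chain` and the peer names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_chain(online: Dict[str, Tuple[int, int]], n_blocks: int) -> Optional[List[str]]:
    """Greedily cover blocks [0, n_blocks) with peers serving [start, end) spans.

    Returns the peer names in visiting order, or None if some block is unserved.
    """
    chain: List[str] = []
    covered = 0  # blocks [0, covered) are already assigned to a peer
    while covered < n_blocks:
        # Peers able to continue from the next needed block.
        candidates = [
            (end, name)
            for name, (start, end) in online.items()
            if start <= covered < end
        ]
        if not candidates:
            return None  # nobody online serves block `covered`
        end, name = max(candidates)  # take the peer that reaches farthest
        chain.append(name)
        covered = end
    return chain


online = {"a": (0, 3), "b": (2, 5), "c": (5, 8)}
print(find_chain(online, 8))

# If peer "b" goes offline, blocks 3 and 4 have no server and the chain breaks.
del online["b"]
print(find_chain(online, 8))
```

A single missing span is enough to make the model unusable until another peer picks up those blocks, which is why throughput and uptime track the current state of the swarm.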

Verdict

Ideal for researchers and hobbyists who need to poke around inside huge models without renting A100 clusters. If you require guaranteed latency, strict data isolation, or production SLAs, you should look at managed infrastructure instead.
