BlackSamorez/tensor_parallel
A library that automatically shards PyTorch models across multiple GPUs for parallel training and inference.

Velocity (7d): +0.5 ★/day · Trend: steady
This library implements tensor parallelism: it splits a large PyTorch model across multiple GPUs so that models too large for a single device can still run. It works with standard HuggingFace transformers models and requires only wrapping the model with tp.tensor_parallel. Both inference (model.generate) and training (backward pass) are supported, and the examples include FLAN-T5 fine-tuning and OPT model parallelism.
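The general idea behind tensor parallelism can be illustrated with a toy example. The sketch below assumes a column-wise split of a single linear layer's weight matrix, with plain Python lists standing in for GPU tensors and each shard standing in for one device. The helper names (matmul, split_columns, sharded_linear) are hypothetical and are not part of the library's API; the library's actual sharding strategy may differ.

```python
# Conceptual sketch only: plain Python lists stand in for GPU tensors,
# and each "shard" stands in for one device. Not the library's own code.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, n_shards):
    """Split a weight matrix column-wise into n_shards equal pieces."""
    cols = len(w[0])
    assert cols % n_shards == 0, "columns must divide evenly across shards"
    step = cols // n_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(n_shards)]

def sharded_linear(x, shards):
    """Each shard computes its slice of the output; slices are concatenated."""
    partials = [matmul(x, shard) for shard in shards]  # one matmul per "device"
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]                       # batch of 2 inputs, 2 features
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]   # weights of a 2 -> 4 linear layer

full = matmul(x, w)                                 # unsharded reference result
sharded = sharded_linear(x, split_columns(w, n_shards=2))
```

Here `sharded` equals `full`: each shard holds only half of the weight columns, yet the concatenated partial outputs reproduce the unsharded result. This is why a layer too large for one device can be evaluated across several.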