
huggingface/transformers-bloom-inference

Fast inference implementation for the BLOOM 176B language model with multi-GPU support and int8 quantization.

Star velocity (7d): +0.4 ★/day · Trend: steady

This repository provides demos and packages for running efficient inference on the BLOOM large language model. It supports inference through Hugging Face Accelerate and DeepSpeed Inference, with fp16/bf16 and int8 quantized deployment options for multi-GPU setups. For int8, it relies on post-training quantization (LLM.int8() and ZeroQuant), which reduces the memory footprint while aiming to preserve generation quality.
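To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how weight precision affects the footprint. It is not code from the repository. It assumes a nominal 176e9 parameters (taken from the "176B" in the model name) and a hypothetical 80 GB per GPU, and it counts weights only: activations, the KV cache, and framework overhead are ignored, so real deployments need more headroom.

```python
import math

# Bytes per parameter for each weight dtype mentioned above.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(num_params: float, dtype: str, gpu_memory_gb: float) -> int:
    """Lower bound on the GPU count needed just to hold the weights."""
    return math.ceil(weight_memory_gb(num_params, dtype) / gpu_memory_gb)


NUM_PARAMS = 176e9   # assumption: nominal size from the "176B" model name
GPU_MEMORY_GB = 80   # assumption: per-GPU memory; adjust for your hardware

for dtype in ("fp16", "bf16", "int8"):
    gb = weight_memory_gb(NUM_PARAMS, dtype)
    gpus = min_gpus(NUM_PARAMS, dtype, GPU_MEMORY_GB)
    print(f"{dtype}: ~{gb:.0f} GB of weights, at least {gpus} GPUs")
```

Under these assumptions, int8 halves the weight memory relative to fp16/bf16 (about 176 GB versus about 352 GB), which is why quantized deployment can fit on fewer GPUs.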
