
huggingface/text-generation-inference

A Rust and Python gRPC inference server for large language models, optimized for text generation workloads.

Velocity (7d): +8.1 ★ / day · Trend: steady

Text Generation Inference (TGI) is a production inference server for large language models. Its performance-critical components are written in Rust and its higher-level logic in Python, and it provides gRPC APIs for text generation. It includes quantization support and handles major transformer-based model architectures, including Falcon, BLOOM, StarCoder, and GPT variants. It is the backbone of Hugging Face’s production services, including HuggingChat and the Inference API.
