huggingface/text-generation-inference
A Rust and Python gRPC inference server for large language models, optimized for text generation workloads.

Text Generation Inference (TGI) is a production inference server for large language models. It is built with Rust for performance-critical components and Python for higher-level logic, and it provides gRPC APIs for text generation. The system includes quantization optimizations and supports major transformer architectures, including Falcon, BLOOM, StarCoder, and GPT variants. It powers Hugging Face’s production services, including HuggingChat and the Inference API.