AI-Hypercomputer/JetStream
Google's throughput- and memory-optimized inference engine for running LLMs on TPUs and GPUs.

Velocity (7d): +0.5 ★/day · Trend: steady
JetStream is an LLM inference engine designed for high throughput and memory efficiency on XLA-based accelerators, primarily TPUs, with GPU support planned. It provides reference implementations for both JAX and PyTorch model execution, enabling efficient serving of models such as Llama, Gemma, and GPT variants on Google Cloud TPU infrastructure.