← all repositories

AI-Hypercomputer/JetStream

Google's throughput and memory optimized inference engine for running LLMs on TPUs and GPUs.

JetStream
Velocity · 7d
+0.5
★ / day
Trend
steady
star history

JetStream is an LLM inference engine designed for high throughput and memory efficiency on XLA-based accelerators, primarily TPUs with GPU support coming. It provides reference implementations for both Jax and Pytorch model execution, enabling efficient serving of models like Llama, Gemma, and GPT variants on Google Cloud TPU infrastructure.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.