microsoft/onnxruntime-genai

ONNX Runtime GenAI is a C++ runtime for efficiently running large language models on device with support for CUDA, DirectML, TensorRT, and other hardware accelerators.

Star velocity (7d): +1.1 ★ / day. Trend: steady.

This repository provides a specialized runtime for executing generative AI models in the ONNX format. It implements the complete generation loop: model preprocessing, ONNX Runtime-based inference, logits processing, search and sampling, KV cache management, and grammar-based constrained decoding for tool calling. The project supports a wide range of model architectures, including Llama, Gemma, Mistral, Phi, Qwen, Whisper, DeepSeek, and Granite, across multiple hardware backends.
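
The stages of that loop can be illustrated with a minimal, library-free sketch. It does not use the onnxruntime-genai API: `toy_model`, `mask_logits`, `generate`, the four-token vocabulary, and the `allowed` set are all hypothetical stand-ins, chosen only to show how inference, logits processing, greedy search, a cache, and a token constraint might fit together in one step.

```python
from typing import List, Optional, Set

# Hypothetical toy vocabulary; a real runtime would load a tokenizer instead.
VOCAB = ["<eos>", "hello", "world", "!"]
EOS_ID = 0


def toy_model(new_tokens: List[int], cache: List[int]) -> List[float]:
    """Stand-in for an inference call (assumption, not a real model).

    Appends the new tokens to the cache, so later steps only pass the
    latest token, then returns logits favouring the id after the last one.
    """
    cache.extend(new_tokens)
    favoured = (cache[-1] + 1) % len(VOCAB)
    return [1.0 if i == favoured else 0.0 for i in range(len(VOCAB))]


def mask_logits(logits: List[float], allowed: Optional[Set[int]]) -> List[float]:
    """Logits processing: rule out token ids outside the allowed set."""
    if allowed is None:
        return logits
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]


def generate(prompt_ids: List[int], max_new_tokens: int = 8,
             allowed: Optional[Set[int]] = None) -> List[int]:
    """Greedy generation loop over the toy model."""
    cache: List[int] = []          # stands in for KV cache state
    output: List[int] = []
    pending = list(prompt_ids)     # the first step feeds the whole prompt
    for _ in range(max_new_tokens):
        logits = mask_logits(toy_model(pending, cache), allowed)
        # Greedy search: pick the highest-scoring id (first one on ties).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS_ID:
            break
        output.append(next_id)
        pending = [next_id]        # later steps feed only the new token
    return output


if __name__ == "__main__":
    print([VOCAB[i] for i in generate([1])])
```

Passing an `allowed` set plays the role of a very crude constraint; grammar-based decoding would instead recompute the permitted tokens at every step from the grammar state.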
