OpenNMT/CTranslate2

A C++ and Python library providing an optimized inference runtime for Transformer models on CPU and GPU.

4.5k stars · C++ · Inference · Serving
Velocity (7d): +1.8 ★/day · Trend: steady

CTranslate2 implements a custom runtime that applies performance optimizations such as weight quantization, layer fusion, and batch reordering to accelerate Transformer inference and reduce memory usage. The library converts models from frameworks including OpenNMT, Fairseq, Marian, and Hugging Face Transformers into an optimized format, then runs them on CPU and GPU. It supports encoder-decoder, decoder-only, and encoder-only architectures.
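
To make the memory claim concrete, here is a minimal sketch of weight quantization in general: symmetric, per-row int8 quantization written with the Python standard library only. It is not CTranslate2's implementation. The function names `quantize_row` and `dequantize_row` are hypothetical, the sample weights are made up, and the size comparison assumes the original weights are stored as 32-bit floats.

```python
from array import array


def quantize_row(row):
    """Quantize one row of float weights to int8 with a per-row scale.

    Returns (q, scale) where each original value is approximately q[i] / scale.
    Hypothetical helper for illustration only.
    """
    max_abs = max((abs(w) for w in row), default=0.0)
    # An all-zero row needs no scaling; use 1.0 to avoid dividing by zero.
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = array("b", (max(-127, min(127, round(w * scale))) for w in row))
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [v / scale for v in q]


# Made-up weights: two rows with very different magnitudes.
weights = [[0.5, -1.0, 0.25], [0.02, 0.0, -0.01]]

quantized = [quantize_row(row) for row in weights]
restored = [dequantize_row(q, scale) for q, scale in quantized]

# Storage for the values alone (the per-row scales add a small overhead).
float_bytes = sum(len(row) for row in weights) * 4  # assumes float32 storage
int8_bytes = sum(q.itemsize * len(q) for q, _ in quantized)

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes")
```

Using one scale per row rather than one per matrix keeps small-magnitude rows from collapsing to zero. The rounding error for each weight is bounded by about half a quantization step, that is `0.5 / scale`.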
