Skip the ONNX dance: load PyTorch/TF/Keras models straight into TensorRT
Tencent's Forward parses trained models directly into TensorRT engines, cutting out manual conversion steps.

What it does
Forward is a C++ inference wrapper that ingests trained models from PyTorch, TensorFlow, Keras, or ONNX and builds NVIDIA TensorRT engines from them, without making you hand-craft intermediate formats or network definitions. It exposes both C++ and Python APIs, supports FP32/FP16/INT8 precision modes, and covers CV, NLP, and recommender models, with specific support for models such as BERT and FaceSwap.
The interesting bit
The “parsing” angle is the hook: instead of the usual export-to-ONNX-then-pray workflow, Forward claims to parse native framework formats (.pb, .pth, .h5, .onnx) directly into TensorRT engine graphs. That could save steps, though the README is light on exactly how this parsing differs from standard TensorRT ONNX parsing under the hood.
Key highlights
- Direct ingestion: TensorFlow (.pb), PyTorch (.pth), Keras (.h5), ONNX (.onnx) — no explicit intermediate conversion step required
- Precision modes: FLOAT (FP32), HALF (FP16), and INT8
- Extensible: documented path for adding custom layer support via the TensorRT plugin API
- Dual APIs: C++ and Python bindings
- BERT-specific demo and support, plus CV/NLP/recommender model coverage
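To make the format list concrete, here is a minimal Python sketch of routing a model file to a framework by its extension. The extension table comes straight from the list above; `FORMAT_BY_EXTENSION` and `pick_framework` are hypothetical names for illustration and are not part of Forward's API.

```python
from pathlib import Path

# Extension-to-framework table, as listed in the highlights above.
FORMAT_BY_EXTENSION = {
    ".pb": "TensorFlow",
    ".pth": "PyTorch",
    ".h5": "Keras",
    ".onnx": "ONNX",
}


def pick_framework(model_path: str) -> str:
    """Return the source framework implied by a model file's extension.

    Hypothetical helper: Forward's own parsers are not called here.
    """
    suffix = Path(model_path).suffix.lower()  # case-insensitive match
    if suffix not in FORMAT_BY_EXTENSION:
        raise ValueError(f"unsupported model format: {model_path!r}")
    return FORMAT_BY_EXTENSION[suffix]
```

A caller could use the returned name to decide which framework-specific build of Forward a given model needs before attempting to load it.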
Caveats
- Heavy dependencies: CUDA ≥10.0, cuDNN ≥7, TensorRT ≥7.0, plus framework-specific libraries (PyTorch ≥1.7, TensorFlow ≥1.15 with manual .so placement on Linux)
- Build matrix is fragmented: separate CMake targets per framework (Fwd-Torch, Fwd-Tf, Fwd-Keras, Fwd-Onnx, each with Python variants)
- README is almost entirely in Chinese; English version exists but isn’t shown in this source
- 555 stars suggest limited community traction so far
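Given how many version floors are involved, a quick pre-build check can save a failed compile. The sketch below compares versions you supply against the minimums quoted in the caveats; it assumes plain dotted version strings, and `MINIMUMS`, `parse_version`, and `unmet_requirements` are hypothetical names, not anything shipped with Forward.

```python
# Minimum versions as quoted in the caveats above.
MINIMUMS = {
    "CUDA": "10.0",
    "cuDNN": "7",
    "TensorRT": "7.0",
    "PyTorch": "1.7",
    "TensorFlow": "1.15",
}


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1); non-numeric parts are dropped."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def meets_minimum(found, minimum):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed):
    """Return names whose supplied version is below the listed minimum.

    Only components present in `installed` are checked; detecting what is
    actually on the machine is left to the caller.
    """
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name in installed and not meets_minimum(installed[name], minimum)
    ]
```

For example, passing `{"CUDA": "11.0", "TensorFlow": "1.14.0"}` flags only TensorFlow, since 1.14.0 is below the 1.15 floor.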
Verdict
Worth a look if you’re already locked into TensorRT and tired of maintaining ONNX export pipelines for heterogeneous model sources. Skip it if you need broad non-NVIDIA hardware support or a mature, English-first community ecosystem.