
OpenPPL/ppl.nn

A C++ high-performance deep-learning inference engine supporting ONNX models and LLMs like LLaMA, ChatGLM, and Baichuan.

Velocity (7d): +0.8 ★/day · Trend: steady

PPLNN (Primitive Library for Neural Network) is a high-performance deep-learning inference engine written in C++. It runs ONNX models and has strong support for the OpenMMLab ecosystem. The library includes an LLM engine with Flash Attention, split-k attention, group-query attention, dynamic batching, and tensor parallelism for distributed inference. It supports INT8 quantization, including group-wise KV cache quantization and per-token, per-channel quantization. Supported LLM architectures include LLaMA, ChatGLM, Baichuan, and InternLM.
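To make the term "group-wise INT8 quantization" concrete, here is a minimal standalone C++ sketch of symmetric group-wise quantization. It is not taken from PPLNN and does not use its API: the names `GroupQuantized`, `QuantizeGroupwise`, and `Dequantize` are hypothetical, and the symmetric max-abs scaling scheme is an assumption chosen for simplicity. Each group of consecutive values shares one scale, so an outlier in one group does not degrade the precision of the others.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for illustration only; not a PPLNN type.
struct GroupQuantized {
    std::vector<int8_t> values;  // one INT8 code per input element
    std::vector<float> scales;   // one scale per group
    std::size_t group_size = 0;  // number of elements sharing a scale
};

// Symmetric group-wise quantization: each group of `group_size`
// consecutive elements is scaled so its largest magnitude maps to 127.
GroupQuantized QuantizeGroupwise(const std::vector<float>& x,
                                 std::size_t group_size) {
    if (group_size == 0) {
        throw std::invalid_argument("group_size must be positive");
    }
    GroupQuantized q;
    q.group_size = group_size;
    q.values.resize(x.size());
    for (std::size_t begin = 0; begin < x.size(); begin += group_size) {
        const std::size_t end = std::min(begin + group_size, x.size());
        float max_abs = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            max_abs = std::max(max_abs, std::fabs(x[i]));
        }
        // An all-zero group gets scale 1 to avoid dividing by zero.
        const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
        q.scales.push_back(scale);
        for (std::size_t i = begin; i < end; ++i) {
            const float r = std::round(x[i] / scale);
            q.values[i] = static_cast<int8_t>(
                std::min(127.0f, std::max(-127.0f, r)));
        }
    }
    return q;
}

// Reconstructs approximate floats by multiplying each code by its group scale.
std::vector<float> Dequantize(const GroupQuantized& q) {
    std::vector<float> out(q.values.size());
    for (std::size_t i = 0; i < q.values.size(); ++i) {
        out[i] = static_cast<float>(q.values[i]) * q.scales[i / q.group_size];
    }
    return out;
}
```

With this scheme the reconstruction error of each element is bounded by roughly half of its group's scale, which is why smaller groups trade extra scale storage for better accuracy.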
