QwenLM/qwen.cpp

A C++ implementation for running Qwen language models with support for quantization and streaming generation.

★627 stars · C++ · Inference · Serving · Language Models

Velocity (7d): +0.6 ★/day · Trend: steady

qwen.cpp is a pure C++ implementation of Qwen-LM built on ggml for efficient inference of Qwen models on CPU and GPU. It supports model quantization and streaming text generation with a typewriter effect, and it provides Python bindings. Qwen support was merged into llama.cpp in December 2023, and qwen.cpp has since been deprecated in favor of that ongoing effort.