mit-han-lab/TinyChatEngine

On-device LLM/VLM inference engine written in C++ with SmoothQuant and AWQ quantization support for x86, ARM, and CUDA.

TinyChatEngine
Velocity (7d): +0.9 ★/day · Trend: steady

TinyChatEngine is a from-scratch C++ implementation for running compressed LLMs and VLMs on edge devices such as laptops, cars, and robots. It compresses models with SmoothQuant and AWQ quantization and supports Intel/AMD x86, Apple M1/M2 (ARM), and NVIDIA CUDA platforms. Because inference runs on the device, data stays local, which allows real-time responses with better privacy.
