pytorch/torchchat
A PyTorch-native toolkit for running LLMs locally on servers, desktops, and mobile devices with quantization and multiple deployment options.

torchchat is a codebase that demonstrates how to run large language models (LLMs) locally with PyTorch. Models can run in Python (eager or compiled mode), through AOT Inductor for optimized server and desktop execution, or through ExecuTorch for mobile deployment on iOS and Android. Supported models include Llama 3, Llama 2, Mistral, and DeepSeek R1, with multimodal support for models such as Llama 3.2 11B. Several quantization schemes are available for memory-efficient inference.
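
To give a rough sense of why quantization matters for local inference, the sketch below estimates the memory taken by model weights at different bit widths. It is not torchchat code: the helper `weight_memory_gib`, the round parameter count, and the listed bit widths are illustrative assumptions. The estimate also ignores quantization metadata (scales and zero points), activations, and the KV cache, so real footprints will be somewhat higher.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB.

    Hypothetical helper for illustration; not part of torchchat.
    """
    if num_params < 0 or bits_per_weight <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_weight > 0")
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed round figure for an "8B"-class model; actual counts differ slightly.
NUM_PARAMS = 8_000_000_000

# Illustrative bit widths, not a list of the schemes torchchat implements.
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(NUM_PARAMS, bits):6.2f} GiB")
```

Under these assumptions, halving the bits per weight halves the weight footprint, which is often the difference between a model fitting on a desktop GPU or mobile device and not fitting at all.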