riccardomusmeci/mlx-llm
A Python library providing tooling to run and serve multiple LLM families on Apple Silicon in real time using Apple's MLX framework.

mlx-llm is a Python package for loading and running large language models on Apple Silicon hardware using MLX, Apple's array framework for machine learning. It provides a create_model() API that instantiates a model with pre-trained weights from Hugging Face, and it supports the LLaMA 2/3, Phi3, Mistral, TinyLLaMA, Gemma, OpenELM, and SmolLM2 families out of the box. The library aims to enable real-time LLM inference on Apple devices.
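
To illustrate the idea of a create_model()-style entry point, here is a minimal, self-contained sketch of a name-keyed model factory. It is not mlx-llm's implementation and does not import the library: ModelConfig, register_model, list_models, the toy model names, and this stand-in create_model are all hypothetical, and the sketch returns a plain config object instead of downloading weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical, minimal description of a model architecture."""
    family: str
    dim: int
    n_layers: int


# Registry mapping a model name to a zero-argument factory.
_REGISTRY: Dict[str, Callable[[], ModelConfig]] = {}


def register_model(name: str):
    """Decorator that records a factory function under `name`."""
    def decorator(fn: Callable[[], ModelConfig]) -> Callable[[], ModelConfig]:
        if name in _REGISTRY:
            raise ValueError(f"Model '{name}' is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


# Toy entries (assumed names, not real mlx-llm model identifiers).
@register_model("toy_llama_small")
def _toy_llama_small() -> ModelConfig:
    return ModelConfig(family="llama", dim=64, n_layers=2)


@register_model("toy_gemma_small")
def _toy_gemma_small() -> ModelConfig:
    return ModelConfig(family="gemma", dim=32, n_layers=1)


def list_models() -> List[str]:
    """Return all registered model names in sorted order."""
    return sorted(_REGISTRY)


def create_model(name: str) -> ModelConfig:
    """Look up `name` in the registry and build the corresponding object."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unknown model '{name}'. Available: {list_models()}"
        ) from None
    return factory()


if __name__ == "__main__":
    model = create_model("toy_llama_small")
    print(model.family, model.dim, model.n_layers)
```

A single string-keyed entry point like this keeps calling code the same across model families; a real library would additionally fetch and load the pre-trained weights for the selected name.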