← all repositories
jjang-ai/mlxstudio

A Mac-native AI workbench that skips the cloud entirely

MLX Studio wraps Apple's MLX framework into a SwiftUI desktop app for local LLMs, image generation, and coding agents—no API keys, no PyTorch in the hot path.

mlxstudio
Velocity · 7d
+8.2
★ / day
Trend
steady
star history

What it does

MLX Studio is a macOS desktop app for running large language models, vision models, image generators, and speech tools locally on Apple Silicon. It downloads models from HuggingFace’s mlx-community, launches an OpenAI-compatible API server, and provides chat, image editing, and agentic coding interfaces. A separate CLI engine (vmlx) is available via PyPI for terminal-first users.

The interesting bit

The project ships its own quantization scheme, JANG, which claims to beat standard MLX quantization at extreme compression—specifically 2-bit adaptive mixed-precision that reportedly outperforms MLX 4-bit on MiniMax M2.5. The app also implements Anthropic’s /v1/messages API format locally, so you can point the official Claude SDK at your own machine.

Key highlights

  • Native Swift + Metal rewrite (vMLX v2) claims 50–95 tokens/second on M-series chips, with zero PyTorch in the inference path
  • Supports 65+ model families including MoE architectures (DeepSeek, Qwen, Mixtral), vision-language models, and hybrid SSM models like Mamba variants
  • Built-in tool calling with 26+ tools: file I/O, shell execution, web search, git operations, and MCP (Model Context Protocol) server connections
  • Drag-and-drop model installation; code-signed and notarized DMG releases with no Homebrew or Xcode required
  • OpenAI-compatible endpoints for chat, completions, images, embeddings, audio speech/transcription, plus Anthropic Messages API compatibility

Caveats

  • macOS ARM64 only; no Intel Mac or Linux support mentioned
  • The README’s performance claims for JANG quantization lack independent verification or methodology detail
  • Some features (MCP, certain model families) appear to be recent additions with limited documentation depth

Verdict

Apple Silicon developers who want a self-contained local AI stack without wrangling Python environments should look here. If you’re on Linux, need cloud redundancy, or want battle-tested quantization with peer-reviewed benchmarks, this isn’t your tool.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.