OpenMOSS/MOSS

Fudan's 16B LLM ships with plugins and a conscience

A Chinese research lab open-sourced a bilingual ChatGPT rival that can search the web, solve equations, and generate images—while admitting its own limitations upfront.

12.1k stars · Python · Language Models · Chat Assistants
Velocity (7d): +11 ★/day · Trend: steady

What it does

MOSS is a 16-billion-parameter bilingual (Chinese/English) conversational model from Fudan University. It handles multi-turn dialogue and instruction following and, crucially, can invoke four built-in plugins: web search, text-to-image generation, a calculator, and an equation solver. Besides FP16 weights, the project ships INT4/INT8 quantized versions for consumer GPUs, publishes its supervised fine-tuning data, and announces a separate preference model trained on ~180k human feedback comparisons.
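
Tool-using chat models typically work in a loop: the model writes a structured command into its reply, the host program parses it, runs the matching tool, and feeds the result back into the conversation. The sketch below illustrates that host side under loud assumptions: the `Name("argument")` command syntax and the names `dispatch`, `calculate`, and `search` are hypothetical, not MOSS's actual interface. Only the calculator does real work; search is a stub.

```python
import ast
import operator
import re

# Hypothetical command syntax, e.g. Calculate("2*(3+4)"); not MOSS's real format.
COMMAND_RE = re.compile(r'^(\w+)\("(.*)"\)$')

# Arithmetic operators the toy calculator is allowed to evaluate
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,  # no cap on exponent size; a real tool should bound it
    ast.USub: operator.neg,
}


def _eval_node(node):
    """Recursively evaluate a whitelisted arithmetic syntax tree."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def calculate(expr: str) -> str:
    """Toy calculator plugin: parse without executing, then evaluate the tree."""
    return str(_eval_node(ast.parse(expr, mode="eval")))


def search(query: str) -> str:
    """Stub search plugin; a real deployment would query a search backend."""
    return f"[search results for {query!r}]"


# Registry of enabled plugins, keyed by command name
HANDLERS = {"Calculate": calculate, "Search": search}


def dispatch(command: str) -> str:
    """Parse one command emitted by the model and run the matching plugin."""
    match = COMMAND_RE.match(command.strip())
    if match is None:
        raise ValueError(f"not a plugin command: {command!r}")
    name, argument = match.groups()
    if name not in HANDLERS:
        raise KeyError(f"plugin not enabled: {name}")
    return HANDLERS[name](argument)
```

With this setup, `dispatch('Calculate("2*(3+4)")')` returns `"14"`. Walking the parsed syntax tree instead of calling `eval` keeps the calculator from executing arbitrary Python that a model might emit.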

The interesting bit

The authors are unusually candid about the model’s limitations, warning users that it “may generate misleading replies containing factual errors” and explicitly disclaiming liability for harmful outputs. That honesty is refreshing in a field prone to overselling. The training pipeline is also fully documented: base pre-training on ~700B tokens, supervised fine-tuning (SFT) on ~1.1M dialogues (some generated with GPT-3.5-turbo), then preference optimization. In effect, it reproduces the RLHF stack without the corporate opacity.
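
The summary above does not say which objective trains MOSS's preference model. A common choice for reward models learned from pairwise human comparisons (InstructGPT-style RLHF uses it) is the Bradley-Terry loss, -log sigmoid(r_chosen - r_rejected), which pushes the preferred reply's score above the rejected one's. A minimal sketch on plain floats, assuming scalar scores already produced by some reward model; the function names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)) for large positive or negative x
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    # -log(sigmoid(chosen - rejected)) == log(1 + exp(rejected - chosen))
    return softplus(score_rejected - score_chosen)


def mean_preference_loss(pairs) -> float:
    # pairs: iterable of (score_chosen, score_rejected) tuples, one per comparison
    pairs = list(pairs)
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)
```

The loss equals log 2 when the two replies score the same and falls toward zero as the chosen reply's score pulls ahead, so minimizing it over many comparisons teaches the model to rank replies the way human raters did.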

Key highlights

  • Runs on a single A100 (FP16) or a single RTX 3090 (INT4/INT8); the INT4 model needs only ~7.8 GB of GPU memory to load and ~12 GB for one round of conversation
  • Four native plugins: search, image generation, calculator, equation solver; tool use is trained into the model rather than bolted on through an external orchestration layer
  • Open data release: ~1.1M SFT dialogues and ~300k plugin-augmented conversations, with preference data forthcoming
  • Apache 2.0 code, CC BY-NC 4.0 data, AGPL 3.0 model weights—a three-tier license structure worth reading before commercial use
  • Companion repos for deployment (MOSS Vortex), a web search backend, a Flutter frontend, and a Go backend
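
The hardware figures follow from back-of-the-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes a nominal 16B parameters and counts weights only, ignoring activations, the KV cache, and quantization metadata, so treat its results as lower bounds; the ~12 GB needed during an INT4 conversation round shows how much the runtime adds on top. For reference, an RTX 3090 has 24 GB of memory and the A100 comes in 40 GB and 80 GB versions.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Weights-only estimate in decimal GB; excludes activations and KV cache."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NUM_PARAMS = 16e9  # nominal model size; the exact parameter count is an assumption

for precision in ("fp16", "int8", "int4"):
    print(f"{precision}: ~{weight_memory_gb(NUM_PARAMS, precision):.0f} GB of weights")
# fp16: ~32 GB of weights
# int8: ~16 GB of weights
# int4: ~8 GB of weights
```

At ~32 GB, FP16 weights alone overflow a single 3090 but fit on an A100, while the INT8 and INT4 versions leave headroom on a 24 GB card, which matches the pairing in the first bullet.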

Caveats

  • Triton dependency limits inference to Linux/WSL; Windows and macOS are explicitly unsupported “for now”
  • Quantized models don’t support model parallelism, so multi-GPU setups only work with FP16
  • Several promised models (moss-moon-003, moss-moon-003-plugin, and the preference model) were still listed as “coming soon” as of the last README update

Verdict

Worth a look if you’re building Chinese-language LLM applications or studying reproducible RLHF pipelines. Skip if you need a polished, turnkey product: this is a research release with rough edges and hardware constraints that demand engineering patience.
