An 8B model that thinks it's a data science team
DeepAnalyze tries to automate the full data pipeline—cleaning, analysis, visualization, and report generation—without human hand-holding.

What it does
DeepAnalyze is an open-source 8B-parameter model fine-tuned to act as an autonomous data-science agent. Feed it CSVs, Excel files, databases, JSON, or even plain text, and it attempts to run the full workflow: data prep, analysis, modeling, visualization, and final report generation. The project ships with a WebUI, a Jupyter integration, and a CLI, plus vLLM deployment scripts and quantized variants for GPUs with as little as 16GB of VRAM.
The interesting bit
The authors trained on a 500K-sample dataset (also released) and explicitly target “open-ended data research” rather than single-task notebooks. The JupyterUI demo is particularly clever: the model outputs <Analyze>, <Code>, and <Execute> tags that get mapped directly to Markdown and executable cells, turning the LLM into a literal notebook author.
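Here is a minimal sketch of how tag-to-cell mapping of that kind could work. It is not the project's actual code. It assumes each segment is wrapped in a matching open/close pair (e.g. `<Code>...</Code>`), and it assumes `<Execute>` carries the output of the preceding code cell. The function name `tags_to_notebook` is hypothetical. The result is a plain dict in the nbformat v4 notebook layout.

```python
import json
import re

# Assumption: segments look like <Tag>...</Tag>; the real DeepAnalyze format may differ.
TAG_PATTERN = re.compile(r"<(Analyze|Code|Execute)>(.*?)</\1>", re.DOTALL)


def tags_to_notebook(model_output: str) -> dict:
    """Convert tagged model output into a minimal nbformat-v4-style notebook dict."""
    cells = []
    for tag, body in TAG_PATTERN.findall(model_output):
        text = body.strip()
        if tag == "Analyze":
            # Reasoning text becomes a Markdown cell.
            cells.append({"cell_type": "markdown", "metadata": {}, "source": text})
        elif tag == "Code":
            # Generated code becomes an executable cell with no outputs yet.
            cells.append({
                "cell_type": "code",
                "metadata": {},
                "source": text,
                "execution_count": None,
                "outputs": [],
            })
        elif tag == "Execute" and cells and cells[-1]["cell_type"] == "code":
            # Assumption: <Execute> holds the result of the preceding code cell,
            # so it is attached as a stdout stream output on that cell.
            cells[-1]["outputs"].append(
                {"output_type": "stream", "name": "stdout", "text": text}
            )
    # nbformat_minor 4 avoids the per-cell "id" field required from 4.5 onward.
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


if __name__ == "__main__":
    sample = (
        "<Analyze>Check the row count first.</Analyze>\n"
        "<Code>print(len(df))</Code>\n"
        "<Execute>1000</Execute>"
    )
    print(json.dumps(tags_to_notebook(sample), indent=1))
```

Writing the dict to disk with `json.dump` yields a file Jupyter can open. The design treats the model's output as a structured document rather than free text, which is what makes the "literal notebook author" framing work.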
Key highlights
- Fully open weights, training data, and inference code on Hugging Face
- Quantized to 4-bit and 8-bit with an FP8 KV cache; runs on anything from 16GB consumer GPUs up to 80GB datacenter A100s
- Multiple interfaces: browser-based WebUI (two versions), Jupyter Lab extension, and a Rich-based CLI in English or Chinese
- Docker sandbox for code execution in the v2 WebUI
- OpenAI-style API endpoint support added by community contributors
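A back-of-envelope calculation helps explain why quantization matters for that 16GB floor. It is a rough sketch under stated assumptions. It counts model weights only, takes "8B" as exactly 8 billion parameters, and ignores the KV cache, activations, and runtime overhead. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: "8B" taken as exactly 8e9 parameters; the true count may differ slightly.
PARAMS = 8e9

for bits in (16, 8, 4):
    # Weights only: KV cache, activations, and framework overhead come on top.
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

At 16-bit precision, the weights alone would fill a 16GB card and leave no room for anything else. At 8-bit they drop to about 8 GB and at 4-bit to about 4 GB, which leaves headroom for the KV cache. Storing that cache in FP8 halves its footprint relative to 16-bit.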
Caveats
- The README notes the demo UIs are “initial versions” and invites further development
- API keys for the hosted version require filling out a Google or Feishu form; there is no self-serve signup
- Actual accuracy or benchmark comparisons against other data-science agents aren’t shown in the provided sources
Verdict
Worth a look if you want a local, hackable alternative to closed AI data analysts. Skip it if you need proven enterprise reliability or don't have the GPU budget to self-host.