WecoAI/aideml

AutoML that actually searches, not just grids

AIDE treats ML pipeline code as a tree-search problem, using LLM patches to explore and metric feedback to prune.

Velocity (7d): +1.6 ★/day · Trend: steady

What it does

AIDE ML is an open-source reference implementation of an agent that writes, debugs, and benchmarks machine-learning code until your metric stops improving. You point it at a dataset, describe the goal and evaluation in plain English, and it runs a tree search where each node is a Python script and each child is an LLM-generated patch. It outputs the best code it found plus an HTML visualization of the whole solution tree.
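To make the "each node is a script, each child is a patch" idea concrete, here is a minimal sketch of a greedy tree search. It is not AIDE's actual code: the `Node` class, the `tree_search` function, the always-expand-the-best policy, and the toy `toy_patch` / `toy_eval` stand-ins for the LLM and the script runner are all assumptions made for illustration, and the metric is assumed to be higher-is-better.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution: a script plus the metric it scored."""
    code: str
    metric: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    root_code: str,
    propose_patch: Callable[[str], str],  # hypothetical stand-in for the LLM call
    evaluate: Callable[[str], float],     # hypothetical stand-in for running the script
    steps: int,
) -> Node:
    """Greedy search: always expand the best-scoring node seen so far."""
    root = Node(code=root_code, metric=evaluate(root_code))
    nodes = [root]
    for _ in range(steps):
        # Pick the most promising node anywhere in the tree, not just the latest one.
        parent = max(nodes, key=lambda n: n.metric)
        child_code = propose_patch(parent.code)
        child = Node(code=child_code, metric=evaluate(child_code), parent=parent)
        parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.metric)


# Toy stand-ins: a "patch" appends one character, the "metric" rewards '+' over '-'.
_edits = itertools.cycle(["+", "-", "+"])


def toy_patch(code: str) -> str:
    return code + next(_edits)


def toy_eval(code: str) -> float:
    return code.count("+") - code.count("-")


best = tree_search("", toy_patch, toy_eval, steps=4)
print(best.code, best.metric)
```

In this toy run the second patch makes the score worse, so the search returns to the earlier, better node and branches from there. A linear loop would have kept building on the regressed script, which is the difference the tree structure is meant to capture.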

The interesting bit

The project leans into search rather than the more common linear agent loop. OpenAI’s MLE-Bench (75 Kaggle competitions) reportedly found that this tree-search approach won four times as many medals as the best linear agent. The README doesn’t specify which competitions or medal tiers, so treat the headline figure as cited rather than verified. The repo itself is positioned as a research-friendly base for swapping in new heuristics, evaluators, or LLM backends.

Key highlights

  • Natural-language task spec: aide data_dir=… goal="Predict churn" eval="AUROC"
  • Ships with a CLI, a Streamlit UI, and an HTML tree visualizer
  • Model-neutral: OpenAI, Anthropic, Gemini, or any OpenAI-compatible local LLM
  • Docker support and development install via pip install -e .
  • Active research uptake: cited/forked by OpenAI, Meta, Sakana AI, METR, and SJTU projects

Caveats

  • The “4× more medals” claim comes from an external benchmark (MLE-Bench), not the authors’ own tests
  • Fully local LLM setups are supported, but the README warns you to “expect some performance drop”
  • The evaluator defaults to gpt-4o even when you switch the coding model to a local alternative

Verdict

Worth a look if you’re researching agent architectures or need a quick, opinionated AutoML baseline for tabular or structured data. Skip it if you want production-grade experiment tracking: that’s the commercial Weco platform, not this repo.
