Xiangyue-Zhang/auto-deep-researcher-24x7

A grad student that never sleeps, asks for stipend, or drinks your coffee

Autonomous agent that runs deep-learning experiment loops overnight while keeping LLM costs near zero by sleeping during GPU training.

Velocity (7d): +20 ★ / day · Trend: steady

What it does

You write a PROJECT_BRIEF.md with a goal, constraints, and fallback rules. The agent edits code, launches training, watches the run via PID and nvidia-smi, parses logs, and plans the next iteration, repeating until you hit your metric or it exhausts your patience. It is essentially a control loop wrapped around Claude Code or an OpenAI-compatible API, with a local/SSH/Slurm execution backend.
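
The loop described above can be pictured roughly as follows. This is a minimal sketch, not the project's code: `LoopState`, `run_loop`, and the three callables are hypothetical names, and it assumes a single scalar metric where higher is better.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopState:
    """Hypothetical loop state; field names are illustrative only."""
    best_metric: Optional[float] = None
    history: List[float] = field(default_factory=list)


def run_loop(
    plan: Callable[[LoopState], str],
    launch_and_wait: Callable[[str], str],
    parse_metric: Callable[[str], float],
    target: float,
    max_cycles: int,
) -> LoopState:
    """Repeat plan -> run -> parse until the target is reached or cycles run out."""
    state = LoopState()
    for _ in range(max_cycles):
        change = plan(state)                # stands in for the LLM planning step
        log_text = launch_and_wait(change)  # stands in for a training run
        metric = parse_metric(log_text)     # assumption: one scalar per run
        state.history.append(metric)
        if state.best_metric is None or metric > state.best_metric:
            state.best_metric = metric
        if state.best_metric >= target:     # assumption: higher is better
            break
    return state
```

Passing the three steps in as callables keeps the sketch testable with stubs; a real backend would launch a process or submit a job inside `launch_and_wait`.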

The interesting bit

The monitoring phase makes zero LLM calls. The agent literally sleeps while the GPU churns, only waking the model for the brief THINK and REFLECT phases. The README claims this keeps a 24-hour cycle to about $0.08 in API spend. It also keeps a crash-safe experiment ledger and append-only journals (DEAD_ENDS.md, INSIGHTS.md) so the agent remembers what failed without growing the token context indefinitely.
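
To make the "sleep while training" idea concrete, here is a minimal sketch of LLM-free monitoring plus an append-only journal write. It assumes a POSIX system and a known training PID; `pid_alive`, `wait_for_exit`, and `append_journal` are hypothetical names, and the real tool may well do this differently (it also consults nvidia-smi, which is omitted here).

```python
import os
import time


def pid_alive(pid: int) -> bool:
    """POSIX-only check: signal 0 tests for existence without affecting the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(
    pid: int,
    poll_seconds: float = 60.0,
    max_wait_seconds: float = 86400.0,
) -> bool:
    """Sleep-poll until the process exits; return False on wall-clock timeout.

    No model call happens in this loop, so waiting costs no API spend.
    """
    deadline = time.monotonic() + max_wait_seconds
    while pid_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def append_journal(path: str, entry: str) -> None:
    """Append one line to a journal file; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(entry.rstrip("\n") + "\n")
```

One caveat on the sketch: a finished child process that its parent has not yet reaped still counts as alive under this check, so a launcher that spawns training itself would also need to wait on the child.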

Key highlights

  • Leader-Worker architecture: a planning head and a separate tool-execution worker, with explicit handoff and stricter CLI parsing.
  • Slurm backend: submits via sbatch --parsable over a transient SSH call; liveness is judged by sacct and wall-clock backstops, so a dead cluster does not leave zombie monitors.
  • Domestic-LLM presets: one-word aliases (deepseek, qwen, kimi, glm) auto-fill base URLs and env keys for Chinese API endpoints.
  • Human override files: HUMAN_DIRECTIVE.md for temporary redirects, PROJECT_BRIEF.md for stable constraints, and MEMORY_LOG.md for rolling state.
  • Optional safety rails: a stagnation signal from the ledger, a violation scanner, and a max_cycles_per_hour cap to stop budget-burn loops.
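
The last bullet's safety rails could be approximated as below. This is a sketch under stated assumptions, not the project's implementation: only the `max_cycles_per_hour` name comes from the README, while `CycleRateCap`, `is_stagnant`, the sliding one-hour window, and the "higher is better" metric are illustrative choices.

```python
import time
from collections import deque
from typing import Deque, List, Optional


class CycleRateCap:
    """Sliding-window cap: refuse a new cycle once the hourly limit is reached."""

    def __init__(self, max_cycles_per_hour: int) -> None:
        self.max_cycles_per_hour = max_cycles_per_hour
        self._starts: Deque[float] = deque()  # start times, in seconds

    def try_start(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget cycle starts that are an hour old or older.
        while self._starts and now - self._starts[0] >= 3600.0:
            self._starts.popleft()
        if len(self._starts) >= self.max_cycles_per_hour:
            return False
        self._starts.append(now)
        return True


def is_stagnant(metrics: List[float], window: int = 5, min_gain: float = 1e-3) -> bool:
    """True if the last `window` results failed to beat the earlier best by `min_gain`.

    Assumption: higher metric values are better.
    """
    if len(metrics) <= window:
        return False  # not enough history to judge
    earlier_best = max(metrics[:-window])
    recent_best = max(metrics[-window:])
    return recent_best - earlier_best < min_gain
```

A caller would check `try_start()` before each cycle and pause or stop when it returns False; `is_stagnant` would be fed the metric column of the experiment ledger.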

Caveats

  • The “500+ autonomous cycles” and “52% improvement” claims are self-reported from the authors’ own projects, not independent benchmarks.
  • Requires an NVIDIA GPU and an API key; the “zero-cost” framing refers to monitoring, not the training compute or the initial model calls.
  • The README is admirably explicit that this is glue code around existing LLM APIs and training scripts, not a replacement for thinking.

Verdict

Worth a look if you are already running PyTorch experiments and want to automate the edit-launch-monitor-reflect drudgery without racking up API bills. Skip it if you expect the agent to invent novel architectures or write your paper for you; the authors explicitly beg you not to do that.
