getmaxun/maxun

A no-code scraper that records your clicks and turns them into APIs

Maxun lets you browse a website normally, then replays your actions as a scheduled data-extraction robot.

15.8k stars · TypeScript · Data Tooling, Agents
Velocity (7d): +16 ★/day · Trend: steady

What it does

Maxun is an open-source web data platform built around four “robots”: Extract (structured data via point-and-click recording or natural-language LLM prompts), Scrape (full-page Markdown/HTML), Crawl (site-wide discovery), and Search (automated web queries with time filters). You can self-host it with Docker or use the hosted version, and there is a Node SDK plus CLI for developers who want to trigger runs programmatically.

The interesting bit

The recorder mode is the standout: you browse a site, click through pagination, log in if needed, and Maxun memorizes the choreography. It also claims auto-recovery when a target site redesigns its layout, which, if it works, is the difference between a brittle script and a durable pipeline.
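To make the record-and-replay idea concrete, here is a minimal sketch of how a recorded session could be stored as data and replayed later. Everything in it is an assumption for illustration: the `Step` type, the `Page` interface, `replay`, and `makeFakePage` are hypothetical names, not Maxun's actual workflow format or API, and the fake page stands in for a real browser so the example runs on its own.

```typescript
// Hypothetical step format: NOT Maxun's real workflow schema.
// Each interactive step carries several candidate selectors, tried in order.
type Step =
  | { kind: "goto"; url: string }
  | { kind: "click"; selectors: string[] }
  | { kind: "extract"; field: string; selectors: string[] };

// Minimal page abstraction (an assumption) so no real browser is needed.
interface Page {
  goto(url: string): void;
  exists(selector: string): boolean;
  click(selector: string): void;
  text(selector: string): string;
}

// Return the first candidate selector that is present on the page.
function firstMatch(page: Page, selectors: string[]): string {
  const hit = selectors.find((s) => page.exists(s));
  if (hit === undefined) {
    throw new Error(`no selector matched: ${selectors.join(", ")}`);
  }
  return hit;
}

// Replay the recorded steps in order and collect the extracted fields.
function replay(page: Page, steps: Step[]): Record<string, string> {
  const row: Record<string, string> = {};
  for (const step of steps) {
    switch (step.kind) {
      case "goto":
        page.goto(step.url);
        break;
      case "click":
        page.click(firstMatch(page, step.selectors));
        break;
      case "extract":
        row[step.field] = page.text(firstMatch(page, step.selectors));
        break;
    }
  }
  return row;
}

// Fake page backed by a selector -> text map; it logs the actions it receives.
function makeFakePage(dom: Record<string, string>): Page & { log: string[] } {
  const log: string[] = [];
  return {
    log,
    goto: (url: string) => { log.push(`goto ${url}`); },
    exists: (selector: string) => selector in dom,
    click: (selector: string) => { log.push(`click ${selector}`); },
    text: (selector: string) => dom[selector],
  };
}

// A "recording": open a listing, go to the next page, read a title.
const recording: Step[] = [
  { kind: "goto", url: "https://example.com/products" },
  { kind: "click", selectors: ["#next-page", "a.pagination-next"] },
  { kind: "extract", field: "title", selectors: ["h1.product-title", "h1"] },
];

// Simulate a redesigned site where only the fallback selectors still exist.
const page = makeFakePage({ "a.pagination-next": "", "h1": "Blue Widget" });
const result = replay(page, recording);
console.log(result, page.log);
```

The ordered fallback selectors show one simple way a replay could survive a layout change. Nothing in the sources says this is how Maxun's claimed auto-recovery works.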

Key highlights

  • Recorder + AI dual mode: either replay your own clicks or describe what you want in plain English and let an LLM handle extraction.
  • Scheduled runs and REST endpoints: turn any captured workflow into a recurring job with API access.
  • Login-aware extraction: can scrape behind authenticated sessions.
  • MCP and spreadsheet integrations: exports to Google Sheets, Airtable, and speaks Model Context Protocol for AI-agent plumbing.
  • AGPLv3 licensed: fully self-hostable, though the project nudges commercial users to contribute back.

Caveats

  • The README explicitly notes the project is in “early stages of development,” so expect rough edges and breaking changes.
  • Auto-recovery from layout changes is promised but not demonstrated or benchmarked in the sources.

Verdict

Worth a look if you need recurring, structured data from sites that fight simple curl scripts, especially if you want non-technical teammates to build the robots. Hardcore developers already happy with Playwright and a cron job may find it over-engineered.
