HKUDS/Paper2Slides

Your research paper, now with a Doraemon theme

An LLM pipeline that turns PDFs into slides or posters, complete with checkpoint recovery and natural-language styling.

Velocity (7d): +20 ★/day · Trend: steady

What it does

Feed it a PDF, Word doc, or even a PowerPoint, and Paper2Slides runs a four-stage pipeline: RAG indexing, content extraction, layout planning, and final image generation. Output is a set of PNG slides or a poster, optionally merged into a PDF. There’s a web UI, but the CLI is fully scriptable.

The interesting bit

The checkpoint system is the quiet workhorse. Each stage writes a JSON checkpoint, so you can resume after a crash, switch styles without re-parsing, or regenerate images while keeping the same plan. The README also offers a useful, honest hint for prompt engineering: “fine-grained layout instructions ground well; fine-grained element styling does not.”
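To make the checkpoint idea concrete, here is a minimal sketch of stage-by-stage JSON checkpointing with resume. It is not Paper2Slides' actual code: the stage identifiers, the `checkpoint_<stage>.json` file names, the `run_pipeline` helper, and the toy stage functions are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Hypothetical identifiers for the four stages described above.
STAGES = ["rag", "extract", "plan", "generate"]


def checkpoint_path(workdir, name):
    # Assumed naming scheme; the real project may lay files out differently.
    return Path(workdir) / f"checkpoint_{name}.json"


def first_missing(workdir):
    """Index of the first stage with no checkpoint on disk."""
    for i, name in enumerate(STAGES):
        if not checkpoint_path(workdir, name).exists():
            return i
    return len(STAGES)


def run_pipeline(workdir, stage_fns, start_from=None):
    """Run stages in order, reusing checkpoints for stages before the start.

    start_from=None resumes at the first stage without a checkpoint;
    naming a stage forces a re-run from that stage onward.
    """
    Path(workdir).mkdir(parents=True, exist_ok=True)
    start = STAGES.index(start_from) if start_from else first_missing(workdir)
    result, executed = None, []
    for i, name in enumerate(STAGES):
        ckpt = checkpoint_path(workdir, name)
        if i < start and ckpt.exists():
            result = json.loads(ckpt.read_text())  # reuse earlier work
            continue
        result = stage_fns[name](result)  # each stage consumes the previous output
        ckpt.write_text(json.dumps(result))
        executed.append(name)
    return result, executed


# Toy stand-ins for the real stages, just to exercise the control flow.
TOY_STAGES = {
    "rag": lambda prev: {"chunks": 3},
    "extract": lambda prev: {"sections": prev["chunks"] * 2},
    "plan": lambda prev: {"slides": prev["sections"] + 1},
    "generate": lambda prev: {"images": [f"slide_{i}.png" for i in range(prev["slides"])]},
}
```

Calling `run_pipeline(workdir, TOY_STAGES, start_from="generate")` after a full run would redo only the image step while keeping the saved plan, which mirrors the "regenerate images while keeping the same plan" workflow described above.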

Key highlights

  • Supports PDF, Word, Excel, PowerPoint, and Markdown inputs
  • Normal mode uses RAG for long documents; --fast skips indexing for quick previews
  • Parallel generation (--parallel N) speeds up multi-slide output
  • Custom styles via natural language prompts (the Totoro example is in the README)
  • Image generation defaults to Gemini via OpenRouter, with a fallback to the direct Google API
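As a rough illustration of what a flag like --parallel N implies, the sketch below fans slide rendering out over a bounded worker pool while keeping slides in plan order. The `render_slide` placeholder and the `render_all` helper are hypothetical and stand in for whatever image-generation call the project actually makes.

```python
from concurrent.futures import ThreadPoolExecutor


def render_slide(spec):
    """Placeholder for one image-generation API call (hypothetical)."""
    return f"slide_{spec['index']:02d}.png"


def render_all(plan, parallel=1):
    """Render every slide with at most `parallel` workers, preserving plan order."""
    with ThreadPoolExecutor(max_workers=max(1, parallel)) as pool:
        # Executor.map yields results in input order, even if calls finish out of order.
        return list(pool.map(render_slide, plan))
```

Threads are a reasonable fit here because each slide is dominated by waiting on a remote API rather than by local computation.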

Caveats

  • Requires external LLM and image-gen API keys; no local-only mode is mentioned
  • Image generation model must support image responses, or you need to configure MIME types correctly
  • The “fast” mode only works if the full document fits in the LLM context window
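The context-window caveat can be checked up front with a quick estimate. The sketch below is only a heuristic under stated assumptions: the four-characters-per-token ratio, the output reserve, and the `choose_mode` helper are illustrative and not taken from the project, and a real check would use the tokenizer of the model in question.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; the ratio is an assumption, not a measured value."""
    return int(len(text) / chars_per_token) + 1


def choose_mode(text, context_window, reserved_for_output=4096):
    """Suggest 'fast' only when the whole document plausibly fits in one prompt."""
    budget = context_window - reserved_for_output  # leave room for the model's reply
    return "fast" if estimate_tokens(text) <= budget else "normal"
```

If the estimate lands near the budget, falling back to the RAG-based normal mode is the safer choice.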

Verdict

Worth a look if you regularly turn papers into conference posters or deck slides and want to automate the busywork. Skip it if you need pixel-perfect manual control or can’t route documents through third-party LLM APIs.
