polyrabbit/hacker-news-digest

Hacker News, but you actually have time to read it

A static-site generator that scrapes HN, extracts article content, and feeds it to ChatGPT for summaries with auto-generated illustrations.

753 stars · Python · Chat Assistants · Other AI
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This project scrapes Hacker News, pulls the main content from each linked article using a machine-learning-based extraction algorithm, generates summaries via OpenAI’s GPT-3.5-turbo API, and renders everything as a static site deployed to GitHub Pages. If OpenAI is unreachable, it falls back to a local Google T5 model. The result is a readable digest with thumbnails, embedded videos/PDFs, and RSS feeds.
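The fallback behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `summarize_with_fallback`, `remote_model`, and `local_model` are hypothetical names, and the two model functions are stand-ins for a hosted API call and a local model.

```python
from typing import Callable, Sequence

Summarizer = Callable[[str], str]


def summarize_with_fallback(text: str, summarizers: Sequence[Summarizer]) -> str:
    """Try each summarizer in order and return the first non-empty result."""
    for summarize in summarizers:
        try:
            summary = summarize(text).strip()
        except Exception:
            # Any failure (network error, rate limit, ...) means "try the next one".
            continue
        if summary:
            return summary
    # No summarizer produced anything: fall back to a plain truncation.
    return text[:200]


def remote_model(text: str) -> str:
    # Hypothetical stand-in for a hosted API that is currently unreachable.
    raise ConnectionError("API unreachable")


def local_model(text: str) -> str:
    # Hypothetical stand-in for a local model: keeps only the first sentence.
    return text.split(". ")[0] + "."


print(summarize_with_fallback("First sentence. Second sentence.",
                              [remote_model, local_model]))
```

Passing the summarizers as an ordered list keeps the retry policy separate from the models themselves, so adding a third fallback would not change the control flow.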

The interesting bit

The whole pipeline runs on GitHub Actions and publishes to GitHub Pages: no server, no database, just scheduled scraping and templated HTML. The content extraction uses a scoring algorithm rather than simple heuristics, and the project even localizes summaries into Chinese with a single prompt tweak.
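To make "templated HTML" concrete, here is a rough sketch of rendering a list of stories into a static page using only the Python standard library. The template strings, the `render_page` function, and the story fields (`url`, `title`, `summary`) are assumptions for illustration; the project's actual templates are not shown in this review.

```python
from html import escape
from string import Template

# Hypothetical page and item templates; a real site would have far more markup.
PAGE = Template("<!doctype html>\n<title>$title</title>\n<ul>\n$items\n</ul>\n")
ITEM = Template('  <li><a href="$url">$title</a> <p>$summary</p></li>')


def render_page(title, stories):
    """Render story dicts into one HTML string, escaping all scraped text."""
    items = "\n".join(
        ITEM.substitute(
            url=escape(s["url"], quote=True),
            title=escape(s["title"]),
            summary=escape(s["summary"]),
        )
        for s in stories
    )
    return PAGE.substitute(title=escape(title), items=items)


page = render_page("Digest", [
    {"url": "https://example.com/?a=1&b=2",
     "title": "A <b> story",
     "summary": "Short."},
])
print(page)
```

Writing the returned string to an `index.html` file is all a scheduled job would need to do before publishing; escaping matters here because every field originates from scraped, untrusted pages.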

Key highlights

  • Static site generated entirely via GitHub Actions (badge shows it’s actively building)
  • Dual-model summarization: OpenAI API with local Google T5 fallback
  • Auto-extracted illustrations and embedded media (YouTube, PDFs, GitHub gists)
  • Sortable and filterable by points, comments, or time; RSS fully supported
  • Chinese translation available via LLM prompt modification
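As a small illustration of the sorting and filtering highlight, the sketch below orders stories by points, comments, or time. The `Story` fields and the `select` function are hypothetical; the review does not say how the site implements this.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    points: int
    comments: int
    posted: int  # Unix timestamp


SORT_KEYS = {"points", "comments", "posted"}


def select(stories, sort_by="points", min_points=0):
    """Drop stories below min_points, then sort descending by the chosen field."""
    if sort_by not in SORT_KEYS:
        raise ValueError(f"sort_by must be one of {sorted(SORT_KEYS)}")
    kept = [s for s in stories if s.points >= min_points]
    return sorted(kept, key=lambda s: getattr(s, sort_by), reverse=True)


stories = [
    Story("Old but popular", points=500, comments=40, posted=1_700_000_000),
    Story("New and chatty", points=120, comments=300, posted=1_700_100_000),
    Story("Barely noticed", points=5, comments=1, posted=1_700_200_000),
]
for story in select(stories, sort_by="comments", min_points=100):
    print(story.title)
```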

Caveats

  • The README’s “score algorithm” for content extraction links to a Jupyter notebook tutorial, but the actual implementation details aren’t shown inline
  • TODO list includes switching to the official Hacker News API and improving web scraping (currently not using headless browsers)
  • “Seamless” appears twice in the README; the project itself is more pragmatic than that word suggests
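For context on the TODO item about the official Hacker News API: that API serves JSON from `hacker-news.firebaseio.com`, with a `topstories` list of IDs and one `item` document per ID. The sketch below shows roughly what switching from scraping might look like; `top_stories`, `get_json`, and the output field names are assumptions, not the project's code.

```python
import json
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def top_stories(limit=30, fetch=get_json):
    """Return basic fields for the first `limit` top items that are stories."""
    ids = fetch(f"{BASE}/topstories.json")[:limit]
    stories = []
    for item_id in ids:
        item = fetch(f"{BASE}/item/{item_id}.json")
        # Deleted items come back as null; jobs and polls are skipped.
        if not item or item.get("type") != "story":
            continue
        stories.append({
            "id": item["id"],
            "title": item.get("title", ""),
            "url": item.get("url", ""),  # Ask HN posts have no external URL
            "points": item.get("score", 0),
            "comments": item.get("descendants", 0),
            "posted": item.get("time", 0),
        })
    return stories
```

The `fetch` parameter exists so the function can be exercised with canned responses instead of live network calls.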

Verdict

Worth a look if you want a self-hosted HN digest or need a reference for building cheap, serverless content pipelines. Skip it if you need real-time comments or a polished UI; the author admits the homepage could be prettier.
