Is paperai open source?

Yes — neuml/paperai is open source, released under the Apache-2.0 license.

What language is paperai written in?

neuml/paperai is primarily written in Python.

How popular is paperai?

neuml/paperai has 1.8k stars on GitHub.

Where can I find paperai?

neuml/paperai is on GitHub at https://github.com/neuml/paperai.

← all repositories

neuml/paperai

Bulk LLM reports over scientific paper corpora

A configuration-driven tool that runs hundreds of RAG prompts against indexed research papers and spits out structured reports.

★1.8k stars Python RAG · Search Domain Apps

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

paperai ingests scientific paper databases (pre-built with its sibling project paperetl), indexes them with vector embeddings, then runs bulk LLM inference via YAML configuration files. You define columns, queries, and prompt templates; it generates Markdown, CSV, or annotated PDF reports across the top-N matching articles. Think of it as a spreadsheet where some columns are filled by metadata and others by a medical LLM reading the relevant sections for you.

The interesting bit

The “generated columns” feature is the clever hook: each column can run its own retrieval query to rank document sections, then pipe the top matches into a RAG pipeline with a custom LLM prompt. This lets one report ask “what’s the sample size?” while another column independently asks “list possible causes” — all against the same paper, with context windows managed automatically.

Key highlights

Built on txtai embeddings + SQLite + LLM; not reinventing the wheel, wiring it together for research workflows
YAML report schemas define bulk operations: vector queries, standard metadata columns, and dynamic LLM-generated columns
Outputs Markdown, CSV, or annotated PDFs with answers overlaid on originals
Ships with CLI shell, single-query runner, and report builder entry points
Docker support; Python 3.10+; Colab notebooks for medical research examples

Caveats

Requires a separate paperetl step to build the input database; not a one-command end-to-end pipeline
The README’s architecture diagram has light/dark mode variants that may render oddly depending on GitHub’s current theme

Verdict

Worth a look if you’re doing systematic literature reviews, meta-analyses, or any repeatable extraction over large paper corpora. Skip it if you just need ad-hoc chat-with-PDF; this is for batch operations at scale.

Frequently asked

What is neuml/paperai?: A configuration-driven tool that runs hundreds of RAG prompts against indexed research papers and spits out structured reports.
Is paperai open source?: Yes — neuml/paperai is open source, released under the Apache-2.0 license.
What language is paperai written in?: neuml/paperai is primarily written in Python.
How popular is paperai?: neuml/paperai has 1.8k stars on GitHub.
Where can I find paperai?: neuml/paperai is on GitHub at https://github.com/neuml/paperai.