Data Tooling

Data Tooling

newcomers · velocity + momentum
01
microsoft/markitdown
+258 ★/daysteady

A Python utility that converts PDFs, PowerPoints, Excel files, images, audio, and even YouTube videos into Markdown—optimized for feeding text to language models, not for pretty human reading.

147.4k Python Data Tooling · explained
03
firecrawl/firecrawl
+166 ★/daysteady

Firecrawl turns the messy web into clean markdown and structured data so your AI agents don't have to squint at HTML.

129.9k TypeScript Data Tooling · explained
04
virgiliojr94/book-to-skill
+121 ★/daysteady

This tool turns any PDF or EPUB into a Claude Code skill, so you can query frameworks and patterns from the actual text instead of hallucinating chapter 7.

4.6k Python Coding Assistants · explained
05
toon-format/toon
+107 ★/daysteady

A new serialization format that trades braces for whitespace and turns uniform arrays into schema-aware tables, cutting token counts by ~40% without losing the JSON data model.

24.5k TypeScript LLMOps · Eval · explained
06
google/langextract
+110 ★/daysteady

LangExtract turns wall-of-text documents into structured, verifiable data by making the LLM show its work.

36.8k Python Data Tooling · explained
07
run-llama/liteparse
+80 ★/daysteady

Run-LLama's Rust-core tool extracts text, bounding boxes, and screenshots locally, with an escape hatch to cloud OCR when documents get nasty.

9.5k Rust Data Tooling · explained
08
unclecode/crawl4ai
+90 ★/daysteady

The most-starred crawler on GitHub exists because its creator refused to pay $16 for a bad API.

68k Python Data Tooling · explained
09
docling-project/docling
+88 ★/daysteady

Docling turns chaotic office documents into structured, AI-ready formats without sending your data to the cloud.

61.1k Python Data Tooling · explained
11
OpenSenseNova/SenseNova-Skills
+69 ★/daysteady

SenseNova-Skills bundles concrete office capabilities—slide decks, data analysis, infographics, and deep research—as modular agent plugins you drop into OpenClaw or Hermes.

3.8k Python Agents · explained
12
opendatalab/MinerU
+80 ★/daysteady

MinerU turns messy PDFs, Office files, and images into structured markdown so your RAG pipeline stops choking on scrambled text.

66.8k Python Data Tooling · explained
13
Manavarya09/design-extract
+58 ★/daysteady

A CLI that points Playwright at any URL and emits Tailwind configs, Figma variables, shadcn themes, and even graded report cards.

3.1k JavaScript Coding Assistants · explained
15

A tool that keeps formulas, charts, and layout intact while translating scientific papers into 34K+ stars worth of languages.

34.6k Python Other AI · explained
16

A Python toolkit that reverse-engineers alpha-blended logos, strips C2PA manifests, and diffuses away invisible fingerprints like SynthID.

3k Python Computer Vision · explained
17
pathwaycom/pathway
+49 ★/daysteady

Pathway lets you write ETL pipelines in Python, then executes them in a Rust engine built on Differential Dataflow.

63.1k Python RAG · Search · explained
18
Kaelio/ktx
+33 ★/daysteady

ktx is a local context layer that ingests your data stack and business knowledge so Claude, Codex, and other agents query warehouses with approved metrics instead of inventing SQL.

943 TypeScript Agents · explained
20
yamadashy/repomix
+38 ★/daysteady

Repomix collapses entire codebases into a single AI-friendly file, because context windows are hungry and copy-pasting is undignified.

26.1k TypeScript Data Tooling · explained
loading more…

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.