Data Tooling

underdogs · picking up speed

+36% /wk +51 ★/day ↗ accelerating

Token Monitor reads local logs from two dozen AI coding tools to surface live token burn, costs, and limits in one place, synced across all your machines.

★ 990 JavaScript LLMOps · Eval · explained

simonlin1212/a-stock-data

+13% /wk +152 ★/day ↗ accelerating

Because your coding assistant shouldn't need a securities API cheat sheet to analyze A-shares.

★ 7.9k Data Tooling · explained

0xMassi/webclaw

+14% /wk +43 ★/day ↗ accelerating

Turns URLs into clean markdown and JSON so AI agents don't have to parse HTML soup.

★ 2.1k Rust RAG · Search · explained

virgiliojr94/book-to-skill

+13% /wk +190 ★/day ↗ accelerating

It turns technical PDFs and EPUBs into on-demand Claude Code skills, so you can query a book's actual frameworks instead of hoping the model remembers them.

★ 10.1k Python Coding Assistants · explained Feature

wiltodelta/remove-ai-watermarks

+9.4% /wk +58 ★/day ↗ accelerating

It removes the visible Gemini sparkle, invisible SynthID fingerprints, and C2PA metadata that AI image generators embed in every output.

★ 4.3k Python Computer Vision · explained

OpenDCAI/DataFlow

+9.6% /wk +96 ★/day ↗ accelerating

It automates the tedious pipeline of turning noisy PDFs and plain text into structured training data for domain-specific LLMs.

★ 7k Python Data Tooling · explained

MarkPDFdown/markpdfdown

+8.4% /wk +23 ★/day ↗ accelerating

Uses multimodal LLMs to transcribe PDFs into Markdown, preserving complex layouts that traditional extractors mangle.

★ 1.9k Python Data Tooling · explained

llmsresearch/paperbanana

+6.1% /wk +19 ★/day ↗ accelerating

This unofficial rebuild of a Google Research project chains specialized agents to turn rough text and data into publication-ready academic figures.

★ 2.2k Python Creative · Design · explained

Ontos-AI/knowhere

+5.4% /wk +16 ★/day ↗ accelerating

A pipeline that turns messy PDFs and slides into structured, navigable memory for AI agents instead of flat text shards.

★ 2k Python RAG · Search · explained

Zipstack/unstract

+3.5% /wk +35 ★/day ↗ accelerating

Unstract turns document extraction into a prompt-and-deploy workflow instead of a regex archaeology dig.

★ 6.9k Python Domain Apps · explained

Kaelio/ktx

+2.7% /wk +5.9 ★/day ↗ accelerating

ktx is a local context layer that ingests your data stack and business knowledge so Claude, Codex, and other agents query warehouses with approved metrics instead of inventing SQL.

★ 1.5k TypeScript Agents · explained

coderamp-labs/gitingest

+1.1% /wk +24 ★/day ↗ accelerating

It turns Git repositories into flat, token-counted text digests so you can stop manually concatenating files for LLM prompts.

★ 15.2k Python Data Tooling · explained

getmaxun/maxun

+1.6% /wk +37 ★/day ↗ accelerating

Maxun is an open-source platform for developers who would rather record a browsing session than write another brittle web scraper.

★ 16.9k TypeScript Data Tooling · explained

huohuoer/wechat-cli

+5.2% /wk +13 ★/day ↗ accelerating

Scans running WeChat process memory to extract SQLCipher keys, then exposes your chat database as a JSON-first CLI designed for AI agent consumption.

★ 1.7k Coding Assistants · explained

ray-r-ren/agent-apprenticeship

+1.2% /wk +2.3 ★/day ↗ accelerating

It captures the messy reality of long-horizon agent tasks and turns execution traces into reusable, shareable learning signals.

★ 1.3k Python Agents · explained Feature

tmwgsicp/wechat-download-api

+6.3% /wk +8.0 ★/day → steady

It scrapes WeChat public articles and serves them as RSS, Markdown, and JSON because Tencent won’t.

★ 896 Python Coding Assistants · explained

the-momentum/open-wearables

+2.9% /wk +9.1 ★/day ↗ accelerating

Open Wearables wants to unify fitness tracker data behind one self-hosted API so developers can stop writing bespoke OAuth flows for every wearable brand.

★ 2.2k Python Domain Apps · explained

allenai/olmocr

+0.6% /wk +16 ★/day ↗ accelerating

olmOCR exists because LLMs cannot train on PDFs until someone strips the formatting chaos and restores natural reading order.

★ 19.2k Python Data Tooling · explained

buxuku/SmartSub

+1.4% /wk +8.4 ★/day ↗ accelerating

A cross-platform Electron app that runs speech recognition locally, then hands the results off to anything from Baidu to DeepSeek for translation.

★ 4.3k TypeScript Data Tooling · explained

lance-format/lance

+0.7% /wk +7.3 ★/day ↗ accelerating

Lance treats vectors, images, and embeddings as first-class citizens instead of awkward guests at Parquet's SQL-only party.

★ 6.9k Rust RAG · Search · explained
