Data Tooling

newcomers · gaining speed
01
yooper/php-text-analysis
+0.1 ★/day · steady

A PHP-native library that brings text analysis, sentiment scoring, and document classification to codebases that can't justify a Python microservice.

533 · PHP · Language Models
02
AdolfVonKleist/Phonetisaurus
+0.1 ★/day · steady

Trains grapheme-to-phoneme (G2P) models that guess how words sound, because you can't ship a pronunciation dictionary for every proper noun your users will invent.

516 · Shell · Data Tooling
04
FerreroJeremy/ln2sql
+0.1 ★/day · steady

A Python tool that parses natural-language questions and turns them into executable SQL using only a database dump, with no live connection required.

521 · Python · Language Models
05
openml/OpenML
+0.2 ★/day · steady

A 2013-vintage open-science platform for sharing ML experiments, datasets, and results, now being retired in favor of a FastAPI rewrite.

741 · PHP · Data Tooling
06
synyi/poplar
+0.1 ★/day · steady

A browser-native NLP annotation component for when you need to label text without leaving the DOM.

529 · TypeScript · Data Tooling
08
mlcommons/ck
+0.2 ★/day · steady

A community-built automation framework trying to make ML benchmarking reproducible across the chaos of GPUs, containers, and constantly shifting software stacks.

647 · Python · LLMOps · Eval
09
hackingmaterials/matminer
+0.2 ★/day · steady

Matminer collects scattered materials-science datasets and featurizers into one library so researchers can stop writing the same data-prep scripts.

601 · HTML · Domain Apps
10
meta-toolkit/meta
+0.2 ★/day · steady

MeTA bundles tokenization, search indexes, topic models, and CRFs into one compiled toolkit for researchers who'd rather fight algorithms than package managers.

714 · C++ · Language Models
12
fhamborg/Giveme5W1H
+0.2 ★/day · steady

A Python library that extracts the 5W1H answers (who, what, when, where, why, how) from news articles, because someone finally decided to treat reporters' training as a spec.

533 · HTML · Data Tooling
13
ryouchinsa/Rectlabel-support
+0.2 ★/day · steady

RectLabel is a commercial macOS app whose support repo reveals an unusually deep stack of offline ML models for labeling images and video.

553 · Jupyter Notebook · Data Tooling
14
explosion/prodigy-recipes
+0.2 ★/day · steady

A public repo of commented, tweakable recipe scripts for Prodigy, Explosion's commercial annotation tool.

507 · Jupyter Notebook · Data Tooling
15
CornellNLP/ConvoKit
+0.2 ★/day · steady

A scikit-learn-flavored toolkit that turns messy conversations into measurable social signals.

635 · Jupyter Notebook · Data Tooling
16
atilika/kuromoji
+0.2 ★/day · steady

A self-contained Java morphological analyzer that ships its own dictionaries so you don't have to wrestle with MeCab.

1k · Java · Data Tooling
19
adbar/German-NLP
+0.2 ★/day · steady

Someone finally catalogued the chaos of German-language NLP resources so you don't have to hunt through CLARIN portals at 2am.

524 · Learning