Is skrub open source?

Yes, skrub-data/skrub is open source and released under the BSD-3-Clause license.

What language is skrub written in?

skrub-data/skrub is primarily written in Python.

How popular is skrub?

skrub-data/skrub has 1.6k stars on GitHub.

Where can I find skrub?

skrub-data/skrub is on GitHub at https://github.com/skrub-data/skrub.

skrub-data/skrub

Pandas meets scikit-learn without the duct tape

A library that finally treats messy dataframes as first-class citizens in ML pipelines.

★ 1.6k stars · Python · Data Tooling

What it does

skrub bridges the awkward gap between raw, messy dataframes and scikit-learn’s tidy numerical world. It provides transformers and tools that handle dirty categorical data, text, and other real-world column types without forcing you into a preprocessing rabbit hole.

The interesting bit

The project evolved from dirty_cat, a focused tool for encoding messy categories, into something broader: making entire dataframes ML-ready. The name change signals ambition beyond just cleaning up strings.

Key highlights

Built specifically for pandas-like dataframes rather than supporting them as an afterthought
Handles “dirty” categorical data (typos, inconsistencies, rare categories) that standard encoders choke on
Integrates with scikit-learn pipelines without custom glue code
Active community with Discord, learning materials, and example galleries
1,618 stars and steady development under the skrub-data org
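
To make the "dirty" categories from the highlights above concrete, here is a minimal standard-library sketch of character n-gram similarity, one common way to recognise that near-duplicate spellings belong together. It is not skrub's implementation; the function names and the example job titles are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


# Hypothetical category values, including a typo and a variant spelling.
categories = ["Police Officer", "police officer II", "Polce Officer", "Librarian"]
reference = "Police Officer"

# One-hot encoding would treat all four values as unrelated columns;
# n-gram overlap keeps the near-duplicates close to the reference.
scores = {c: ngram_similarity(reference, c) for c in categories}
for category, score in scores.items():
    print(f"{category!r}: {score:.2f}")
```

The typo and the variant score far closer to the reference than the unrelated title does, and that is the signal a plain one-hot encoder discards.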

Caveats

The README is thin on specifics; you’ll need to dig into the website and examples to understand actual capabilities
Formerly named dirty_cat, so some documentation and Stack Overflow answers may still reference the old name

Verdict

Worth a look if you spend more time wrestling data into shape than training models. Skip it if your data already arrives as clean numerical matrices or you live entirely in deep-learning frameworks.
