hackingmaterials/matminer

A Python toolkit for mining materials data without reinventing the wheel

Matminer collects scattered materials-science datasets and featurizers into one library so researchers can stop writing the same data-prep scripts.

601 stars · HTML · Domain Apps · Data Tooling
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Matminer is a Python library that gathers datasets, data-retrieval methods, and featurizers for materials science into a single package. It handles the tedious work of finding, formatting, and citing community-developed data so you can focus on analysis rather than wrangling. Python 3.11+ required.

The interesting bit

The library tracks provenance for you: every dataset and featurizer carries a citations() method that returns BibTeX entries. It’s a small feature that solves a real pain point: academic papers whose data sources get no more than a vague wave in a footnote.
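To make the citation idea concrete, here is a minimal sketch of how one might gather BibTeX from several components into a single bibliography. It assumes only what is described above: each object exposes a citations() method returning a list of BibTeX strings. The helper name collect_bibtex and the FakeFeaturizer stand-in are hypothetical and exist only so the example runs without matminer installed; they are not part of the library.

```python
from typing import Iterable, List, Protocol


class Citable(Protocol):
    """Anything exposing citations() -> list of BibTeX strings."""

    def citations(self) -> List[str]: ...


def collect_bibtex(components: Iterable[Citable]) -> str:
    """Merge BibTeX entries from several components, dropping exact duplicates.

    Hypothetical helper (not a matminer function). Order of first
    appearance is preserved so the output is stable between runs.
    """
    seen = set()
    entries = []
    for component in components:
        for entry in component.citations():
            cleaned = entry.strip()
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                entries.append(cleaned)
    return "\n\n".join(entries)


class FakeFeaturizer:
    """Hypothetical stand-in for an object with a citations() method."""

    def __init__(self, entries: List[str]) -> None:
        self._entries = entries

    def citations(self) -> List[str]:
        return list(self._entries)


# Two components that share one reference; the shared entry is emitted once.
first = FakeFeaturizer(["@article{x, title={A}}"])
second = FakeFeaturizer(["@article{x, title={A}}", "@article{y, title={B}}"])
bib = collect_bibtex([first, second])
print(bib)
```

With matminer installed, the stand-ins could presumably be swapped for real featurizer instances, since the helper relies on nothing beyond the citations() call.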

Key highlights

  • Bundles community datasets and featurizers in one importable library
  • Built-in citation tracking via citations() methods on datasets, retrievers, and featurizers
  • Companion projects for automation (automatminer) and benchmarking (matbench)
  • Active since at least 2018 with a dedicated help forum
  • Separate examples repo with worked demonstrations

Caveats

  • The README is thin on specifics: no dataset counts, no performance claims, no architecture overview
  • The examples and deeper docs live in separate repositories, so you’ll be clicking around

Verdict

Materials scientists doing ML on structure-property relationships should grab this to skip boilerplate data loading. Everyone else can pass: there’s nothing generic here worth repurposing.
