Is magpie open source?

Yes — inspirehep/magpie is open source, released under the MIT license.

What language is magpie written in?

inspirehep/magpie is primarily written in Python.

How popular is magpie?

inspirehep/magpie has 687 stars on GitHub.

Where can I find magpie?

inspirehep/magpie is on GitHub at https://github.com/inspirehep/magpie.


inspirehep/magpie

CERN's multi-label classifier: physics abstracts to keywords

A Keras wrapper that trains word2vec embeddings, then slaps labels on text—born from sorting High Energy Physics papers.

★ 687 stars · Python · ML Frameworks · Language Models


What it does

Magpie is a Python wrapper around a Keras neural network for multi-label text classification. You feed it pairs of .txt files and .lab files, it builds word2vec embeddings, normalizes with a scaler, and trains a model to predict multiple labels per document. It was built at CERN to auto-categorize physics abstracts and extract keywords.
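The scaler step mentioned above can be pictured with a small standard-library sketch. It assumes the scaler standardizes each embedding dimension to zero mean and unit variance, which is the usual meaning of the term; the source does not say which scaler Magpie uses, and the names `fit_scaler` and `scale` are hypothetical, not part of Magpie.

```python
from statistics import mean, pstdev


def fit_scaler(vectors):
    """Compute the per-dimension mean and standard deviation of a list of vectors."""
    columns = list(zip(*vectors))  # transpose: one tuple per dimension
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a dimension has zero spread, to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(vector, means, stds):
    """Standardize one vector using previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]
```

Fitting once on the full corpus and reusing the statistics at prediction time keeps training and inference inputs on the same scale.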

The interesting bit

The whole pipeline is deliberately chunked into swappable pieces: word2vec, scaler, Keras model. You can pre-train embeddings on your full corpus, save them, and hot-swap later. There’s also a batch_train() mode when your data won’t fit in RAM—unusual thoughtfulness for a small research tool.
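The idea behind training on data that does not fit in RAM can be sketched with a lazy batch reader. This is an illustration of the general technique only, not how batch_train() is implemented; `iter_batches` and its parameters are hypothetical names.

```python
from itertools import islice
from pathlib import Path


def iter_batches(directory, batch_size):
    """Yield lists of at most batch_size document texts, reading files lazily."""
    # Only the file paths are held in memory up front, never the file contents.
    paths = iter(sorted(Path(directory).glob("*.txt")))
    while True:
        chunk = list(islice(paths, batch_size))
        if not chunk:
            return
        # File contents are read one batch at a time and can be discarded after use.
        yield [p.read_text(encoding="utf-8") for p in chunk]
```

A training loop would consume one batch, update the model, and move on, so peak memory depends on the batch size rather than the corpus size.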

Key highlights

Built on Yoon Kim’s CNN-for-text architecture and Mark Berger’s follow-up work
Paired-file format: .txt for text, .lab for labels (one per line), matched by filename
init_word_vectors() combines word2vec training + scaler fitting in one call
batch_train() for memory-constrained training
Not on PyPI; install via pip install git+https://... with dependency version gotchas
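The paired-file layout in the list above can be read with a few lines of standard-library Python. This is a minimal sketch of the format as described here, not Magpie's own loader; `load_pairs` and `multi_hot` are hypothetical helper names, and skipping documents that lack a .lab file is an assumption.

```python
from pathlib import Path


def load_pairs(directory):
    """Return (text, labels) tuples for each .txt file that has a same-named .lab file."""
    pairs = []
    for txt_path in sorted(Path(directory).glob("*.txt")):
        lab_path = txt_path.with_suffix(".lab")
        if not lab_path.exists():
            continue  # assumption: documents without a label file are skipped
        text = txt_path.read_text(encoding="utf-8")
        # One label per line; blank lines and surrounding whitespace are ignored.
        lines = lab_path.read_text(encoding="utf-8").splitlines()
        labels = [line.strip() for line in lines if line.strip()]
        pairs.append((text, labels))
    return pairs


def multi_hot(doc_labels, all_labels):
    """Encode a document's labels as a 0/1 vector over the full label list."""
    present = set(doc_labels)
    return [1 if label in present else 0 for label in all_labels]
```

The multi-hot vector is the usual target shape for multi-label training: one slot per known label, with several slots allowed to be 1 for the same document.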

Caveats

Last tagged release is v2.1.1; unclear how actively maintained
Dependency versions are finicky enough that the README warns about checking setup.py
No GPU guidance, no modern embedding options (BERT, etc.)—this is firmly word2vec-era

Verdict

Grab it if you need a quick, hackable multi-label baseline with inspectable word embeddings. Skip it if you want SOTA or a batteries-included library: this is research glue code, though well-documented glue code.
