a-r-j/graphein

Turning proteins into graph neural network fuel

A Python toolkit that converts protein structures, RNA, and biological networks into graph representations ready for deep learning.

1.2k stars · Jupyter Notebook · Domain Apps · Data Tooling · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Graphein takes raw biological data (PDB structures, AlphaFold predictions, RNA dot-bracket notation, protein-protein interactions, even small molecules) and builds graph representations from them. It outputs standard PyData formats and graph objects compatible with PyTorch Geometric and DGL. There’s also a CLI for batch processing via YAML configs.
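To make "building a graph from a structure" concrete, here is a minimal sketch of one common construction: connect residues whose representative atoms lie within a distance cutoff. It uses only the Python standard library and is not Graphein's API. The function name `residue_contact_graph`, the 8 Å cutoff, the residue labels, and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def residue_contact_graph(coords, cutoff=8.0):
    """Build a simple contact graph from per-residue coordinates.

    coords: dict mapping a residue label to an (x, y, z) tuple in angstroms.
    Returns (nodes, edges), where edges is a set of label pairs whose
    Euclidean distance is at most `cutoff`.
    """
    nodes = sorted(coords)
    edges = set()
    for a, b in combinations(nodes, 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            edges.add((a, b))
    return nodes, edges


# Toy coordinates, invented for illustration (not taken from a real structure).
toy = {
    "A:1:MET": (0.0, 0.0, 0.0),
    "A:2:LYS": (3.8, 0.0, 0.0),
    "A:3:ALA": (7.6, 0.0, 0.0),
    "A:4:GLY": (20.0, 0.0, 0.0),
}
nodes, edges = residue_contact_graph(toy)
```

In this toy input the first three residues end up connected to each other, while the fourth sits beyond the cutoff and stays isolated. A real pipeline would read coordinates from a PDB or AlphaFold file and attach node and edge features before handing the graph to PyTorch Geometric or DGL.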

The interesting bit

The breadth is the point. Most tools handle one biological graph type; Graphein covers proteins (residue-level and atom-level), RNA, molecular graphs, PPI networks, and gene regulatory networks under one API. It also extracts protein meshes and subgraphs and includes ready-made dataloaders, essentially trying to be the scikit-learn of biological graph construction.

Key highlights

  • Supports PDB, AlphaFold DB, and FoldComp datasets directly
  • RNA graphs from dot-bracket notation with optional sequence constraints
  • Molecular graphs from SMILES, SDF, MOL2, and PDB files
  • PPI and gene regulatory network construction with multiple edge sources (STRING, BioGRID)
  • CLI batch processing and extensive Colab tutorial notebooks
  • Published at NeurIPS 2022
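
The RNA highlight above mentions graphs built from dot-bracket notation. As a rough sketch of the idea (again not Graphein's own implementation), the snippet below turns a dot-bracket string into two edge lists: backbone edges between consecutive nucleotides and base-pair edges between matched brackets. The function name `dotbracket_to_edges` is hypothetical. Only round brackets are handled, so pseudoknot notation is out of scope here.

```python
def dotbracket_to_edges(structure):
    """Return (backbone_edges, basepair_edges) for a dot-bracket string.

    Nodes are 0-based nucleotide positions. Backbone edges join neighbours
    (i, i + 1); base-pair edges join each '(' with its matching ')'.
    """
    backbone = [(i, i + 1) for i in range(len(structure) - 1)]
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with most recent '('
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return backbone, sorted(pairs)


# A small hairpin: two stacked base pairs closing a two-nucleotide loop.
backbone, pairs = dotbracket_to_edges("((..))")
```

Keeping backbone and base-pair edges in separate lists makes it easy to label them as distinct edge types later, which is usually what a graph neural network needs.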

Caveats

  • README is mostly feature listings and API examples; no performance benchmarks or comparison to alternatives
  • The “Protein Tensor Module” (v1.6.0) is mentioned but not explained in the truncated README
  • Some code snippets in README have minor formatting errors (e.g., malformed Colab badge links)

Verdict

Worth a look if you’re doing geometric deep learning on biological structures and are tired of writing your own graph construction pipelines. Skip it if you only need one specific graph type and already have a working solution.
