thunlp/OpenKE

A Swiss Army knife for embedding knowledge graphs

OpenKE bundles a decade of knowledge-graph embedding research into one PyTorch-and-C++ toolkit so you don't have to reimplement TransE through RotatE from scratch.

4k stars · Python · Other · AIML Frameworks
Velocity (7d): +1.3 ★ / day · Trend: steady

What it does

OpenKE trains vector representations (embeddings) of the entities and relations in large knowledge graphs. It pairs PyTorch models with C++ code for preprocessing and negative sampling, so the data-heavy steps avoid Python overhead and training stays GPU-friendly. The project also maintains TensorFlow and plain C++ variants under related repos.
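To make "embeddings plus negative sampling" concrete, here is a minimal sketch of the TransE idea in plain Python. It is not OpenKE's code or API: the function names, the toy 2-d vectors, and the margin value are all assumptions chosen for illustration. A true triple should score lower (closer) than a corrupted one in which the tail has been swapped for a random entity.

```python
import random

def transe_score(h, r, t):
    """L1 distance ||h + r - t||; a lower score means a more plausible triple."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def corrupt_tail(triple, num_entities, known, rng):
    """Negative sampling: swap the tail for a random entity, skipping true triples."""
    h, r, _ = triple
    while True:
        t_neg = rng.randrange(num_entities)
        if (h, r, t_neg) not in known:
            return (h, r, t_neg)

def margin_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss: zero once the positive sits at least `margin` below the negative."""
    return max(0.0, margin + pos_score - neg_score)

# Toy 2-d embeddings, hand-picked for illustration (hypothetical values).
entity_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
relation_emb = [[1.0, 0.0]]
known = {(0, 0, 1)}  # the only true triple: entity 0 --relation 0--> entity 1

def score(triple):
    h, r, t = triple
    return transe_score(entity_emb[h], relation_emb[r], entity_emb[t])

rng = random.Random(0)
pos = (0, 0, 1)
neg = corrupt_tail(pos, len(entity_emb), known, rng)

pos_score = score(pos)
neg_score = score(neg)
loss = margin_loss(pos_score, neg_score)
```

In a real toolkit the embeddings are learned by gradient descent on this kind of loss; the sketch only shows how a positive and a sampled negative are compared.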

The interesting bit

The toolkit is essentially a reproducibility engine: the authors include hyperparameter scripts that recover published Hits@10 scores on FB15K237 and WN18RR, and they ship pre-trained embeddings for Wikidata, Freebase, and the Chinese XLORE graph. That saves you from training on 86 million Freebase entities yourself.

Key highlights

  • PyTorch branch covers RESCAL, DistMult, ComplEx, Analogy, TransE/H/R/D, SimplE, and RotatE (plus an adversarial RotatE variant)
  • TensorFlow and Fast-TransX (C++) branches available for lighter or legacy deployments
  • Evaluation includes filtered link prediction with type-constrained corruption for large entity sets
  • Pre-trained embeddings downloadable for Wikidata-5M, full Wikidata, Freebase, and XLORE
  • Published at EMNLP 2018; still actively referenced in the OpenSKL ecosystem
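For readers unfamiliar with the "filtered" setting mentioned above, the following is a rough sketch of how a filtered rank and Hits@k can be computed. It is an illustration, not OpenKE's evaluation code: the function names and toy scores are hypothetical, lower scores are assumed to be better, and type-constrained corruption is not modeled here.

```python
def filtered_tail_rank(scores, true_tail, known_tails):
    """Rank of `true_tail` for a query (h, r, ?).

    scores[e] is the model score with entity e as the tail (lower is better).
    Entities in `known_tails` are other correct answers; the filtered
    setting ignores them so they cannot push the true tail down the ranking.
    """
    true_score = scores[true_tail]
    rank = 1
    for entity, s in enumerate(scores):
        if entity == true_tail or entity in known_tails:
            continue
        if s < true_score:
            rank += 1
    return rank

def hits_at_k(ranks, k=10):
    """Fraction of test queries whose true answer ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy scores for five candidate tails; the true tail is entity 2.
scores = [0.9, 0.1, 0.4, 0.2, 0.7]

raw_rank = filtered_tail_rank(scores, 2, known_tails=set())   # nothing filtered
filt_rank = filtered_tail_rank(scores, 2, known_tails={1})    # entity 1 is also correct

example_hits = hits_at_k([1, 3, 12], k=10)
```

Here the raw rank is 3 because entities 1 and 3 both score better than the true tail, while the filtered rank is 2 because entity 1 is itself a correct answer and is skipped.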

Caveats

  • You must compile C++ extensions with make.sh before the PyTorch quick-start works
  • Data format is rigid: raw entity/relation names must be pre-mapped to integer IDs; the README warns that wrong formatting “may cause segmentation fault”
  • The README truncates mid-sentence in the XLORE section, so some resource links are incomplete
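The ID-mapping caveat can be handled with a small preprocessing step. The sketch below assigns integer IDs to raw names and writes the mappings and triples into in-memory buffers. The file layout it produces (a count on the first line, then `name<TAB>id` rows, and triples written as head, tail, relation) is an assumption about OpenKE's benchmark format, and the helper names are hypothetical, so check the README's data section before relying on it.

```python
import io

def build_id_maps(triples):
    """Assign consecutive integer IDs to names in first-seen order."""
    entity2id, relation2id = {}, {}
    for head, relation, tail in triples:
        for name in (head, tail):
            entity2id.setdefault(name, len(entity2id))
        relation2id.setdefault(relation, len(relation2id))
    return entity2id, relation2id

def write_mapping(mapping, out):
    """Write a count header followed by one `name<TAB>id` row per item."""
    out.write(f"{len(mapping)}\n")
    for name, idx in mapping.items():
        out.write(f"{name}\t{idx}\n")

def write_triples(triples, entity2id, relation2id, out):
    """Write a count header, then `head tail relation` as integer IDs (assumed order)."""
    out.write(f"{len(triples)}\n")
    for head, relation, tail in triples:
        out.write(f"{entity2id[head]} {entity2id[tail]} {relation2id[relation]}\n")

# Hypothetical raw triples as (head, relation, tail) names.
raw = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Paris", "located_in", "France"),
]

entity2id, relation2id = build_id_maps(raw)

entity_buf, triple_buf = io.StringIO(), io.StringIO()
write_mapping(entity2id, entity_buf)
write_triples(raw, entity2id, relation2id, triple_buf)
```

Swapping the `io.StringIO` buffers for files opened with `open(path, "w")` would write the same content to disk.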

Verdict

Worth a look if you’re doing knowledge-graph research and need a baseline implementation that actually matches the papers. Skip it if you want a modern, end-to-end pipeline with automatic entity resolution and no C++ compilation step.
