awslabs/dgl-ke

AWS's graph embedding toolkit that outruns rivals on billion-edge KGs

A research-backed tool for training knowledge graph embeddings at scale, built on DGL and benchmarked against GraphVite and PyTorch-BigGraph.

1.3k stars · Python · ML Frameworks · Other AI
Velocity (7d): +0.6 ★/day · Trend: steady

What it does

DGL-KE trains, evaluates, and runs inference on knowledge graph embeddings: dense vector representations of the entities and relations in a graph. It wraps six established models (TransE, TransR, RESCAL, DistMult, ComplEx, RotatE) in a command-line interface that runs on a single CPU machine, a GPU machine, or a distributed cluster.
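
To make "embedding" concrete, here is a minimal sketch of how two of the listed model families score a (head, relation, tail) triple. It is plain Python for illustration only, not DGL-KE's implementation; the function names and the hand-picked toy vectors are assumptions made up for this example.

```python
import math

def transe_score(head, rel, tail):
    """TransE-style score: negative L2 distance between (head + rel) and tail.

    A higher (less negative) score means the triple looks more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))

def distmult_score(head, rel, tail):
    """DistMult-style score: sum of the element-wise product of the three vectors."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

# Toy 3-dimensional embeddings, hand-picked for illustration (not trained).
paris = [0.9, 0.1, 0.0]
france = [1.0, 0.9, 0.1]
japan = [-0.8, 0.5, 0.7]
capital_of = [0.1, 0.8, 0.1]

true_score = transe_score(paris, capital_of, france)
false_score = transe_score(paris, capital_of, japan)
print(true_score > false_score)  # the plausible triple scores higher
```

Training adjusts the vectors so that observed triples score higher than corrupted ones; the toy vectors above are simply chosen so that the effect is visible.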

The interesting bit

The speed claims are specific and large: on a graph with 86M nodes and 338M edges, training finishes in 100 minutes on 8 GPUs or 30 minutes on a 4-machine CPU cluster, reportedly 2–5× faster than competing tools. The README ties these numbers to exact hardware and named rival systems, which is rarer than you’d expect.

Key highlights

  • Built on Deep Graph Library (DGL); PyTorch backend
  • Distributed training via dglke_dist_train, single-machine via dglke_train
  • Includes evaluation (dglke_eval) and two inference modes: link prediction and embedding similarity
  • Ships with benchmark comparisons against GraphVite (FB15k) and PyTorch-BigGraph (Freebase)
  • Published at SIGIR 2020 with a full optimization paper
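
The two inference modes mentioned above can be sketched conceptually as follows. This is a hedged illustration in plain Python, assuming a TransE-style score for link prediction and cosine similarity for embedding similarity; `rank_tails`, `most_similar`, and the toy vectors are hypothetical and do not reflect DGL-KE's own commands or internals.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_tails(head, rel, candidates):
    """Link prediction: rank candidate tails for (head, rel, ?), best first.

    Uses a TransE-style score (negative L2 distance) as an assumption.
    """
    def score(tail):
        return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)

def most_similar(query, entities):
    """Embedding similarity: rank entities by cosine similarity to the query vector."""
    return sorted(entities, key=lambda name: cosine(query, entities[name]), reverse=True)

# Toy embeddings, hand-picked for illustration (not trained).
entities = {
    "France": [1.0, 0.9, 0.1],
    "Germany": [0.9, 1.0, 0.2],
    "Japan": [-0.8, 0.5, 0.7],
}
paris = [0.9, 0.1, 0.0]
capital_of = [0.1, 0.8, 0.1]

predicted = rank_tails(paris, capital_of, entities)       # candidates for (Paris, capital_of, ?)
neighbors = most_similar(entities["France"], entities)    # entities closest to France
print(predicted[0], neighbors[:2])
```

Link prediction asks which missing element best completes a triple, while embedding similarity ignores relations and compares entity vectors directly.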

Caveats

  • The project appears in maintenance mode; the README now redirects TransE/DistMult/RotatE users to the newer GraphStorm project
  • The install instructions still use sudo pip3, a system-wide install that developers used to virtual environments will likely want to avoid

Verdict

Worth a look if you’re embedding billion-scale knowledge graphs and need reproducible speed benchmarks. Skip it if you only need standard small-graph KGE; GraphStorm or plain PyTorch-BigGraph may be more actively maintained.
