Teaching agents to walk knowledge graphs one hop at a time
A 2017 RL framework that treats multi-hop KG reasoning as pathfinding with embedding-based states and a reward function that cares about accuracy, diversity, and efficiency—not just getting there.

What it does
DeepPath trains a policy-based RL agent to find reasoning paths through knowledge graphs. The agent navigates in a vector space built from KG embeddings (TransE, TransH, TransD, TransR), sampling relations to extend its current path. It supports two downstream tasks: fact prediction and link prediction, evaluated on NELL-995 and FB15k-237.
The interesting bit
The reward function is deliberately multi-objective: it penalizes inaccurate, redundant, and wasteful paths rather than rewarding any successful arrival. The agent’s “state” is a continuous embedding vector, not a discrete graph position, so it reasons in latent space rather than crawling raw triples.
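To make the embedding-as-state idea concrete, here is a minimal sketch in plain Python. It assumes the state is the current entity's vector concatenated with its offset to the target entity, which is how the DeepPath paper describes it; this has not been checked against the repo's code. The toy graph, the 2-d vectors, the linear softmax policy, and every function name are hypothetical stand-ins (the real agent uses a neural network over pre-trained embeddings).

```python
import math
import random

# Toy 2-d entity embeddings (made-up values; a real run would load
# pre-trained TransE-style vectors instead).
ENTITY_VEC = {
    "alice": [0.1, 0.9],
    "acme": [0.7, 0.2],
    "berlin": [0.8, 0.8],
}

# Toy graph: entity -> {relation: neighbouring entity}.
EDGES = {
    "alice": {"worksFor": "acme"},
    "acme": {"headquarteredIn": "berlin"},
    "berlin": {},
}

# The action space is the set of relations, not the set of neighbours.
RELATIONS = ["worksFor", "headquarteredIn"]


def make_state(current, target):
    """Assumed state: current entity vector + (target - current) offset."""
    e_cur = ENTITY_VEC[current]
    e_tgt = ENTITY_VEC[target]
    return e_cur + [t - c for c, t in zip(e_cur, e_tgt)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(state, weights):
    """Hypothetical linear policy: one weight row per relation."""
    scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(scores)


def step(current, target, weights, rng):
    """Sample a relation; follow it if the edge exists, else stay put."""
    probs = policy_probs(make_state(current, target), weights)
    relation = rng.choices(RELATIONS, weights=probs, k=1)[0]
    next_entity = EDGES[current].get(relation)
    return relation, (next_entity if next_entity is not None else current)
```

Calling `step("alice", "berlin", weights, random.Random(0))` repeatedly, feeding each returned entity back in as `current`, yields one candidate path. Sampling a relation that has no matching edge leaves the agent where it is, which is the kind of wasted move the reward is meant to discourage.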
Key highlights
- Policy-gradient RL with continuous states derived from pre-trained KG embeddings
- Reward shaping across three axes: accuracy, path diversity, and reasoning efficiency
- Outperforms path-ranking algorithms (PRA-style) and pure embedding methods on the FB15k-237 (Freebase) and NELL-995 benchmarks
- Ships with pre-discovered reasoning paths, so you can skip training and jump straight to evaluation
- TensorFlow implementation with shell scripts for relation-specific tasks
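The three reward axes in the highlights can be sketched roughly as below. The individual terms follow the DeepPath paper as best I recall it: +1 or -1 for reaching the target, 1/length for efficiency, and negative mean cosine similarity to already-found paths for diversity. The weighted sum, the default weights, and all function names are assumptions for illustration, not the repo's actual code.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def path_vector(path, relation_vec):
    """Represent a path as the sum of its relation embeddings."""
    dim = len(next(iter(relation_vec.values())))
    total = [0.0] * dim
    for relation in path:
        total = [t + x for t, x in zip(total, relation_vec[relation])]
    return total


def accuracy_reward(reached_target):
    """+1 for arriving at the target entity, -1 otherwise."""
    return 1.0 if reached_target else -1.0


def efficiency_reward(path):
    """Shorter paths score higher: 1 / number of hops."""
    if not path:
        raise ValueError("path must contain at least one relation")
    return 1.0 / len(path)


def diversity_reward(path, found_paths, relation_vec):
    """Penalize similarity to paths that were already discovered."""
    if not found_paths:
        return 0.0
    p = path_vector(path, relation_vec)
    sims = [cosine(p, path_vector(q, relation_vec)) for q in found_paths]
    return -sum(sims) / len(sims)


def total_reward(path, reached_target, found_paths, relation_vec,
                 weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the three terms (weights are made up)."""
    w_acc, w_eff, w_div = weights
    return (w_acc * accuracy_reward(reached_target)
            + w_eff * efficiency_reward(path)
            + w_div * diversity_reward(path, found_paths, relation_vec))
```

With this shaping, a path that merely repeats one already found earns a lower total than a fresh path of the same length, and a long detour earns less than a direct route, which is the "not just getting there" behavior the review describes.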
Caveats
- The README contains a typo (“accuravy”) that has survived since 2017—suggesting minimal maintenance
- Requires manual dataset placement and relation-name hunting in NELL-995/tasks/
- Training “might take sometime” (author’s words); no hardware guidance or timing estimates given
- Built on 2017-era TensorFlow; compatibility with modern versions is unclear
Verdict
Worth a look if you’re studying historical KG reasoning methods or need a baseline that combines RL with symbolic structure. Skip it if you want maintained code, modern frameworks, or plug-and-play APIs; this is research-artifact territory.