SimmerChan/KG-demo-for-movie

A movie knowledge graph you can actually interrogate

A hands-on Chinese tutorial repo that builds a film KG from scratch and wires it to a simple question-answering interface.

1.3k stars · Python · Other AI · Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This repo is a complete walkthrough for building a movie knowledge graph and hooking it up to a basic KBQA (knowledge-base question answering) system. It crawls film data from The Movie DB, stores it in MySQL, maps it to an RDF ontology via D2RQ, loads it into Apache Jena Fuseki, and exposes a Streamlit web interface where you can ask natural-language questions that get translated into SPARQL queries.

The interesting bit

The project is deliberately old-school: Python 3.6, Jena 3.5.0, D2RQ 0.8.1, and template-based NL-to-SPARQL conversion using REfO pattern matching rather than modern LLMs. That makes it a useful fossil for understanding how KBQA worked before transformers ate the world, or a starting point for building lightweight, deterministic pipelines where you control every step.
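To make the template idea concrete, here is a minimal sketch of rule-based question-to-SPARQL translation. It is not the repo's code: it swaps REfO for plain Python list matching, assumes the question has already been segmented and tagged (the job jieba does in the project), and uses a hypothetical prefix and hypothetical property names (`:personName`, `:hasActedIn`, `:movieTitle`).

```python
from typing import Callable, List, Optional, Tuple

# A token is (surface form, part-of-speech tag). In a real pipeline a
# segmenter primed with name dictionaries would produce these.
Token = Tuple[str, str]
# A slot is ("tag", "nr") to capture a word with that tag,
# or ("word", "acted") to require that literal word.
Slot = Tuple[str, str]

# Hypothetical namespace, for illustration only.
PREFIX = "PREFIX : <http://example.org/kg#>"


def match(pattern: List[Slot], tokens: List[Token]) -> Optional[List[str]]:
    """Return the captured entities if tokens fit the pattern, else None."""
    if len(tokens) != len(pattern):
        return None
    captured = []
    for (word, tag), (kind, expected) in zip(tokens, pattern):
        if kind == "word" and word == expected:
            continue
        if kind == "tag" and tag == expected:
            captured.append(word)
            continue
        return None
    return captured


def movies_of_actor(entities: List[str]) -> str:
    """SPARQL template: titles of films a named person acted in."""
    name = entities[0].replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"{PREFIX}\n"
        "SELECT DISTINCT ?title WHERE {\n"
        f'  ?p :personName "{name}" .\n'
        "  ?p :hasActedIn ?m .\n"
        "  ?m :movieTitle ?title .\n"
        "}"
    )


# Each rule pairs a question shape with the template that answers it.
RULES: List[Tuple[List[Slot], Callable[[List[str]], str]]] = [
    ([("tag", "nr"), ("word", "acted"), ("word", "movies")], movies_of_actor),
]


def translate(tokens: List[Token]) -> Optional[str]:
    """Try each rule in order; return the first query, or None if none fit."""
    for pattern, build in RULES:
        entities = match(pattern, tokens)
        if entities is not None:
            return build(entities)
    return None


query = translate([("Stephen Chow", "nr"), ("acted", "v"), ("movies", "n")])
unknown = translate([("hello", "x")])
```

A question that fits no rule yields `None` instead of a guess. That behaviour is what makes this style of pipeline deterministic and easy to debug, at the cost of covering only the question shapes someone wrote a rule for.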

Key highlights

  • End-to-end pipeline: crawler → MySQL → D2RQ mapping → RDF → Jena Fuseki → Streamlit frontend
  • Includes Docker setup (recommended) alongside manual local installation instructions
  • Ships with a Protégé-built ontology (ontology.owl) and custom inference rules (rules.ttl)
  • Pattern-matching NL-to-SPARQL via REfO with jieba segmentation and external dictionaries for film/person names
  • Companion Zhihu column and WeChat group for Chinese-language learners in the KG/NLP space
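The last hop of that pipeline, sending a generated query to Fuseki and reading the answer, can be sketched with the standard library alone. The endpoint URL below is an assumption (Fuseki's default port plus a hypothetical dataset name, `movies`). The request shape follows the SPARQL Protocol, and the parser follows the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: default Fuseki port, hypothetical dataset name.
ENDPOINT = "http://localhost:3030/movies/query"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol POST with a form-encoded `query` parameter."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def parse_bindings(payload: str) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = ENDPOINT) -> list:
    """Send the query and return flattened rows; needs a running server."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_bindings(response.read().decode("utf-8"))


# Offline demonstration of the parsing step with a hand-written payload.
sample = json.dumps({
    "head": {"vars": ["title"]},
    "results": {"bindings": [
        {"title": {"type": "literal", "value": "Kung Fu Hustle"}},
    ]},
})
rows = parse_bindings(sample)
```

Only `run_query` touches the network, so the request building and result parsing can be checked without a server.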

Caveats

  • Jena 3.5.0 has a known bug that makes Fuseki fail on restart; the workaround is to delete the TDB files and reload the data
  • Python 3.6 is end-of-life, and the pinned dependency versions are similarly dated
  • The Movie DB crawler requires you to register and supply your own API key
  • The author notes that one helper script (tradition2simple) is of unknown provenance
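The first caveat's workaround, clearing the TDB files before a restart, might be scripted along these lines. This is a sketch under assumptions: the directory path is whatever your Fuseki configuration points at, and the function only empties the store; reloading the data is a separate step.

```python
import shutil
from pathlib import Path


def reset_tdb(tdb_dir: str) -> Path:
    """Delete a TDB directory and recreate it empty.

    This is destructive: everything under tdb_dir is removed.
    """
    path = Path(tdb_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

Stop Fuseki before calling it, then reload the RDF dump (for example with Jena's `tdbloader`) and start the server again.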

Verdict

Worth a weekend if you’re learning knowledge graphs or need a deterministic, explainable KBQA baseline. Skip it if you want production-ready tooling or modern neural approaches: this is pedagogy, not infrastructure.
