liuhuanyong/TextGrapher

Turning news articles into knowledge graphs, the hard way

A Chinese NLP project that extracts entities and relationships from text and renders them as interactive HTML graphs.

1.5k stars · Python · RAG · Search · Data Tooling

What it does

TextGrapher takes a Chinese news article, runs it through keyword extraction, named entity recognition (NER), and subject-verb-object (SVO) parsing, then writes the results to a force-directed graph saved as graph.html. The API is two lines: instantiate CrimeMining(), then call .main(content). The examples lean heavily on criminal cases and corporate scandals: ZTE, the Wei Zexi medical fraud case, and a campus murder.
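A minimal sketch of that two-line call shape, for orientation only. The module name text_grapher is an assumption (this page does not state it), and render_article is a hypothetical wrapper, not part of the project. Only CrimeMining(), .main(content), and the graph.html output name come from the description above.

```python
# Assumption: the repository's code is importable as `text_grapher`.
# If it is not on the path, fall back to None so the wrapper can say so.
try:
    from text_grapher import CrimeMining
except ImportError:
    CrimeMining = None


def render_article(content, handler_cls=CrimeMining):
    """Hypothetical wrapper around the two-line API described above.

    Instantiates the handler, calls .main(content), and returns the
    output file name the project is described as writing.
    """
    if handler_cls is None:
        raise RuntimeError("TextGrapher code not found on the import path")
    handler = handler_cls()   # line one: instantiate
    handler.main(content)     # line two: run the pipeline on the article
    return "graph.html"
```

Passing handler_cls explicitly makes it easy to swap in a stand-in class when the real code is not installed.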

The interesting bit

The author is upfront that this is an attempt at a genuinely hard problem: how to represent document semantics in a structured, glanceable form. The pipeline fuses three extraction layers (keyword frequency, named entities, and syntactic triples) rather than betting on any single technique. The author's frankness about the difficulty is refreshing in a field prone to hand-waving.
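To make the fusion idea concrete, here is a small sketch of merging three extraction layers into one node-and-edge structure. It is not the project's code: Graph and fuse_layers are hypothetical names, the category labels are assumptions, and the inputs are made-up placeholders standing in for real extractor output.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # label -> category
    edges: list = field(default_factory=list)  # (subject, relation, object)


def fuse_layers(keywords, entities, triples):
    """Merge keyword, entity, and SVO-triple layers into one graph."""
    graph = Graph()
    # Layer 1: frequent keywords become plain keyword nodes.
    for word in keywords:
        graph.nodes.setdefault(word, "keyword")
    # Layer 2: entities overwrite the keyword label, since an entity
    # type is more specific than "keyword".
    for name, entity_type in entities:
        graph.nodes[name] = entity_type
    # Layer 3: each triple adds an edge; endpoints not seen in the
    # earlier layers get a generic "phrase" category.
    for subject, verb, obj in triples:
        graph.nodes.setdefault(subject, "phrase")
        graph.nodes.setdefault(obj, "phrase")
        graph.edges.append((subject, verb, obj))
    return graph


# Placeholder data, not taken from any real article.
graph = fuse_layers(
    keywords=["fine", "Company A"],
    entities=[("Company A", "organization"), ("Regulator B", "organization")],
    triples=[("Regulator B", "fined", "Company A")],
)
```

The point of the design is that a term found by more than one layer appears once in the graph, carrying the most specific label available.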

Key highlights

  • Ships with working examples on real Chinese news events (see screenshots)
  • Combines keyword, NER, and SVO extraction into one graph view
  • Output is a self-contained HTML file, so there is no frontend build step
  • ~1,500 stars suggest the idea resonates, even if the implementation is rough
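As a rough illustration of what "self-contained HTML, no build step" can mean, the sketch below renders a graph as inline SVG inside a single page. It deliberately swaps in a fixed circular layout computed in Python instead of the force-directed layout the project is described as producing, and render_svg_graph is a hypothetical helper, not the project's renderer.

```python
import html
import math


def render_svg_graph(nodes, edges, size=400):
    """Return a standalone HTML page: nodes on a circle, labeled edges."""
    center, radius = size / 2, size * 0.4
    pos = {}
    for i, name in enumerate(nodes):
        angle = 2 * math.pi * i / max(len(nodes), 1)
        pos[name] = (center + radius * math.cos(angle),
                     center + radius * math.sin(angle))

    parts = []
    # Draw edges first so node circles sit on top of the lines.
    for src, rel, dst in edges:  # raises KeyError for unknown endpoints
        (x1, y1), (x2, y2) = pos[src], pos[dst]
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="#999"/>')
        parts.append(f'<text x="{(x1 + x2) / 2:.1f}" y="{(y1 + y2) / 2:.1f}" '
                     f'font-size="10">{html.escape(rel)}</text>')
    for name, (x, y) in pos.items():
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="6" fill="#36c"/>')
        parts.append(f'<text x="{x + 8:.1f}" y="{y:.1f}" '
                     f'font-size="12">{html.escape(name)}</text>')

    body = "\n".join(parts)
    return ('<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
            '<title>graph</title></head>\n<body>'
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">\n'
            f'{body}\n</svg></body></html>\n')


page = render_svg_graph(["A", "B"], [("A", "rel <x>", "B")])
```

Writing the returned string to disk, for example with pathlib.Path("graph.html").write_text(page, encoding="utf-8"), yields a file that opens in a browser with no external scripts or assets.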

Caveats

  • The README explicitly warns that “NLP performance limits” create “multiple deficiencies” in extraction quality
  • The class name CrimeMining suggests the code may be narrowly tuned to crime and corporate scandal text; it is unclear how well it generalizes
  • No mention of model versions, dependencies, or installation instructions

Verdict

Worth a look if you’re prototyping Chinese text-to-graph pipelines and need a baseline to beat. Skip it if you need production-grade accuracy or English-language support; the author claims neither.
