fhamborg/Giveme5W1H

News summarization by brute-force journalism

A Python library that reverse-engineers the 5W1H structure from news articles, because someone finally decided to treat reporters' training as a spec.

★ 533 stars · HTML · Data Tooling · Language Models

What it does

Giveme5W1H parses news articles and extracts the phrases that answer the classic journalistic questions: who, what, when, where, why, and how. It is available both as a Python 3.6+ library and as a RESTful API, and it expects input as JSON in the format produced by the companion news-please crawler.
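To make the JSON input concrete, here is a minimal sketch of preparing one article. The field names (`title`, `description`, `text`, `date_publish`) are assumptions modeled on news-please-style output and are not confirmed by this page, so check the sample files in the repository before relying on them. `make_article` is a hypothetical helper, not part of Giveme5W1H.

```python
import json

# Assumed field names, modeled on news-please-style output (not confirmed here).
REQUIRED_FIELDS = ("title", "description", "text", "date_publish")


def make_article(title, description, text, date_publish):
    """Build an article dict and reject empty fields (hypothetical helper)."""
    article = {
        "title": title,
        "description": description,  # lead paragraph
        "text": text,                # main body
        "date_publish": date_publish,
    }
    missing = [key for key in REQUIRED_FIELDS if not article[key]]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return article


article = make_article(
    title="Storm closes coastal highway",
    description="Officials closed the road on Monday after flooding.",
    text="Heavy rain flooded the coastal highway on Monday, so officials closed it.",
    date_publish="2019-09-16 10:00:00",
)

# Serialize to JSON, ready to be written to a file or sent to an API.
payload = json.dumps(article, indent=2)
print(payload)
```

Validating up front is a design choice for the sketch: a missing publish date or lead paragraph would otherwise only surface later, after the slow preprocessing step.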

The interesting bit

The system relies on Stanford CoreNLP for the linguistic preprocessing, then runs a scoring pipeline that ranks candidate phrases for each question instead of treating extraction as a single-shot classification problem. The “learn weights” tooling suggests the authors expect their heuristics to need tuning per domain.
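The idea of ranking candidates per question can be illustrated with a small weighted-sum scorer. This is only a sketch of the general technique, not the project's actual scoring code: the `Candidate` class, the feature names, and the weight values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    phrase: str
    features: dict  # feature name -> value in [0, 1]


def score(candidate, weights):
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.features.items())


def rank(candidates, weights):
    """Return candidates sorted best-first by weighted score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


# Hypothetical features and weights for the "who" question.
who_weights = {"position": 0.5, "frequency": 0.3, "is_named_entity": 0.2}
who_candidates = [
    Candidate("officials",
              {"position": 0.9, "frequency": 0.4, "is_named_entity": 0.0}),
    Candidate("the coastal highway",
              {"position": 0.6, "frequency": 0.6, "is_named_entity": 0.0}),
    Candidate("Mayor Jane Doe",
              {"position": 0.7, "frequency": 0.5, "is_named_entity": 1.0}),
]

best = rank(who_candidates, who_weights)[0]
print(best.phrase)  # Mayor Jane Doe
```

Because the ranking depends entirely on the weights, changing them changes the winner. That is why tooling for learning weights per domain fits this kind of design.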

Key highlights

Requires running a separate Stanford CoreNLP Server (port 9000), which initializes lazily and can take minutes on first use
REST API runs on port 9099 with a browser-playground for testing articles
Caches CoreNLP and enhancer output to disk to avoid reprocessing
Ships with file-handler utilities for batch-processing JSON article folders
Academic lineage: published at INRA 2019, Apache 2.0 licensed
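The caching and batch-processing highlights can be sketched together. The code below is an illustration of the pattern under stated assumptions, not the library's own file handler: `cached`, `process_folder`, and `fake_preprocess` are hypothetical names, and the `text` field is an assumed key in each article's JSON.

```python
import hashlib
import json
from pathlib import Path


def cached(cache_dir, text, compute):
    """Return compute(text), storing the result on disk keyed by a hash of the text."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        # Cache hit: skip the expensive computation entirely.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute(text)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


def process_folder(input_dir, cache_dir, compute):
    """Run compute over the assumed 'text' field of every *.json file in input_dir."""
    results = {}
    for article_path in sorted(Path(input_dir).glob("*.json")):
        article = json.loads(article_path.read_text(encoding="utf-8"))
        results[article_path.name] = cached(cache_dir, article["text"], compute)
    return results


def fake_preprocess(text):
    """Stand-in for an expensive preprocessing call such as a CoreNLP request."""
    return {"tokens": text.split()}
```

Calling `process_folder("articles", "cache", fake_preprocess)` twice on the same folder would run the preprocessing only on the first pass; the second pass reads the stored results. Keying the cache on a hash of the text means renamed files still hit the cache.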

Caveats

The README warns that some “Additional Information” is outdated, which is… not ideal for a documentation section
Manual CoreNLP server management is mandatory; the authors explicitly rejected transparent integration due to startup latency
No GPU acceleration mentioned; this is CPU-bound NLP from the CoreNLP era
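Since the CoreNLP server has to be started by hand, a pipeline may want to confirm it is reachable before sending articles. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the default port 9000 is taken from the highlights above.

```python
import socket
import time


def wait_for_port(host="localhost", port=9000, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller might run `wait_for_port()` at startup and abort with a clear message if it returns False. Note that an open port only shows the server process is up. Because initialization is lazy, the first real request can still take minutes.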

Verdict

Worth a look if you’re building news analysis pipelines and need structured event summaries without training your own models. Skip it if you want modern transformer-based extraction or a fully self-contained library: this is a 2019-vintage system with 2019-vintage dependencies.

Frequently asked

What is fhamborg/Giveme5W1H? A Python library that reverse-engineers the 5W1H structure from news articles, because someone finally decided to treat reporters' training as a spec.
Is Giveme5W1H open source? Yes, fhamborg/Giveme5W1H is open source, released under the Apache-2.0 license.
What language is Giveme5W1H written in? GitHub lists HTML as the repository's primary language, although the library itself is written in Python.
How popular is Giveme5W1H? fhamborg/Giveme5W1H has 533 stars on GitHub.
Where can I find Giveme5W1H? fhamborg/Giveme5W1H is on GitHub at https://github.com/fhamborg/Giveme5W1H.