Is Annotated-Semantic-Relationships-Datasets open source?

Yes — davidsbatista/Annotated-Semantic-Relationships-Datasets is an open-source project tracked on heatdrop.

How popular is Annotated-Semantic-Relationships-Datasets?

davidsbatista/Annotated-Semantic-Relationships-Datasets has 709 stars on GitHub.

Where can I find Annotated-Semantic-Relationships-Datasets?

davidsbatista/Annotated-Semantic-Relationships-Datasets is on GitHub at https://github.com/davidsbatista/Annotated-Semantic-Relationships-Datasets.


davidsbatista/Annotated-Semantic-Relationships-Datasets

A junk drawer of labeled entity pairs, curated with care

Someone finally collected all the scattered NLP relation-extraction datasets into one repo so you don't have to hunt through decade-old conference websites.

★ 709 stars · Data Tooling



What it does

This repository gathers 20+ publicly available datasets for training supervised models to extract semantic relationships between entities or nominals. It covers English and Portuguese, spans 2005–2020, and sorts everything into three buckets: traditional closed-class relation extraction, open information extraction (untyped relations), and distantly supervised data.

The interesting bit

The curation is the product. Each dataset is listed in tidy tables with its original citation, year, language, and number of classes, so there is no more digging through ACL Anthology PDFs to figure out what SemEval 2010 Task 8 actually contains. The author also distinguishes annotation regimes that papers often conflate: manually labeled, open-class, and silver-standard distant supervision.
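Why distantly supervised data counts as "silver" rather than gold is easiest to see in the heuristic itself. The sketch below assumes the classic formulation (any sentence that mentions both arguments of a knowledge-base fact is labeled with that fact's relation); the one-fact `KB`, the toy `SENTENCES`, and the `distant_label` function are hypothetical, and the individual corpora in the catalog may use refined variants.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical one-fact knowledge base: (subject, object) -> relation.
KB: Dict[Tuple[str, str], str] = {("Lisbon", "Portugal"): "capital_of"}

# Toy sentences, each paired with the entity mentions found in it.
SENTENCES: List[Tuple[str, Set[str]]] = [
    ("Lisbon is the capital of Portugal.", {"Lisbon", "Portugal"}),
    ("Lisbon is the largest city in Portugal.", {"Lisbon", "Portugal"}),
    ("Porto lies on the Douro river.", {"Porto", "Douro"}),
]


def distant_label(sentences, kb):
    """Label every sentence that mentions both arguments of a KB fact
    with that fact's relation, whether or not the sentence expresses it."""
    labeled = []
    for text, mentions in sentences:
        for (subj, obj), relation in kb.items():
            if subj in mentions and obj in mentions:
                labeled.append((text, subj, obj, relation))
    return labeled


for text, subj, obj, relation in distant_label(SENTENCES, KB):
    print(f"{relation}({subj}, {obj}): {text}")
```

The second sentence gets `capital_of` even though it only says Lisbon is the largest city. That label noise is what "silver-standard" refers to, and it is why some corpora in the catalog add crowdsourcing or manual revision on top of distant supervision.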

Key highlights

13 traditional IE datasets, including classics like SemEval 2007/2010, AImed (protein interactions), and Wikipedia person-to-person relations with 53 labels
4 open IE datasets: ReVerb, ClausIE, and two IJCNLP/EMNLP sets
4 distantly supervised sets, including Google’s 2013 relation extraction corpus and a 2020 hybrid distant-supervision-plus-crowdsourcing corpus for phenotype-gene relations
Portuguese coverage: ReRelEM (4 relation types) and DBpediaRelations-PT (10 types, manually revised after distant supervision)
All datasets hosted directly or linked with original paper citations
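The three buckets and the per-dataset metadata described above map naturally onto a small, filterable record type. This is a minimal sketch, not something the repository ships: `Supervision`, `DatasetEntry`, `CATALOG`, and `find_datasets` are hypothetical names, the catalog holds only a handful of entries named in this write-up, and the bucket assignments for ReRelEM and Riedel 2010 are assumptions to check against the repository's own tables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Supervision(Enum):
    """The catalog's three buckets."""
    TRADITIONAL = "traditional"  # closed-class, manually labeled relations
    OPEN_IE = "open_ie"  # open information extraction, untyped relations
    DISTANT = "distant"  # distantly supervised, silver-standard labels


@dataclass(frozen=True)
class DatasetEntry:  # hypothetical record type, not part of the repository
    name: str
    language: str
    supervision: Supervision
    num_classes: Optional[int] = None  # None where this write-up gives no count


# Illustrative subset. Buckets for ReRelEM and Riedel 2010 are assumptions;
# the repository's tables are the authoritative listing.
CATALOG: List[DatasetEntry] = [
    DatasetEntry("SemEval 2010 Task 8", "English", Supervision.TRADITIONAL),
    DatasetEntry("AImed", "English", Supervision.TRADITIONAL),
    DatasetEntry("ReRelEM", "Portuguese", Supervision.TRADITIONAL, num_classes=4),
    DatasetEntry("ReVerb", "English", Supervision.OPEN_IE),
    DatasetEntry("Riedel 2010 (ECML)", "English", Supervision.DISTANT),
]


def find_datasets(catalog, language=None, supervision=None) -> List[str]:
    """Return names of entries matching every filter that is not None."""
    return [
        entry.name
        for entry in catalog
        if (language is None or entry.language == language)
        and (supervision is None or entry.supervision == supervision)
    ]


print(find_datasets(CATALOG, language="Portuguese"))
print(find_datasets(CATALOG, language="English", supervision=Supervision.TRADITIONAL))
```

Adding domain or year fields would follow the same pattern: one more optional attribute and one more clause in the filter.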

Caveats

README descriptions vary in depth; some datasets get paragraphs, others get a sentence
No code, no loaders, no unified format: this is purely a data catalog with downloads
A few entries are external links rather than hosted files (e.g., Riedel’s 2010 ECML data, Google’s corpus)
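Without loaders, every dataset needs its own parser. As an example, here is a minimal sketch for one file, assuming the SemEval 2010 Task 8 training-file layout as originally distributed: an ID and a quoted sentence with `<e1>`/`<e2>` markup separated by a tab, a label line such as `Component-Whole(e2,e1)` or `Other`, a `Comment:` line, then a blank line. The `RelationExample` record and `parse_semeval2010_task8` function are hypothetical, and the sample sentences are made up in that layout rather than taken from the dataset; check the actual file in the repository before relying on this.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# "Component-Whole(e2,e1)" -> relation name plus argument order; "Other" has none.
LABEL_RE = re.compile(r"^(?P<rel>[\w-]+)\((?P<first>e[12]),(?P<second>e[12])\)$")
E1_RE = re.compile(r"<e1>(.*?)</e1>")
E2_RE = re.compile(r"<e2>(.*?)</e2>")
TAG_RE = re.compile(r"</?e[12]>")


@dataclass
class RelationExample:  # hypothetical unified record, not part of the repository
    sentence_id: str
    text: str  # sentence with the <e1>/<e2> markup stripped
    e1: str
    e2: str
    relation: str  # e.g. "Component-Whole" or "Other"
    direction: Optional[Tuple[str, str]]  # e.g. ("e2", "e1"); None for "Other"


def parse_semeval2010_task8(raw: str) -> List[RelationExample]:
    """Parse records laid out as: ID<TAB>"sentence", label line, Comment line, blank line."""
    examples = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2:
            continue  # skip malformed or truncated records
        sentence_id, quoted = lines[0].split("\t", 1)
        sentence = quoted.strip().strip('"')
        match = LABEL_RE.match(lines[1])
        if match:
            relation = match.group("rel")
            direction = (match.group("first"), match.group("second"))
        else:
            relation, direction = lines[1], None
        examples.append(
            RelationExample(
                sentence_id=sentence_id.strip(),
                text=TAG_RE.sub("", sentence),
                e1=E1_RE.search(sentence).group(1),
                e2=E2_RE.search(sentence).group(1),
                relation=relation,
                direction=direction,
            )
        )
    return examples


# Made-up sentences in the same layout, not taken from the dataset.
SAMPLE = (
    '1\t"The <e1>engine</e1> sits inside the <e2>car</e2>."\n'
    "Component-Whole(e1,e2)\n"
    "Comment:\n"
    "\n"
    '2\t"A <e1>storm</e1> was reported near the <e2>coast</e2>."\n'
    "Other\n"
    "Comment:\n"
)

for ex in parse_semeval2010_task8(SAMPLE):
    print(ex.sentence_id, ex.relation, ex.direction, ex.e1, ex.e2, ex.text)
```

Each other dataset in the catalog would need a similar adapter into the same record type, which is exactly the work the caveat above is warning about.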

Verdict

Worth bookmarking if you’re building or benchmarking relation extractors and need to know which dataset fits your language, domain, and supervision setup. Skip it if you want preprocessed tensors or a training framework; this is just the raw material, well organized.
