Is MedCAT open source?

Not in the strict sense. The source code is public on GitHub, but the project now uses the Elastic License 2.0, which is not OSI-approved.

What language is MedCAT written in?

CogStack/MedCAT is primarily written in Python.

How popular is MedCAT?

CogStack/MedCAT has 531 stars on GitHub.

Where can I find MedCAT?

CogStack/MedCAT is on GitHub at https://github.com/CogStack/MedCAT.

CogStack/MedCAT

MedCAT: NLP for health records that actually ships models

A Python toolkit that extracts medical concepts from clinical text and links them to SNOMED-CT and UMLS—pre-trained, license-warts and all.

★ 531 stars · Python · Domain Apps

What it does

MedCAT runs named entity recognition on electronic health records, then maps the extracted terms to biomedical ontologies like SNOMED-CT and UMLS. It comes with four public model packs (including a >4M concept UMLS Full model trained on MIMIC-III), so you’re not starting from zero. The project has since moved to CogStack/cogstack-nlp, with MedCAT v2 incoming.
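
As a rough illustration of that extract-then-link flow, here is a minimal sketch against the MedCAT v1 Python API. It assumes `CAT.load_model_pack` and `cat.get_entities` behave as in the v1 documentation, and that each entity carries `source_value`, `cui`, `pretty_name`, `start`, `end`, and `acc` fields; the helper names `linked_concepts` and `annotate`, the `min_acc` threshold, and the model pack path are made up for this example.

```python
from typing import Any, Dict, List


def linked_concepts(result: Dict[str, Any], min_acc: float = 0.5) -> List[Dict[str, Any]]:
    """Flatten a get_entities-style result into rows sorted by text position.

    Assumes the result looks like {"entities": {id: {...}}, "tokens": [...]}.
    """
    rows = []
    for ent in result.get("entities", {}).values():
        if ent.get("acc", 0.0) < min_acc:
            continue  # drop links below the (hypothetical) confidence cutoff
        rows.append({
            "text": ent["source_value"],   # the span as written in the note
            "cui": ent["cui"],             # ontology concept identifier
            "name": ent["pretty_name"],    # preferred name for that concept
            "start": ent["start"],
            "end": ent["end"],
        })
    return sorted(rows, key=lambda r: r["start"])


def annotate(model_pack_path: str, text: str) -> List[Dict[str, Any]]:
    """Load a model pack and annotate one document (requires medcat installed)."""
    # Imported lazily so linked_concepts() can be used without the dependency.
    from medcat.cat import CAT

    cat = CAT.load_model_pack(model_pack_path)
    return linked_concepts(cat.get_entities(text))
```

Calling `annotate("path/to/model_pack.zip", note_text)` would then return one row per linked concept. The path is a placeholder for a pack you have downloaded yourself.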

The interesting bit

The maintainers ship actual downloadable models—not just architecture—which is rarer than it should be in medical NLP. The catch: you need NIH/UMLS credentials to get them, and the project now uses the Elastic License 2.0, which is not OSI-approved. The Dutch model pack even bundles a separate negation detection model, suggesting the tool handles clinical nuance beyond bare extraction.
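
To make the negation point concrete: in MedCAT v1 output, results from such add-on classifiers appear under each entity's `meta_anns` field. The sketch below filters out negated mentions under that assumption. The task name `"Negation"` and the label `"negated"` are placeholders, since the real names depend on the model pack, and `affirmed_only` is a helper invented here.

```python
from typing import Any, Dict, List


def affirmed_only(entities: Dict[Any, Dict[str, Any]],
                  task: str = "Negation",
                  negated_label: str = "negated") -> List[Dict[str, Any]]:
    """Keep entities that the (assumed) negation task did not mark as negated."""
    kept = []
    for ent in entities.values():
        # meta_anns is assumed to map task name -> {"value": ..., "confidence": ...}
        meta = (ent.get("meta_anns") or {}).get(task)
        if meta is not None and meta.get("value") == negated_label:
            continue  # e.g. "no evidence of pneumonia"
        kept.append(ent)
    return kept
```

Entities without a `meta_anns` entry for the task are kept, which is a deliberate choice here: a missing classifier result is treated as "not negated" rather than silently dropped.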

Key highlights

Pre-trained models for UMLS (small and full) and SNOMED International, plus a Dutch variant
Built on spaCy v3 with optional Hugging Face Transformers integration
CPU-only install path available (saves ~10 GB vs. default GPU dependencies)
Live demo trained on full SNOMED-CT + MIMIC-III (spins up on demand, so first load is slow)
Logging disabled by default—library users control their own noise
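
On the logging point, a library that stays silent by default can usually be switched on with the standard `logging` module. This sketch assumes MedCAT's loggers sit under the name `medcat` (the usual `logging.getLogger(__name__)` convention, not confirmed here); `enable_library_logging` is a hypothetical helper.

```python
import logging
import sys


def enable_library_logging(name: str = "medcat", level: int = logging.INFO) -> logging.Logger:
    """Attach one stderr handler to a library's logger and set its level."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Call `enable_library_logging()` once at startup, or pass `level=logging.DEBUG` for more detail.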

Caveats

Repository is deprecated; active development moved to CogStack/cogstack-nlp
MedCAT v1.16.x is the final v1 release; v2 is “soon” per the README, with no date
Model downloads require UMLS/NIH authentication—no anonymous grab-and-go

Verdict

Worth a look if you’re doing clinical NLP and need ontology-linked output without training from scratch. Skip if you need fully open licensing or can’t navigate UMLS credentialing. Check the new repo first—this one’s a redirect with history.
