CogStack/MedCAT

MedCAT: NLP for health records that actually ships models

A Python toolkit that extracts medical concepts from clinical text and links them to SNOMED-CT and UMLS. Pre-trained models are included, and so are some licensing caveats.

530 stars · Python · Domain Apps

What it does

MedCAT runs named entity recognition on electronic health records, then maps the extracted terms to biomedical ontologies like SNOMED-CT and UMLS. It comes with four public model packs (including a UMLS Full model with more than 4 million concepts, trained on MIMIC-III), so you’re not starting from zero. Development has since moved to CogStack/cogstack-nlp, where MedCAT v2 is planned but not yet dated.
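To make the extract-then-link flow concrete, here is a minimal sketch. It assumes the MedCAT v1 API (`CAT.load_model_pack` and `get_entities`) and a result dict with an `entities` mapping whose items carry `cui`, `pretty_name`, `start`, and `end`; check both against the version you install. The helper names, the sample data, and its identifiers are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sample shaped like the assumed get_entities() output.
# The identifiers are placeholders, not real UMLS CUIs.
SAMPLE_RESULT: Dict[str, Any] = {
    "entities": {
        0: {"cui": "CUI-A", "pretty_name": "example concept A", "start": 20, "end": 28},
        1: {"cui": "CUI-B", "pretty_name": "example concept B", "start": 0, "end": 12},
    }
}


def summarize_entities(result: Dict[str, Any]) -> List[Tuple[str, str, int, int]]:
    """Flatten a result dict into (cui, pretty_name, start, end) rows, ordered by start offset."""
    rows = [
        (ent["cui"], ent["pretty_name"], ent["start"], ent["end"])
        for ent in result.get("entities", {}).values()
    ]
    return sorted(rows, key=lambda row: row[2])


def annotate(model_pack_path: str, text: str) -> List[Tuple[str, str, int, int]]:
    """Load a downloaded model pack and return linked concepts for one document."""
    # Imported lazily so summarize_entities() works without medcat installed.
    from medcat.cat import CAT  # assumption: MedCAT v1 import path

    cat = CAT.load_model_pack(model_pack_path)
    return summarize_entities(cat.get_entities(text))
```

Calling `annotate` needs one of the downloadable model packs on disk; `summarize_entities(SAMPLE_RESULT)` runs without MedCAT and shows the shape of the linked output.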

The interesting bit

The maintainers ship actual downloadable models—not just architecture—which is rarer than it should be in medical NLP. The catch: you need NIH/UMLS credentials to get them, and the project now uses the Elastic License 2.0, which is not OSI-approved. The Dutch model pack even bundles a separate negation detection model, suggesting the tool handles clinical nuance beyond bare extraction.

Key highlights

  • Pre-trained models for UMLS (small and full) and SNOMED International, plus a Dutch variant
  • Built on spaCy v3 with optional Hugging Face Transformers integration
  • CPU-only install path available (saves ~10 GB vs. default GPU dependencies)
  • Live demo trained on full SNOMED-CT + MIMIC-III (spins up on demand, so first load is slow)
  • Logging disabled by default—library users control their own noise
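Since logging is off by default, a caller who wants output has to opt in. A minimal sketch using only the standard `logging` module follows; it assumes the package logs under the logger name `medcat` (the usual convention for a Python package, but not confirmed here), and `enable_medcat_logging` is a hypothetical helper, not part of the library.

```python
import logging


def enable_medcat_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach a console handler to the assumed 'medcat' logger and set its level."""
    logger = logging.getLogger("medcat")  # assumption: package-level logger name

    # Add a console handler only once, so repeated calls do not duplicate output.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)

    logger.setLevel(level)
    return logger
```

Passing `logging.DEBUG` gives more detail during model loading or annotation, provided the library emits messages at that level.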

Caveats

  • Repository is deprecated; active development moved to CogStack/cogstack-nlp
  • MedCAT v1.16.x is the final v1 release; v2 is “soon” per the README, with no date
  • Model downloads require UMLS/NIH authentication—no anonymous grab-and-go

Verdict

Worth a look if you’re doing clinical NLP and need ontology-linked output without training from scratch. Skip if you need fully open licensing or can’t navigate UMLS credentialing. Check the new repo first: this one is deprecated and mainly useful for its history.
