allenai/scispacy
A spaCy-based NLP pipeline with pretrained models for tokenization, POS tagging, syntactic parsing, and named entity recognition in scientific/biomedical text.

Velocity (7d): +0.7 ★/day · Trend: steady
This repository provides custom spaCy pipeline components and pretrained models tailored to scientific and biomedical documents. It includes a specialized tokenizer with domain-specific rules, a POS tagger and syntactic parser trained on biomedical corpora, entity span detection models, and NER models for biomedical terminology. Separate model packages are available for different tasks and sizes.
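As a rough illustration of how one of these model packages might be used, here is a minimal sketch. It assumes that spaCy and the small scientific model package `en_core_sci_sm` are already installed. The helper name `extract_entities` is hypothetical and not part of the project. If either dependency is missing, the helper returns `None` rather than failing.

```python
from typing import List, Optional, Tuple


def extract_entities(
    text: str, model_name: str = "en_core_sci_sm"
) -> Optional[List[Tuple[str, str]]]:
    """Return (entity text, label) pairs, or None if spaCy or the model is unavailable.

    `model_name` is an assumption: it must name a model package that is
    installed in the current environment.
    """
    try:
        import spacy  # imported lazily so the sketch degrades gracefully

        nlp = spacy.load(model_name)  # raises OSError if the package is not installed
    except (ImportError, OSError):
        return None

    doc = nlp(text)  # runs tokenization, tagging, parsing, and entity detection
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    sample = "Spinal and bulbar muscular atrophy is an inherited motor neuron disease."
    entities = extract_entities(sample)
    if entities is None:
        print("spaCy or the model package is not installed.")
    else:
        for ent_text, ent_label in entities:
            print(ent_text, ent_label)
```

Which spans come back, and with which labels, depends on the model package that is loaded, so the sketch makes no claim about specific output.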