
allenai/scispacy

A spaCy-based NLP pipeline with pretrained models for tokenization, POS tagging, syntactic parsing, and named entity recognition in scientific/biomedical text.

2k stars · Python · Domain Apps · Data Tooling
Velocity (7d): +0.7 ★/day · Trend: steady

This repository provides custom spaCy pipes and pretrained models tailored for scientific and biomedical documents. It includes a specialized tokenizer with domain-specific tokenization rules, a POS tagger and syntactic parser trained on biomedical corpora, entity span detection models, and NER models for biomedical terminology. Separate model packages are available for different tasks and sizes.
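
As a rough illustration of how these pieces fit together, the sketch below loads one of the pretrained pipelines through spaCy and reads back tokens, POS tags, dependency labels, and entity spans. It assumes that spaCy and the `en_core_sci_sm` model package have been installed separately; the helper names `load_pipeline` and `describe` and the sample sentence are hypothetical and chosen only for this example. If spaCy or the model is missing, the loader returns `None` rather than failing.

```python
def load_pipeline(model_name="en_core_sci_sm"):
    """Return a loaded spaCy pipeline, or None if spaCy or the model is missing.

    The default model name is an assumption: it refers to a scispacy model
    package that must be installed separately.
    """
    try:
        import spacy  # imported lazily so the sketch degrades gracefully
        return spacy.load(model_name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed.
        # OSError: spaCy is installed but the model package cannot be found.
        return None


def describe(nlp, text):
    """Run the pipeline on text; return (tokens, entities).

    tokens   -> list of (token text, POS tag, dependency label)
    entities -> list of (entity text, entity label)
    """
    doc = nlp(text)
    tokens = [(tok.text, tok.pos_, tok.dep_) for tok in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


if __name__ == "__main__":
    nlp = load_pipeline()
    if nlp is None:
        print("spaCy or the model package is not installed; nothing to run.")
    else:
        # Hypothetical sample sentence, used only for illustration.
        sample = "Myeloid derived suppressor cells inhibit T cell proliferation."
        tokens, entities = describe(nlp, sample)
        for text, pos, dep in tokens:
            print(f"{text}\t{pos}\t{dep}")
        for text, label in entities:
            print(f"ENTITY\t{text}\t{label}")
```

Because the tokenizer, tagger, parser, and entity recognizer all run inside a single pipeline object, switching to a different model package should only require changing the name passed to `load_pipeline`.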
