michiyasunaga/LinkBERT
A BERT-style pretrained language model that incorporates document links (hyperlinks and citation links) during pretraining to improve knowledge-intensive language understanding.

LinkBERT extends BERT pretraining by placing linked documents (connected by hyperlinks or citations) in the same input context, in addition to the usual single-document contexts, so the model can capture knowledge that spans documents. It serves as a drop-in replacement for BERT and shows improved performance on general language understanding, knowledge-intensive question answering, and cross-document tasks. Models are released in base and large sizes for both the general and biomedical domains.
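The idea of putting linked documents into one context can be illustrated with a small sketch. This is not the repository's pretraining code: the toy corpus `DOCS`, the link graph `LINKS`, and the helper `build_context` are all hypothetical names introduced here, and the `[CLS] ... [SEP] ... [SEP]` layout is simply the standard BERT two-segment format, assumed for illustration.

```python
import random

# Hypothetical toy corpus: each document is a list of text segments.
DOCS = {
    "a": ["doc A, segment 1.", "doc A, segment 2."],
    "b": ["doc B, segment 1.", "doc B, segment 2."],
}

# Hypothetical link graph: document id -> ids of documents it links to
# (hyperlinks or citations).
LINKS = {"a": ["b"], "b": []}


def build_context(doc_id, docs, links, rng, use_link):
    """Build one two-segment input string for `doc_id`.

    If `use_link` is true and the document has outgoing links, the second
    segment is drawn from a linked document; otherwise it is the segment
    that follows the anchor in the same document.
    Assumes every document has at least two segments.
    """
    segments = docs[doc_id]
    idx = rng.randrange(len(segments) - 1)  # leave room for a following segment
    anchor = segments[idx]

    targets = links.get(doc_id, [])
    if use_link and targets:
        linked_id = rng.choice(targets)
        partner = rng.choice(docs[linked_id])
        kind = "linked"
    else:
        partner = segments[idx + 1]
        kind = "single"

    # Standard BERT-style two-segment layout (assumed here).
    return f"[CLS] {anchor} [SEP] {partner} [SEP]", kind


if __name__ == "__main__":
    rng = random.Random(0)
    for doc_id in DOCS:
        text, kind = build_context(doc_id, DOCS, LINKS, rng, use_link=True)
        print(kind, "|", text)
```

In this sketch, document `a` yields a cross-document context because it links to `b`, while `b` has no outgoing links and falls back to a single-document context. A real pipeline would operate on tokenized text and a much larger link graph.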