tomaarsen/attention_sinks
A library that modifies pre-trained LLMs to generate fluent text indefinitely beyond their original training context using attention sink mechanisms.

The library adapts existing transformer-based LLMs to use a sliding window attention variant that keeps a small number of initial "attention sink" tokens alongside the most recent tokens, which lets the model keep producing coherent text over arbitrarily long sequences. No retraining is required: the modifications are applied post-hoc to the attention mechanism of the pre-trained model. The project also provides benchmark code comparing perplexity across multiple model families, including Llama-2, Falcon, Mistral, and GPT-J, under long-context generation scenarios.
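
The cache policy behind this idea can be illustrated with a minimal, library-independent sketch. It is not the library's implementation or API: the function name `evict_with_sinks` and the parameters `sink_size` and `window_size` are hypothetical, and plain integers stand in for the per-token key/value entries a real model would cache. The sketch only assumes the general scheme described above, in which the first few tokens are always kept and the rest of the cache is a sliding window over the most recent tokens.

```python
from typing import List, TypeVar

T = TypeVar("T")


def evict_with_sinks(cache: List[T], sink_size: int = 4, window_size: int = 8) -> List[T]:
    """Return a trimmed cache: the first `sink_size` entries plus the last `window_size`.

    Hypothetical helper for illustration only; a real implementation would
    operate on key/value tensors per attention layer rather than a Python list.
    """
    if sink_size < 0 or window_size < 0:
        raise ValueError("sink_size and window_size must be non-negative")
    # Nothing to evict while the cache still fits in the budget.
    if len(cache) <= sink_size + window_size:
        return list(cache)
    sinks = cache[:sink_size]                    # always-kept initial tokens
    recent = cache[len(cache) - window_size:]    # sliding window of recent tokens
    return sinks + recent


# Simulate streaming 20 tokens; each integer is a token position.
cache: List[int] = []
for position in range(20):
    cache.append(position)
    cache = evict_with_sinks(cache, sink_size=4, window_size=8)

print(cache)  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

The cache size stays bounded at `sink_size + window_size` entries no matter how many tokens are generated, which is what makes generation over very long sequences feasible in constant memory under this scheme.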