
TransformerLensOrg/TransformerLens

A library for inspecting and editing the internal activations of GPT-2 style language models to reverse-engineer learned algorithms.

TransformerLens
Velocity (7d): +2.5 ★/day
Trend: steady
[Figure: star history]

TransformerLens is a mechanistic interpretability library that loads over 50 open-source language models and exposes their internal activations. Users can cache any intermediate activation and attach functions that edit, remove, or replace activations as the model runs. The library’s goal is to reverse-engineer the algorithms that trained models have learned, working directly from their weights, in support of research into how large language models work internally.
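The caching and editing described above can be illustrated with a small, dependency-free sketch. This is not the TransformerLens API. `ToyHookedModel`, its hook-point names (`"double"`, `"shift"`), and `zero_ablate` are hypothetical names invented for illustration, and the "model" is just two arithmetic steps. It only shows the general idea: named hook points let a function observe an activation (caching) or return a substitute for it (editing) while the forward pass runs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Activation = List[float]
# A hook receives the activation and the hook point's name. Returning None
# leaves the activation unchanged; returning a list replaces it.
Hook = Callable[[Activation, str], Optional[Activation]]


class ToyHookedModel:
    """Hypothetical two-step 'model' with a named hook point after each step."""

    HOOK_NAMES = ("double", "shift")

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Hook]] = {name: [] for name in self.HOOK_NAMES}

    def add_hook(self, name: str, fn: Hook) -> None:
        self.hooks[name].append(fn)

    def _hook_point(self, name: str, value: Activation) -> Activation:
        # Run hooks in the order they were added; each may replace the value.
        for fn in self.hooks[name]:
            result = fn(value, name)
            if result is not None:
                value = result
        return value

    def forward(self, x: Activation) -> Activation:
        h = self._hook_point("double", [2.0 * v for v in x])
        return self._hook_point("shift", [v + 1.0 for v in h])

    def run_with_cache(self, x: Activation) -> Tuple[Activation, Dict[str, Activation]]:
        cache: Dict[str, Activation] = {}

        def save(value: Activation, name: str) -> None:
            cache[name] = list(value)  # copy, so later edits cannot alter the cache

        # Caching hooks are added last, so they record any edited values.
        for name in self.HOOK_NAMES:
            self.add_hook(name, save)
        try:
            out = self.forward(x)
        finally:
            for name in self.HOOK_NAMES:
                self.hooks[name].remove(save)
        return out, cache


def zero_ablate(value: Activation, name: str) -> Activation:
    """Replace the activation with zeros (a simple 'remove' intervention)."""
    return [0.0 for _ in value]


model = ToyHookedModel()
out, cache = model.run_with_cache([1.0, 2.0])

model.add_hook("double", zero_ablate)
ablated_out, ablated_cache = model.run_with_cache([1.0, 2.0])
```

Comparing `out` with `ablated_out` shows how much the final result depends on the ablated intermediate value, which is the basic pattern behind activation-editing experiments. In a real transformer the hook points would sit on tensors such as attention patterns or residual stream states rather than on short lists of floats.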
