andyzoujm/representation-engineering
A research paper and codebase for analyzing population-level representations in deep neural networks to improve AI transparency.

This repository contains the official implementation of the research paper introducing Representation Engineering (RepE), a top-down approach to AI transparency. Rather than focusing on individual neurons or circuits, RepE analyzes population-level representations to monitor and manipulate high-level cognitive phenomena in deep neural networks. The work draws on insights from cognitive neuroscience and provides baselines and techniques for studying how transformer-based language models encode information in their representations.
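
To make "monitoring and manipulating" population-level representations concrete, here is a minimal sketch on synthetic data. It does not use this repository's API. The function names (`find_direction`, `read_score`, `steer`), the use of the top singular vector of paired activation differences, and the random "activations" are all assumptions chosen for illustration; in practice the activations would come from a transformer's hidden states.

```python
import numpy as np

def find_direction(pos_acts, neg_acts):
    """Hypothetical helper: estimate a concept direction as the top right
    singular vector of paired activation differences (uncentered PCA)."""
    diffs = pos_acts - neg_acts                      # (n_pairs, hidden_dim)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]                                # unit-norm vector
    # Fix the sign so that "positive" examples score higher on average.
    if (diffs @ direction).mean() < 0:
        direction = -direction
    return direction

def read_score(acts, direction):
    """Monitoring: project activations onto the direction."""
    return acts @ direction

def steer(acts, direction, coeff):
    """Manipulation: shift activations along the direction."""
    return acts + coeff * direction

# Synthetic stand-in for hidden states (assumption: no real model involved).
rng = np.random.default_rng(0)
hidden_dim, n_pairs = 16, 200
true_dir = rng.normal(size=hidden_dim)
true_dir /= np.linalg.norm(true_dir)
base = rng.normal(size=(n_pairs, hidden_dim))
pos = base + 2.0 * true_dir + 0.1 * rng.normal(size=(n_pairs, hidden_dim))
neg = base - 2.0 * true_dir + 0.1 * rng.normal(size=(n_pairs, hidden_dim))

direction = find_direction(pos, neg)
cosine = float(direction @ true_dir)     # alignment with the planted direction
scores_pos = read_score(pos, direction)
scores_neg = read_score(neg, direction)
steered = steer(neg, direction, coeff=4.0)
```

Because the direction has unit norm, steering with coefficient `c` raises each projection score by exactly `c`, so reading and steering share one interpretable scale.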