Trying to Automate Detection of Translation Heads

Jul 05, 2024

This project was submitted by Erik Nordby. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.

For my project, I decided to work on one of the 200 Open Problems in Mechanistic Interpretability. Specifically, I tried to automate the detection of translation heads