Dissecting the Development of Toy Models of Superposition

Jul 06, 2024

This project was submitted by Joe Emerson. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.

This is a project outline that expands on work by Chen et al. (2023) on the development of Anthropic’s Toy Models of Superposition. It offers plenty of low-hanging fruit for anyone willing to put in some work learning singular learning theory (SLT).