This project was submitted by Nathan Reed. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.
For my March 2024 AI Safety Fundamentals project, I replicated Bilal Chughtai et al.'s paper "A Toy Model of Universality". Due to time and skill constraints, I only tested the logit attributions (Section 5.1) and the embeddings and unembeddings (Section 5.2), using MLP models trained on the group C_113, but this was sufficient to confirm the paper's overall thesis.
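For context, the task studied in the paper is group composition: the model is trained to map a pair of group elements to their product. For the cyclic group C_113 this reduces to addition mod 113. The following is a minimal sketch of that dataset (names are illustrative, not taken from the replication code):

```python
N = 113  # order of the cyclic group C_113

def compose(a: int, b: int) -> int:
    """Group operation of C_113: addition modulo 113."""
    return (a + b) % N

# Full dataset: every ordered pair of group elements with its label.
# An MLP is then trained to predict compose(a, b) from (a, b).
dataset = [((a, b), compose(a, b)) for a in range(N) for b in range(N)]
```

Because the group is small, the full Cayley table of N * N = 12,769 pairs fits easily in memory, which is part of what makes it a tractable toy setting.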
Read the full piece here.
