This project was submitted by Nathan Reed. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.
For my March 2024 AI Safety Fundamentals project, I replicated Bilal Chughtai et al.'s paper "A Toy Model of Universality". Due to time and skill constraints, I only tested the logit attributions (Section 5.1) and the embeddings and unembeddings (Section 5.2), using MLP models trained on the group C_113, but this was sufficient to confirm the paper's overall thesis.
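For context, the task studied in the paper is group composition: the model is trained to map a pair of group elements to their product. For the cyclic group C_113 this reduces to addition mod 113. The following is a minimal sketch of that dataset (names are illustrative, not taken from the replication code):

```python
N = 113  # order of the cyclic group C_113

def compose(a: int, b: int) -> int:
    """Group operation of C_113: addition modulo 113."""
    return (a + b) % N

# Full dataset: every ordered pair of group elements with its label.
# An MLP is then trained to predict compose(a, b) from (a, b).
dataset = [((a, b), compose(a, b)) for a in range(N) for b in range(N)]
```

Because the group is small, the full Cayley table of N * N = 12,769 pairs fits easily in memory, which is part of what makes it a tractable toy setting.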
Read the full piece here.
