Treechop - Reproducing the tree gridworld for investigating goal misgeneralisation

Jul 04, 2024

This project was submitted by Amy Andrews. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.

I aimed to implement the tree gridworld described in Shah et al. (2022) as an environment that a reinforcement learning agent can be run through, ready for use in a project replicating their goal misgeneralization results. Through this project, I aimed to build up software engineering experience with reinforcement learning environments and, eventually, to get a ‘gearsy’ inside-view understanding of this specific instance of inner alignment and goal misgeneralization.

Read the full piece here.

BlueDot Impact
