This project was submitted by Annie Sorkin. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.
Transferring debate to an abstract algebra MMLU dataset is not trivial.
When GPT-3.5 is used as a judge, the outcomes may be sensitive to exact prompt phrasing.
GPT-3.5 may perform worse in judging the debate than answering the question directly.
We proposed a universal prompting approach that avoids most of the self-defeating behavior.
Read the full piece here.
