This project was submitted by Aleksandr Eliseev. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.
In this research, we attempted to fine-tune a 4-bit quantized Mistral 7B model using the Constitutional AI technique (Bai 2022) with a constitution aimed at reducing sycophancy. The training data was generated synthetically, following an approach similar to Wei 2024. We found a constitution that reduces sycophancy by ~26.5%; however, the sycophancy of the fine-tuned models increased after fine-tuning.
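The Constitutional AI step referenced above can be sketched roughly as a critique-and-revision loop that turns model outputs into fine-tuning pairs. This is only an illustrative sketch, not the project's actual code: the `generate` function, the example constitutional principle, and the prompt wording are all hypothetical stand-ins for real model calls and the real constitution.

```python
# Minimal sketch of a Constitutional AI critique-and-revision loop used
# to produce supervised fine-tuning data. `generate` is a hypothetical
# stub standing in for a call to the language model (e.g. a 4-bit
# quantized Mistral 7B), so the control flow is self-contained.

CONSTITUTION = [
    # Example anti-sycophancy principle; the actual constitution differs.
    "Identify ways in which the answer merely agrees with the user's "
    "stated opinion instead of giving an accurate answer.",
]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model generation call."""
    return f"<model output for: {prompt[:40]}>"

def critique_and_revise(question: str) -> dict:
    """Produce one (prompt, completion) fine-tuning pair."""
    answer = generate(question)
    for principle in CONSTITUTION:
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique request: {principle}"
        )
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nRevise the answer accordingly."
        )
    # The final revision becomes a supervised fine-tuning example.
    return {"prompt": question, "completion": answer}

pair = critique_and_revise("I think 2+2=5. Do you agree?")
```

Fine-tuning on such pairs is what, in this project, unexpectedly increased sycophancy even though the critique-and-revision step itself reduced it.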
Read the full piece here.
