This project was submitted by Aleksandr Eliseev. It was one of the top submissions in our AI Alignment course (Mar 2024). Participants worked on these projects for 4 weeks.
In this research, we attempted to fine-tune a 4-bit quantized Mistral 7B model using the Constitutional AI technique (Bai 2022) with a constitution aimed at reducing sycophancy. The training data was generated synthetically, following an approach similar to Wei 2024. We found a constitution that reduces sycophancy by ~26.5%; however, the sycophancy of the fine-tuned models increased after fine-tuning.
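The Constitutional AI step referenced above can be sketched roughly as a critique-and-revision loop that turns model outputs into fine-tuning pairs. This is only an illustrative sketch, not the project's actual code: the `generate` function, the example constitutional principle, and the prompt wording are all hypothetical stand-ins for real model calls and the real constitution.

```python
# Minimal sketch of a Constitutional AI critique-and-revision loop used
# to produce supervised fine-tuning data. `generate` is a hypothetical
# stub standing in for a call to the language model (e.g. a 4-bit
# quantized Mistral 7B), so the control flow is self-contained.

CONSTITUTION = [
    # Example anti-sycophancy principle; the actual constitution differs.
    "Identify ways in which the answer merely agrees with the user's "
    "stated opinion instead of giving an accurate answer.",
]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model generation call."""
    return f"<model output for: {prompt[:40]}>"

def critique_and_revise(question: str) -> dict:
    """Produce one (prompt, completion) fine-tuning pair."""
    answer = generate(question)
    for principle in CONSTITUTION:
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique request: {principle}"
        )
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nRevise the answer accordingly."
        )
    # The final revision becomes a supervised fine-tuning example.
    return {"prompt": question, "completion": answer}

pair = critique_and_revise("I think 2+2=5. Do you agree?")
```

Fine-tuning on such pairs is what, in this project, unexpectedly increased sycophancy even though the critique-and-revision step itself reduced it.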
Read the full piece here.
