This project won the “Technical research” prize in our AI Alignment (June 2024) course. Participants worked on these projects for 4 weeks. The text below is an excerpt from the final project.
Summary
In this post, I motivate an extension of constitutional AI (CAI) and present one possible concrete execution of that strategy.
TL;DR: When generating AI feedback during the CAI process, principles from the constitution are sampled at random for each pair of red-teamed prompt and initial response. A helpful-only model then critiques its initial response against the sampled principle and subsequently revises it. Instead of randomly selecting principles, I propose that we choose principles based on the context provided by each particular prompt/response pair. I call this contextual constitutional AI.
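To make the distinction concrete, here is a minimal sketch of the two selection strategies. The constitution contents, function names, and the keyword-overlap relevance heuristic are all illustrative assumptions; in practice a learned relevance model (or the LLM itself) would judge which principle best fits the pair.

```python
import random

# Illustrative constitution: a short list of critique principles.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that avoids giving dangerous instructions.",
    "Choose the response that is most honest.",
]

def select_principle_random(prompt: str, response: str) -> str:
    """Standard CAI: sample a critique principle uniformly at random,
    ignoring the content of the prompt/response pair."""
    return random.choice(CONSTITUTION)

def select_principle_contextual(prompt: str, response: str) -> str:
    """Contextual CAI (sketch): pick the principle most relevant to the
    pair. Naive word overlap stands in for a real relevance judgment."""
    context = set((prompt + " " + response).lower().split())

    def relevance(principle: str) -> int:
        return len(set(principle.lower().split()) & context)

    return max(CONSTITUTION, key=relevance)
```

The only change relative to standard CAI is the selection step; the critique-and-revision loop downstream is unchanged.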
This is intended only as a preliminary insight from my AISF: Alignment course project. Due to limited time and funding, I made certain simplifying decisions to keep the investigation tractable.
