This project won the “Technical research” prize in our AI Alignment (June 2024) course. Participants worked on these projects for 4 weeks. The text below is an excerpt from the final project.
Summary
In this post, I motivate an extension of constitutional AI (CAI) and present one possible concrete execution of that strategy.
TL;DR: When generating AI feedback during the CAI process, principles from the constitution are sampled at random for each pair of red-teamed prompt and initial response. A helpful-only model then critiques its initial response against the sampled principle and subsequently revises it. Instead of randomly selecting principles, I propose that we choose principles based on the context provided by each particular prompt/response pair. I call this contextual constitutional AI.
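To make the distinction concrete, here is a minimal sketch of the two selection strategies. The constitution contents, function names, and the keyword-overlap relevance heuristic are all illustrative assumptions; in practice a learned relevance model (or the LLM itself) would judge which principle best fits the pair.

```python
import random

# Illustrative constitution: a short list of critique principles.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that avoids giving dangerous instructions.",
    "Choose the response that is most honest.",
]

def select_principle_random(prompt: str, response: str) -> str:
    """Standard CAI: sample a critique principle uniformly at random,
    ignoring the content of the prompt/response pair."""
    return random.choice(CONSTITUTION)

def select_principle_contextual(prompt: str, response: str) -> str:
    """Contextual CAI (sketch): pick the principle most relevant to the
    pair. Naive word overlap stands in for a real relevance judgment."""
    context = set((prompt + " " + response).lower().split())

    def relevance(principle: str) -> int:
        return len(set(principle.lower().split()) & context)

    return max(CONSTITUTION, key=relevance)
```

The only change relative to standard CAI is the selection step; the critique-and-revision loop downstream is unchanged.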
This is intended only as a preliminary insight from my AISF: Alignment course project. Due to limited time and funding, I made certain simplifying decisions to keep the investigation tractable.
