This project was submitted by Francesca Gomez. It was one of the top submissions in our AI Alignment course (Jun 2024). Participants worked on these projects for 4 weeks. The text below is an excerpt from the final project.
Abstract
Despite recognition by frontier AI labs and policymakers of the potential for longer-term catastrophic risks from advanced AI models, standard methodologies for managing these risks remain elusive.
This experiment explores how Monte Carlo simulations, a risk-modelling technique widely used for risk assessment in other industries, can help reduce uncertainty around the longer-term risk of losing control to AI.
Following the dominant approach of arguing for model safety on the basis of limited model capabilities and protective measures (also called controls), the Monte Carlo simulation quantifies and compares these factors to arrive at a range of risk likelihoods.
The work contributes:
A Monte Carlo model that combines qualitative assessments of dangerous capabilities and protective measures to produce quantitative risk likelihood predictions
A potential approach for augmenting Responsible Scaling Policies with quantifiable ranges, demonstrating how claims about future safety could be tested and validated
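The core idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual model: the distribution shapes, parameter values, and the combination rule (risk materialises when sampled capability exceeds sampled control effectiveness) are all assumptions chosen for the sketch.

```python
import random
import statistics

def estimate_risk(n_trials=10_000, seed=0):
    """Toy Monte Carlo sketch of a loss-of-control risk estimate.

    Hypothetical inputs: qualitative assessments translated into
    triangular distributions on a 0-1 scale (low, high, mode).
    """
    rng = random.Random(seed)
    events = 0
    for _ in range(n_trials):
        # Sampled dangerous-capability level (assumed distribution).
        capability = rng.triangular(0.1, 0.6, 0.3)
        # Sampled protective-measure effectiveness (assumed distribution).
        control = rng.triangular(0.4, 0.95, 0.8)
        # Assumed combination rule: risk materialises when capability
        # exceeds what the controls can mitigate.
        if capability > control:
            events += 1
    return events / n_trials

# Running across several seeds gives a range of likelihoods rather
# than a single point estimate.
estimates = [estimate_risk(seed=s) for s in range(5)]
print(f"Likelihood range: {min(estimates):.3f} - {max(estimates):.3f}")
```

In a fuller treatment, the capability and control distributions would themselves be elicited from the qualitative evaluations the abstract mentions, and the spread across seeds (or across parameter choices) is what supplies the "range of risk likelihoods".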
Run the Monte Carlo model on Kaggle.
Code also available here: github.com/francescini/monte-carlo
Full project
You can view the full project here.
