This project was submitted by Francesca Gomez. It was one of the top submissions in our AI Alignment course (Jun 2024). Participants worked on these projects for 4 weeks. The text below is an excerpt from the final project.
Abstract
Despite recognition by frontier AI labs and policymakers of the potential for longer-term catastrophic risks from advanced AI models, standard methodologies for managing these risks remain elusive.
This experiment explores how Monte Carlo simulations, a risk-modelling technique widely used for risk assessment in other industries, can help reduce uncertainty around the longer-term risk of losing control to AI.
Following the dominant approach of arguing for model safety on the basis of limited model capabilities and protective measures (also called controls), the Monte Carlo simulation quantifies and compares these factors to arrive at a range of risk likelihoods.
The work contributes:
A Monte Carlo model that combines qualitative assessments of dangerous capabilities and protective measures to produce quantitative risk likelihood predictions
A potential approach for augmenting Responsible Scaling Policies with quantifiable ranges, demonstrating how claims about future safety could be tested and validated
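The core idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual model: the distribution shapes, parameter values, and the combination rule (risk materialises when sampled capability exceeds sampled control effectiveness) are all assumptions chosen for the sketch.

```python
import random
import statistics

def estimate_risk(n_trials=10_000, seed=0):
    """Toy Monte Carlo sketch of a loss-of-control risk estimate.

    Hypothetical inputs: qualitative assessments translated into
    triangular distributions on a 0-1 scale (low, high, mode).
    """
    rng = random.Random(seed)
    events = 0
    for _ in range(n_trials):
        # Sampled dangerous-capability level (assumed distribution).
        capability = rng.triangular(0.1, 0.6, 0.3)
        # Sampled protective-measure effectiveness (assumed distribution).
        control = rng.triangular(0.4, 0.95, 0.8)
        # Assumed combination rule: risk materialises when capability
        # exceeds what the controls can mitigate.
        if capability > control:
            events += 1
    return events / n_trials

# Running across several seeds gives a range of likelihoods rather
# than a single point estimate.
estimates = [estimate_risk(seed=s) for s in range(5)]
print(f"Likelihood range: {min(estimates):.3f} - {max(estimates):.3f}")
```

In a fuller treatment, the capability and control distributions would themselves be elicited from the qualitative evaluations the abstract mentions, and the spread across seeds (or across parameter choices) is what supplies the "range of risk likelihoods".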
Run the Monte Carlo model on Kaggle.
Code also available here: github.com/francescini/monte-carlo
Full project
You can view the full project here.
