This project was submitted by Reworr R. It was the runner-up for the “Technical Research” prize in our AI Alignment course (Jun 2024). Participants worked on these projects for 4 weeks. The text below is an excerpt from the final project.
Project Overview
The LLM-Hack Agent Honeypot is a project designed to monitor, capture, and analyze autonomous AI Hacking Agents in the real world.
How It Works:
Simulation: We deploy a simulated “vulnerable” service to attract potential threats.
Catching Mechanisms: This service incorporates specific counter-techniques designed to detect and capture AI-Hacking Agents.
Monitoring: We monitor and log all interactions, waiting for potential attacks from LLM-powered agents.
Capture and Analysis: When an AI agent engages with our system, we capture the attempt and their system prompt details.
Why?
Our objectives aim to improve awareness of AI Hacking Agents and their current state of risks by understanding their real-world usage and studying their algorithms and behavior in the wild.
Full project
You can view the full project here.
