This project was submitted by Guido Bergman. It was an Outstanding Submission for the Technical AI Safety Project Sprint (Jan 2026). Participants worked on these projects for 5 weeks. The text below is an excerpt from the final project.
TL;DR
Existing Linux security tools are highly effective but require significant expertise to configure and maintain. aegish[1] is a prototype Linux shell that adds an LLM layer: it intercepts every command, screens it through static analysis, then sends it to an LLM that classifies it as ALLOW, WARN, or BLOCK using a natural-language decision tree. The LLM understands command intent, is easy to configure, and may help scale defenses alongside the growing capability of AI-driven attacks. In production mode, kernel-level enforcement (Landlock) provides a final safety net.
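The two-stage flow described above (cheap static screening first, LLM classification only when the static layer is undecided) can be sketched as follows. This is a minimal illustration, not aegish's actual implementation; the pattern list, `static_screen`, and `classify` names are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional

class Verdict(Enum):
    ALLOW = "ALLOW"
    WARN = "WARN"
    BLOCK = "BLOCK"

# Illustrative patterns only; a real static layer would be far richer.
DANGEROUS_PATTERNS = ("rm -rf /", "mkfs", "of=/dev/sd")

def static_screen(command: str) -> Optional[Verdict]:
    """Cheap pattern check before any LLM call; None means 'undecided'."""
    if any(p in command for p in DANGEROUS_PATTERNS):
        return Verdict.BLOCK
    return None

def classify(command: str, llm_judge: Callable[[str], Verdict]) -> Verdict:
    """Screen statically first, then fall back to the LLM's verdict."""
    verdict = static_screen(command)
    if verdict is not None:
        return verdict
    return llm_judge(command)

# Usage with a stub judge that allows everything it sees:
print(classify("dd if=/dev/zero of=/dev/sda", lambda c: Verdict.ALLOW))  # Verdict.BLOCK
print(classify("ls -la", lambda c: Verdict.ALLOW))                       # Verdict.ALLOW
```

The key design choice this sketch captures is that the static layer can only short-circuit toward BLOCK; anything it cannot decide is deferred to the LLM, which is the component that actually reasons about intent.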
9 LLMs from 4 providers were benchmarked on their ability to distinguish intent: they were tasked with BLOCKing 676 harmful commands (extracted from GTFOBins) while correctly classifying 496 harmless commands as either ALLOW or WARN.
The harmless benchmark proved to be saturated (96.8–100% harmless acceptance rate for all models). The real differentiator was the malicious detection rate, where 4 of 9 models exceeded 95%.
Surprisingly, smaller models outperformed flagships: GPT-5 Mini beat GPT-5.1, and Claude Haiku 4.5 beat both Claude Opus 4.6 and Claude Sonnet 4.5.
Beyond the LLM, aegish includes several safeguards (input canonicalization, command substitution resolution, script inspection, and role-based trust levels), many of which were directly motivated by bypass vectors discovered during security testing. However, the system as a whole has not been hardened to the level required for adversarial deployment.
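One of the bypass vectors listed above, hiding a sensitive command inside a `$(...)` substitution, motivates resolving substitutions so each embedded command is screened on its own. The sketch below shows one way to surface them; the regex approach and function names are my own assumptions, not aegish's code, and a real implementation would use a proper shell parser.

```python
import re

# Matches an innermost $(...) substitution (no nested parentheses inside).
SUBST_RE = re.compile(r"\$\(([^()]*)\)")

def extract_substitutions(command: str) -> list[str]:
    """Peel off innermost $(...) bodies repeatedly until none remain."""
    found: list[str] = []
    remaining = command
    while True:
        matches = SUBST_RE.findall(remaining)
        if not matches:
            return found
        found.extend(m.strip() for m in matches)
        # Remove the resolved substitutions to expose any enclosing ones.
        remaining = SUBST_RE.sub("", remaining)

def commands_to_screen(command: str) -> list[str]:
    """The outer command plus every embedded substitution, for screening."""
    return [command] + extract_substitutions(command)

# Usage: the inner `echo /etc/shadow` is now visible to the screening layer.
print(commands_to_screen("cat $(echo /etc/shadow)"))
# ['cat $(echo /etc/shadow)', 'echo /etc/shadow']
```

Screening each extracted command through the same ALLOW/WARN/BLOCK pipeline closes the gap where an innocuous-looking outer command smuggles in a harmful inner one.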
Full Project
You can view the full project here.


