Teams of AI Agents Can Find and Exploit New Cyber Vulnerabilities

New research shows that AIs can be used to find and exploit previously unknown cyber vulnerabilities. It was already known that AI can generate exploit code for known vulnerabilities when given descriptions of them, but this is the first documented instance of AI finding and exploiting vulnerabilities without such descriptions.

The research, conducted by the University of Illinois Urbana-Champaign and funded by Open Philanthropy, demonstrates an important advancement in AI capabilities.

In Short

  • GPT-4-based agents were organized into a team with specialized roles, including one planner, one manager, and several AIs focused on specific types of cyber vulnerabilities.
  • Given up to five attempts, the AI team successfully identified and exploited 8 out of 15 vulnerabilities. These vulnerabilities were chosen from openly reported and patched vulnerabilities that were not part of the AI’s training data.
  • The average cost for a hacking attempt by the AI team was $4–5, and $24–25 for a successful attempt. The authors of the study estimate that human costs for the same work would be around $75, and they predict that the AI team’s costs will be halved every 12 months or faster. (A quick check of how these cost figures fit together follows this list.)
  • This research provides yet another example of how innovative tool-use and prompting of AIs can enhance their performance.
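
As a rough back-of-the-envelope check (my own arithmetic, not taken from the paper), the two cost figures relate in a simple way: the cost per successful exploit is roughly the cost per attempt multiplied by the number of attempts needed per success. Using the midpoints of the reported ranges:

```python
# Rough sanity check of the reported cost figures (illustrative, not from the paper).
cost_per_attempt = 4.5    # midpoint of the reported $4-5 per run
cost_per_success = 24.5   # midpoint of the reported $24-25 per successful exploit

attempts_per_success = cost_per_success / cost_per_attempt
print(f"Implied attempts per successful exploit: ~{attempts_per_success:.1f}")  # about 5.4
```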

AI Agents

To most people outside AI research, AI is nearly synonymous with chatbots – tools used to answer questions, improve texts, or serve as a sounding board for ideas. AI agents instead leverage the ability of large language models (LLMs) to write commands that control a vast array of tools. This lets AIs not only respond to queries but also perform actions with digital and physical tools – from replying to emails to synthesizing aspirin.

AI agents have varying capabilities, but their general workflow typically follows these steps:

  1. Receive a complex task or goal from a user.
  2. Analyze the goal and break it down into more manageable sub-tasks.
  3. Select the appropriate tool for each sub-task (sometimes also searching the web for suitable tools).
  4. Use the tool to complete the sub-task (sometimes also finding and reading documentation on how to use the tool).
  5. Evaluate the result of the sub-task. If the result is not satisfactory, attempt to improve it.
  6. Synthesize the results from all the sub-tasks into a combined outcome and deliver it to the user.

This workflow not only connects an AI to external tools – it also allows LLMs to solve more complex tasks than a chatbot could handle with a simple prompt. However, there are also drawbacks. AI agents are more expensive to use, and they are somewhat prone to getting stuck on a sub-task, iterating beyond what is reasonable instead of giving up.
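
To make the loop concrete, here is a minimal sketch in Python of what steps 1–6 might look like in code. It assumes three hypothetical helpers – `llm` for calling a language model, `pick_tool` for choosing a tool, and `run_tool` for executing it – and illustrates the general pattern only, not the implementation used in the research.

```python
# Minimal sketch of a single AI agent loop (hypothetical helpers, not the paper's code).

def run_agent(goal: str, llm, pick_tool, run_tool, max_iterations: int = 5) -> str:
    # 1-2. Receive the goal and ask the LLM to break it into sub-tasks.
    subtasks = llm(f"Break this goal into small sub-tasks, one per line:\n{goal}").splitlines()

    results = []
    for task in subtasks:
        outcome = None
        for _ in range(max_iterations):
            # 3. Let the LLM choose a tool for the sub-task.
            tool = pick_tool(task)
            # 4. Run the tool and capture its output.
            outcome = run_tool(tool, task)
            # 5. Ask the LLM whether the result is good enough; retry if not.
            verdict = llm(f"Task: {task}\nResult: {outcome}\nIs this satisfactory? Answer yes or no.")
            if verdict.strip().lower().startswith("yes"):
                break
        results.append(f"{task}: {outcome}")

    # 6. Synthesize all sub-task results into one answer for the user.
    return llm("Summarize these results for the user:\n" + "\n".join(results))
```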

A Team of Agents

The research from the University of Illinois uses a team of agents, each specialized in specific tasks. Crucially, there is a planner agent and a manager agent that coordinate and monitor the agents actually trying to find and exploit different types of cyber vulnerabilities. These meta-agents can tell the hacker agents to abandon an attempt or keep going, and can pass information found by one hacker agent along to another.

Coordinated teams of agents are uncommon, but the idea is not new (see, for example, ChatDev). A drawback is that they are even more expensive to run than single AI agents, but they can often manage more complex tasks. Using specialized AI agents can potentially reduce some costs since easy sub-tasks can be delegated to agents using cheaper LLMs.
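
As a hedged illustration of how such a team might be wired together: the planner/manager/expert roles below follow the paper’s description, but every class and function name is made up for this sketch and the "experts" are stubs rather than real LLM-driven agents.

```python
# Illustrative structure of a planner/manager/expert-agent team (names are invented).

from dataclasses import dataclass, field

@dataclass
class ExpertAgent:
    specialty: str                                    # e.g. "SQL injection", "XSS", "CSRF"
    notes: list[str] = field(default_factory=list)    # information shared by the manager

    def attempt(self, target: str) -> str:
        # A real expert agent would drive an LLM plus tools (browser, terminal, ...).
        return f"[{self.specialty}] probed {target} using {len(self.notes)} shared hints"

class Manager:
    """Dispatches expert agents, forwards findings between them, and decides when to stop."""

    def __init__(self, experts: list[ExpertAgent]):
        self.experts = experts

    def run(self, plan: list[str], target: str, max_rounds: int = 3) -> list[str]:
        findings: list[str] = []
        for step in plan:
            for _ in range(max_rounds):
                expert = self.pick_expert(step)
                result = expert.attempt(target)
                findings.append(result)
                # Pass what one expert learned on to the other experts.
                for other in self.experts:
                    if other is not expert:
                        other.notes.append(result)
                if "exploit confirmed" in result:     # placeholder success check
                    break
        return findings

    def pick_expert(self, step: str) -> ExpertAgent:
        # Naive keyword routing; the paper lets an LLM make this kind of decision.
        for expert in self.experts:
            if expert.specialty.lower() in step.lower():
                return expert
        return self.experts[0]

# A planner agent would produce `plan` from a high-level goal; here it is hard-coded.
plan = ["try SQL injection on the login form", "try XSS in the comment field"]
team = Manager([ExpertAgent("SQL injection"), ExpertAgent("XSS"), ExpertAgent("CSRF")])
print(team.run(plan, target="http://example.test"))
```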

Learned or Remembered?

A key point in this research is that the AI team could find and exploit cyber vulnerabilities that were not part of its training data – only vulnerabilities reported after the model’s cut-off date were used, so the AI could not lean on existing reports describing which vulnerabilities to look for.

This is an important distinction and has caused problems when measuring AI capabilities: If a test is included in the data used for training an AI, the AI could have an easier time solving the problems in the test, which would make the test results less valid. (For a human, this is the difference between having memorized the correct answers and working out plausible answers from other knowledge.)

It can be very difficult to know whether a test was accidentally included in the training data, particularly for tests that have been around for some time. In the research from the University of Illinois, it seems clear that the cyber vulnerabilities were not included in the training data. The official reports (and fixes) for the vulnerabilities were posted after the “cut-off date” for the particular version of GPT-4 – the point up to which OpenAI had collected the data used to train that version of the model. There could still be traces of the vulnerabilities in the training data, but if so, they are likely negligible: when a comparison agent was given descriptions of the vulnerabilities, it performed much better, which suggests the team working without descriptions was not simply recalling them.

The bottom line is this: It seems that AI can be used to automatically discover and exploit cyber vulnerabilities.

What Does This Mean?

The research presented in this paper only describes attacks against already discovered and patched vulnerabilities, but it would greatly surprise me if the researchers didn’t also probe for cyber vulnerabilities that aren’t yet known to the public – the whole point of the research is to investigate whether AIs can find previously unknown security holes.

Expect Automated Cyber Attacks to Become More Powerful

To me, the main result is that we must assume that AI will soon be used to discover and exploit new cyber vulnerabilities. The costs and success rates in this study aren’t overwhelming, but we must assume that the methods will be improved upon and that the capacity of AI models will increase while the price for using them drops.

This means that both attackers and defenders will be using increasingly advanced AI tools for detecting vulnerabilities, either to exploit them or to fix them.

The article authors write that “[it] is unclear whether AI agents will aid cybersecurity offense or defense more and we hope that future work addresses this question.” A fair guess is that if attackers and defenders have the same amount of resources, the advantage lies with the attackers: a defender has to stop every attack, while an attacker only needs one to get through.

If this is true, it would mean that AI-powered hacking software will beat all but the most resourceful defenders – assuming that attack methods spread quickly while people and organizations, in general, don’t become lightning-fast in applying security updates.

We Repeatedly Discover New Capabilities

Apart from the direct consequences, this research is also another example of new capabilities found in AI models that are already released and widely used. In April 2024, (almost) the same researchers published a research paper showing that GPT-4 can use publicly available descriptions of cyber vulnerabilities to successfully create exploits in 87 percent of the cases.

It is worth noting that GPT-4 was released more than a year before that research article, and had been tested by security experts (“red teams”) for six months before its release. Using LLMs to automate cyber attacks has been discussed and feared for some time, yet it still took over a year to realize that all it takes is public descriptions of security holes and some clever prompting. (A bit exaggerated, but not by much.)

All of this is to say that we don’t fully know what LLMs are capable of. This also holds true for widely used LLMs and is particularly evident when combining them with other powerful tools or creating teams of AI agents.

What Will OpenAI Do?

OpenAI is working on a “preparedness framework” for AI risks. Measured against the current version of its cybersecurity section, GPT-4 now seems to land somewhere between medium and high risk:

  • Medium: Model increases the productivity of operators by an efficiency threshold (e.g. >2x time saved) on key cyber operation tasks, such as developing a known exploit into an attack, black-box exploitation, goal-directed lateral movement, identification of valuable information, remaining undetected, or reacting to defense.
  • High: Tool-augmented model can identify and develop proofs-of-concept for high-value exploits against hardened targets without human intervention, potentially involving novel exploitation techniques, OR provided with a detailed strategy, the model can end-to-end execute cyber operations involving the above tasks without human intervention.

The framework states that “as part of our baseline commitments, we are aiming to keep post-mitigation risk at ‘medium’ risk or below.” If taken literally, this means that OpenAI should investigate how to decrease the cybersecurity risks from GPT-4. Given that top safety researchers at OpenAI recently left the company, stating that safety work is underprioritized, it is far from obvious that OpenAI will act to reduce these risks.

