When AI starts hacking AI

DARPA’s AI Cyber Challenge

At DEF CON 2025, the DARPA AI Cyber Challenge put autonomous AI-driven systems head-to-head in a capture-the-flag–style contest. Contenders had to find and patch vulnerabilities faster than human experts ever could.

The results were striking: AI-based security tools demonstrated an ability to uncover and remediate software flaws at unprecedented speed. The top seven semifinalists discovered 77% of the synthetic vulnerabilities presented in the final scoring round and patched 61% of them, taking an average of 45 minutes per patch, across 54 million lines of code.

Imagine what this means for enterprise software. Speed: vulnerabilities closed before an attacker can weaponize them. Scale: thousands of systems patched without a war room scramble. Learning: every fix makes the AI better at the next one.

Sounds like a dream for CISOs? Until you remember that attackers are watching too. If the defender is an autonomous AI, then compromising the defender becomes the new prize. And unlike a human, an AI that’s tricked can deploy the wrong “fix” at machine speed. We’re stepping into an era where the security race won’t be human vs human.

It will be hacker’s AI vs defender’s AI.

Conclusions

Defence and offence are co-evolving. DARPA’s experiment shows that AI can defend at scale, but it also underscores that attackers are adapting these same methods. The balance of power is dynamic.

The shift toward “AI vs. AI” in cybersecurity is not theoretical; it’s already here. Google is rolling out layered defences: filtering hidden prompts, requiring explicit confirmations for risky actions, and tightening controls around sensitive workflows. These are necessary first steps, but will not be sufficient.
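To make the layered idea concrete, here is a minimal Python sketch of two of the controls mentioned above: filtering hidden text out of untrusted input, and refusing risky actions without explicit confirmation. It is an illustration only, not Google's implementation. The names `RISKY_ACTIONS`, `strip_hidden_text`, `ActionRequest` and `is_allowed` are hypothetical, and the filter assumes that "hidden" means invisible Unicode format characters such as zero-width spaces.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical list of actions treated as risky; a real system would define its own.
RISKY_ACTIONS = {"delete_files", "send_email", "deploy_patch"}


def strip_hidden_text(text: str) -> str:
    """Drop invisible format characters (Unicode category Cf), e.g. zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


@dataclass
class ActionRequest:
    name: str
    confirmed_by_user: bool = False


def is_allowed(request: ActionRequest) -> bool:
    """Allow low-risk actions; allow risky ones only with explicit user confirmation."""
    if request.name in RISKY_ACTIONS:
        return request.confirmed_by_user
    return True


# Usage: clean the untrusted input first, then gate whatever action the assistant proposes.
cleaned = strip_hidden_text("Summarise this\u200b email")
proposed = ActionRequest("deploy_patch")
print(cleaned, "| allowed:", is_allowed(proposed))
```

A character filter like this is deliberately blunt: it also removes legitimate format characters, such as the joiners used in some emoji and scripts, and it does nothing against instructions written in plain visible text. That is one reason a single layer is not sufficient on its own.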

The real challenge lies in designing resilient AI ecosystems where assistants, orchestration tools, and data pipelines are hardened against manipulation. It’s not enough to make AI useful. It must also be trustworthy, even in the face of adversarial AI.

In the end, Google’s warning (see our blog), ESET’s discovery (see our blog), and DARPA’s challenge point to the same conclusion: the future battlefield is not just human hackers versus human defenders. It is machine versus machine, with the trustworthiness of our digital infrastructure hanging in the balance.
