Beyond Code Exploits: Red Teaming the New AI Attack Surface

As a security professional working on AI Red Teaming at Scale, I've run live red teaming exercises with members of Congress, the UK Parliament, and NATO. Here's what I've learned:

Your next incident won't be a zero-day, but a risk you didn't test for.

The more we deploy AI, the faster we amplify failure. That’s what happens when you put unpredictable systems into production without treating them as an attack surface.

In traditional cybersecurity, we’re trained to hunt for CVEs (documented software bugs), patch them, and trace exploit chains. It's predictable. If you only watch for software bugs, you're blind to how AI actually fails. This is why AI risk is a company-wide problem, requiring security, ML, product, and leadership to work together.

AI incidents show up as misuse, prompt steering, data leakage, or model drift.

These experiences taught me that AI security comes down to four fundamentals:

Treat model behavior as an attack surface, not just the code.
Map threats to a risk matrix.
Score them with a standard like OWASP AI Vulnerability Scoring System (AIVSS), so everyone speaks the same language on severity.
Measure residual risk with evaluations that mimic real attacker workflows. You have to test it like an enemy, not an engineer.

This approach moves you from reactive triage to proactive incident prevention.

AI Breaks the Old Security Model

Cybersecurity taught us to hunt for buffer overflows and exploit chains. AI failures are messier. They come from misuse, unexpected drift, and emergent behaviors that require fundamentally different fixes.

Nation-state campaigns already reflect this shift. Threat actors seed narratives, manipulate supply chains, and weaponize data so models learn dangerous patterns. Those are systemic manipulations of inputs, outputs, and training pipelines, not code exploits.

Real examples are blunt and uncomfortable. A grocery chatbot suggested mixing bleach and ammonia. A language model once returned a direct death plea to a user. A music generator produced detailed instructions for a weaponized incendiary. None of those required exploiting software; they were failures of intent handling, alignment, and guardrails.

Mapping Threats

The problem is that existing security frameworks weren't built for these kinds of failures, so I put together a new AI Risk Matrix that is mapped along three axes:

As we move from tools to agents, collectives, where AI units coordinate with goals of their own, are far closer than many expect.

Layered on top of the three axes above are six amplifiers of risk:

Risk from Adversaries
Risk from Unforced Errors
Risk from Emergent Behavior
Risk from Misaligned Goals
Risk from Dependencies
Risk from Societal Impact

This framing forces you to treat model behavior as an evolving attack surface, not an engineering curiosity. Once a threat landscape is viewed through this matrix, new operational processes must follow to address these findings.

Build an Enterprise Playbook

The enterprise playbook borrows from DAST (Dynamic Application Security Testing) and SAST (Static Application Security Testing), but adapts those lessons for AI failure modes. Adversarial AI red teaming is a required piece of that playbook. It is no longer optional if you want to understand how systems behave under real-world pressure.

Test the full attack surface: Attack the model the way an adversary would, using prompt injections, context manipulation, dialog hijacking, fictionalization, and jailbreaks.
Score vulnerabilities with a standard: Use a framework like OWASP AIVSS to compare risks across models and vendors.
Track residual risk continuously: Red teaming is not a one-time audit, but an ongoing process that monitors the threats that remain after defenses are in place.

This means we must establish recurring adversarial exercises: monthly for high-risk systems, quarterly for others. Automate baseline attack patterns while rotating in human red teamers to discover novel failure modes. Set up alerting when model outputs drift beyond acceptable risk thresholds, and maintain incident playbooks specifically for AI failures.

At Scale, we built this into the Discovery platform and ran live red teaming events with policymakers and enterprise teams. Those exercises made abstract threats concrete and showed how enterprises need to prepare.

Measure Residual Risk

Adversarial red teaming is the new penetration test. It measures how a model behaves under realistic adversarial pressure, not just accuracy on a benchmark. Unlike static evaluations, good red teaming adapts as the system changes and attackers iterate.

Residual risk is not a single number. It is a curve you must track over time as models and threat tactics evolve. Dashboards should show risk trending up or down, just like vulnerability counts in traditional cyber. More importantly, red team findings and AI threat intelligence must feed directly into retraining, fine-tuning, governance, and incident playbooks. AI red teaming must be a continuous control, not a one-off audit.

We need to run adversarial exercises that blend human creativity with automated attack suites. Measure how quickly defenses respond to new failure modes. Residual risk only falls when learnings flow back into the model lifecycle. Think of it as proactive incident response; run before an incident ever happens.

Getting Ahead of the Threat

Enterprise AI adoption is outpacing defenses, and shadow AI is proliferating. Regulators will push traceability and risk reporting in the next few years. Autonomous, coordinated model behaviors will expand the threat surface. If enterprises do not act now, trust in internal tools and public-facing AI systems will erode.

Adversarial red teaming is how we prepare. Stress-test these systems now so we are not blindsided later. The stakes are huge. Here’s how:

Treat AI security as a living discipline and test it continuously.
Start small with internal red teaming exercises and scale them.
Align on standards like OWASP AIVSS to make risks comparable and actionable.
Partner with vendors, peers, and government on evaluation frameworks.
Make red team findings part of the model lifecycle: detection, mitigation, retraining, verification.

We cannot fight tomorrow’s battles with yesterday’s playbook. Threats now scale at machine speed, and attackers are already ahead. In a mostly opaque system where we can only test inputs and outputs, this proactive defense is our most critical advantage.

The stakes extend far beyond individual companies. AI systems are becoming critical infrastructure for healthcare, finance, and public services. When these systems fail–as all systems are prone to failure–the impact will cascade through entire sectors. That is why action is imperative.

Push your systems to the limit before your adversary, or the AI itself, does it for you.

As a security professional working on AI Red Teaming at Scale, I've run live red teaming exercises with members of Congress, the UK Parliament, and NATO. Here's what I've learned:

Your next incident won't be a zero-day, but a risk you didn't test for.

The more we deploy AI, the faster we amplify failure. That’s what happens when you put unpredictable systems into production without treating them as an attack surface.

AI incidents show up as misuse, prompt steering, data leakage, or model drift.

These experiences taught me that AI security comes down to four fundamentals:

Treat model behavior as an attack surface, not just the code.
Map threats to a risk matrix.
Score them with a standard like OWASP AI Vulnerability Scoring System (AIVSS), so everyone speaks the same language on severity.
Measure residual risk with evaluations that mimic real attacker workflows. You have to test it like an enemy, not an engineer.

This approach moves you from reactive triage to proactive incident prevention.

AI Breaks the Old Security Model

Mapping Threats

The problem is that existing security frameworks weren't built for these kinds of failures, so I put together a new AI Risk Matrix that is mapped along three axes:

As we move from tools to agents, collectives, where AI units coordinate with goals of their own, are far closer than many expect.

Layered on top of the three axes above are six amplifiers of risk:

Risk from Adversaries
Risk from Unforced Errors
Risk from Emergent Behavior
Risk from Misaligned Goals
Risk from Dependencies
Risk from Societal Impact

Build an Enterprise Playbook

Test the full attack surface: Attack the model the way an adversary would, using prompt injections, context manipulation, dialog hijacking, fictionalization, and jailbreaks.
Score vulnerabilities with a standard: Use a framework like OWASP AIVSS to compare risks across models and vendors.
Track residual risk continuously: Red teaming is not a one-time audit, but an ongoing process that monitors the threats that remain after defenses are in place.

Measure Residual Risk

Getting Ahead of the Threat

Adversarial red teaming is how we prepare. Stress-test these systems now so we are not blindsided later. The stakes are huge. Here’s how:

Treat AI security as a living discipline and test it continuously.
Start small with internal red teaming exercises and scale them.
Align on standards like OWASP AIVSS to make risks comparable and actionable.
Partner with vendors, peers, and government on evaluation frameworks.
Make red team findings part of the model lifecycle: detection, mitigation, retraining, verification.

Push your systems to the limit before your adversary, or the AI itself, does it for you.

Beyond Code Exploits: Red Teaming the New AI Attack Surface

AI Breaks the Old Security Model

Mapping Threats

Build an Enterprise Playbook

Measure Residual Risk

Getting Ahead of the Threat

The future of your industry starts here

Beyond Code Exploits: Red Teaming the New AI Attack Surface

AI Breaks the Old Security Model

Mapping Threats

Build an Enterprise Playbook

Measure Residual Risk

Getting Ahead of the Threat

The future of your industry starts here