ActiveFence is now Alice

Blog

Why Most GenAI Red Teaming Misses the Real Threats

Phillip Johnston

Mar 6, 2025

TL;DR

Effective GenAI red teaming depends on deep threat expertise. Without understanding real-world adversaries, AI testing misses the subtle vulnerabilities that matter most. Combining ML knowledge with threat intelligence built from actual attack data, not simulated scenarios, is essential to secure GenAI systems against the threats evolving right now.

As the use of Generative AI models continues to expand across enterprise systems and daily applications, new risks are introduced that must be rigorously tested and mitigated. Red teaming has become the critical mechanism for doing this. The problem is that most organizations are running red team exercises that look thorough on paper while missing the attacks that will actually hit them in production.

The gap is not tooling. It is threat expertise

What is GenAI Red Teaming?

GenAI red teaming involves stress-testing AI models by simulating adversarial attacks and uncovering vulnerabilities. Red teaming has been used for decades by groups of ethical hackers focused on uncovering software security flaws, red teaming for AI delves into model-specific risks such as prompt injection, training data poisoning, adversarial attacks, and hallucination exploitation.

Given the unique nature of AI safety and security, effective red teaming requires a multidisciplinary approach that blends machine learning(ML) knowledge with threat expertise. Threat actors continuously adapt their methods, and an AI red team must be even more agile, anticipating and neutralizing these risks before they become real-world threats.

Why Threat Expertise is Essential

While AI developers and engineers understand the inner workings of GenAI models, they often lack the adversarial mindset necessary to predict how real-world attackers might exploit vulnerabilities. Threat expertise is a foundation of GenAI red teaming that consists of several key pillars:

1. Understanding Adversarial Tactics

Threat actors range from script kiddies experimenting with public AI models to sophisticated nation-state hackers exploiting AI for disinformation and cyberwarfare. A red team with deep threat intelligence expertise understands the motives, techniques, and tactics used by these adversaries. This allows them to design more realistic and comprehensive attack simulations that reflect real-world threats.

2. Recognizing Lesser-Known AI Vulnerabilities

AI systems are prone to subtle, emergent vulnerabilities that can be exploited in unexpected ways. For instance, an AI chatbot designed for customer service may inadvertently leak sensitive company data when manipulated through carefully crafted prompts. Without expertise in social engineering and cyber threats, such vulnerabilities might go unnoticed during standard AI testing.

3. Enhancing Threat Modeling

Traditional security models often fail to account for AI-specific risks. Threat expertise enables red teams to create more effective threat models tailored to GenAI systems. By analyzing attack surfaces such as training data integrity, model responses, and adversarial prompt injection, red teams can better predict and mitigate potential exploits.

4. Simulating Real-World Attack Scenarios

A generic AI safety test looks for basic failure modes. A red team operating from real-world threat intelligence constructs scenarios that mirror active attacker behavior, including multi-turn jailbreaks, role-play-based guardrail bypasses, and rhyme-driven prompt manipulation that generic safety filters consistently miss. The difference between these two approaches is the difference between finding vulnerabilities before deployment and discovering them after an incident.

5. Adapting to Emerging AI Threats

Threat landscapes evolve rapidly. From disinformation campaigns to AI-generated phishing emails, new risks emerge constantly. Red teams with deep threat expertise stay ahead of these developments by embedding themselves into the threat landscape and leveraging the latest intelligence on how attackers are exploiting AI in the wild. This proactive approach ensures that AI safety and security measures remain robust against evolving threats.

‍What's Changed in 2025–2026: The Agentic AI Red Teaming Gap

Red teaming methodology that was adequate for standalone LLMs is no longer sufficient for the AI systems being deployed today. The shift to agentic architectures where AI models take autonomous actions, call external tools, spawn sub-agents, and execute multi-step workflows has introduced an entirely new class of vulnerability that most existing red team playbooks do not cover.

The OWASP Agentic AI Top 10 (published 2026) formalises ten risk categories specific to these systems. Three are particularly relevant to red teaming practitioners:

Agent Goal Hijack. An attacker embeds instructions in content the agent is asked to process an email, a retrieved document, a webpage that redirect the agent's objectives without any visible prompt manipulation. Because the agent is acting on behalf of a user with real system permissions, a successful goal hijack can result in data exfiltration, unauthorised transactions, or cascading downstream actions. Traditional prompt injection testing does not account for the indirect, multi-hop paths through which these attacks arrive.‍

Identity and Privilege Abuse. Agents frequently operate with elevated permissions to perform legitimate tasks. Red teaming must now include scenarios where an attacker attempts to impersonate a trusted sub-agent, escalate privileges through a chain of tool calls, or extract credentials stored in the agent's context window. These are not theoretical risks, they have been demonstrated in live agentic deployments across enterprise environments.

Cascading Failures. In multi-agent pipelines, a single compromised node can propagate errors or malicious instructions downstream. A red team that tests only individual agents in isolation will miss the systemic failure modes that emerge when agents interact. Effective red teaming for agentic AI requires end-to-end pipeline testing under adversarial conditions, not unit-level safety evaluations.

Standard red team exercises conducted against a chat interface will not surface these risks. Agentic AI red teaming requires access to the full system architecture; tool definitions, memory configurations, inter-agent communication protocols — and adversaries who understand how those components interact under stress.

The Challenges of Building a Threat-Savvy Red Team

Despite the clear need for threat expertise in GenAI red teaming, building a team with the right blend of skills is challenging. Some of the main hurdles include:

Talent Shortage: Professionals with both GenAI and adversarial exposure are rare. Finding and onboarding individuals with these skill sets requires significant investment. Training a new team to acquire the necessary expertise would be a prolonged and resource-intensive effort, leaving organizations struggling to match the speed at which threat actors continuously refine their tactics.
Constantly Shifting Attack Vectors: AI fields are fast moving, and AI security is no exception. Red teams must continuously update their knowledge and techniques to stay ahead of attackers. Add to this the non-deterministic nature of GenAI, which can produce different responses to the same prompt, and ensuring safe outcomes becomes a more difficult challenge that demands adaptive strategies and rigorous evaluation.
Lack of Standardized AI Security Frameworks: Unlike traditional cybersecurity, AI security lacks universally accepted frameworks, making red teaming approaches more variable and experimental. OWASP's LLM Top 10, NIST AI RMF, and the EU AI Act each approach risk differently, and red teaming methodology must be adapted to meet each standard consistently.

Best Practices for Integrating Threat Expertise in GenAI Red Teaming

To maximize the effectiveness of red teaming in AI security, organizations should consider the following best practices:

Recruit from Diverse Backgrounds: Build a red team that includes AI researchers, ethical hackers, and threat intelligence analysts to ensure a well-rounded perspective.
Leverage Real-World Experience and Abuse Intelligence: Continuously monitor the threat landscape and check in on AI-related misuse reports to inform red team strategies.
Use Adversarial Machine Learning Techniques: Incorporate methods such as evasion attacks, model inversion, and data poisoning to test AI defenses comprehensively.
Employ manual and automated processes: Use a combination of human expertise and automated tools to more quickly identify vulnerabilities, evaluate AI behavior, and ensure comprehensive safety assessments.
Simulate Sophisticated Attackers: Conduct exercises that mimic well-resourced adversaries, such as state-sponsored hackers or cybercriminal organizations.
Develop AI-Specific Security Frameworks: Standardize security assessments to ensure consistent and repeatable red teaming practices.
Invest in Continuous Training: Provide ongoing education for red team members to stay ahead of emerging threats and trending AI misuses.

‍

Why Third-Party Expertise is Crucial for GenAI Red Teaming

While some AI developers may consider building an in-house red team, outsourcing to a third-party expert such as Alice offers distinct advantages. First, third-party red teams bring an objective and unbiased perspective, free from internal assumptions that may overlook critical vulnerabilities. Their external positioning allows them to think like real-world adversaries, ensuring more comprehensive threat assessments.

Alice's red teaming capability, delivered through WonderBuild, is built on the Rabbit Hole adversarial intelligence engine. Rabbit Hole was developed from a decade of real-world attack data across billions of users. This is not simulated threat intelligence. It is the actual attack patterns, manipulation techniques, and adversarial behaviors that have been used against live AI systems at scale.

Category	In-House Red Team	Alice WonderBuild
Threat intelligence source	Internal knowledge + public datasets	Decade of real-world attack data across billions of users
Coverage of agentic AI risks	Limited — most teams lack agentic architecture exposure	Full pipeline testing including OWASP Agentic Top 10

Related Reading

Additionally, building and maintaining an in-house red team requires significant time, talent, and financial resources. Given the current talent shortage in AI security, hiring the right mix of AI researchers, threat landscape analysts, cybersecurity specialists, and ethical hackers can be costly.

By leveraging Alice's WonderBuild for AI red teaming, AI developers and enterprises developing AI agents and tools can ensure that their GenAI systems receive rigorous, up-to-date security evaluations. This allows internal teams to focus on innovation while mitigating potential threats.

Talk to an expert to discover how Alice can help safeguard your AI.

What’s New from Alice

The Former Google Cloud CISO's Take on AI, Agents, and What Comes Next

There's a lot of noise around AI and security right now, and not many people who can cut through it the way Phil Venables can. He was CISO at Goldman Sachs, then the first CISO for Google Cloud, and he's now a partner at Ballistic Ventures. In this episode, he tells us why attackers scaling up worries him more than the vulnerabilities themselves, what trust even means when an agent is acting in your environment, and why the answer to most of this comes back to the same fundamentals we've leaned on for years.

Listen Now

It Takes AI to Break AI: The Case for AI Red Teaming

webinar

May 25, 2026

This is some text inside of a div block.

min read

May 25, 2026

This is some text inside of a div block.

min watch

As AI systems gain autonomy, organizations need security approaches built specifically for AI behavior. Learn why AI-driven red teaming is becoming a critical defense layer.

Learn More