Providing Confidence for Safe, On-Time Release of an In-Game, AI-Powered NPC
A leading AAA gaming studio partnered with Alice to proactively test and secure an AI-powered in-game NPC ahead of launch. Using a hybrid AI red teaming approach, Alice surfaced more than 20,000 policy violations across four languages, multiple modalities, and conversation types, including in high-risk areas such as self-harm, child safety, and prompt injection. The findings drove architectural improvements and stronger policy enforcement. This gave product, legal, and executive stakeholders the confidence to launch safely while preserving the immersive gameplay experience players expect.

Challenge
To revolutionise player interaction, the studio set out to launch an AI-powered non-player character (NPC) capable of dynamic, natural-language conversations. However, the unpredictable behaviour of large language models (LLMs) introduced significant safety risks that threatened player trust and brand reputation.
The system was complex: an agentic AI architecture with multiple LLMs orchestrated through system prompts, LLM-based judges, and real-time content filters. It had to support multi-turn conversations across four languages and multiple modalities, all while staying in character. The studio's communication policy raised the bar further by requiring every NPC interaction to be contextually appropriate and narratively aligned.
To balance creativity with control, the team needed to pressure-test the system rigorously before launch. The goals were to ensure safety, maintain narrative integrity, and protect the brand.
How Alice Helped
To mitigate safety risks before launch, the studio partnered with Alice to deploy WonderBuild, Alice's purpose-built AI red teaming solution for stress-testing generative AI systems before deployment.
Alice implemented a hybrid red teaming strategy combining two complementary approaches:
- Automated adversarial testing: Thousands of prompts were generated across languages, modalities, and gameplay scenarios to systematically uncover policy violations, misalignments, and edge-case failures at scale.
- Manual, intelligence-led red teaming: Subject-matter experts investigated nuanced failure modes, narrative inconsistencies, and safety blind spots that automated testing alone cannot surface.
The approach was tailored to the studio's unique architecture and communication policy. It tested how well the NPC stayed in character under real-world conversational pressure. Findings revealed vulnerabilities in critical areas, including self-harm, child safety, illegal activity, prompt injection, and narrative-breaking responses. Each issue was triaged and translated into actionable, architecture-level recommendations that strengthened system integrity without sacrificing immersion.
The Results
Within just two weeks, Alice's WonderBuild red teaming solution delivered the clarity and confidence the studio needed to move forward.
The engagement surfaced more than 20,000 policy-violating or misaligned outputs across languages, modalities, and gameplay scenarios. This gave the studio a comprehensive picture of the system's risk surface before a single player encountered it.
Key outcomes:
- 20,000+ policy violations and misaligned outputs uncovered across four languages and multiple modalities
- Multiple architecture-level improvements implemented directly from Alice's findings
- Vulnerabilities identified in high-risk areas including self-harm, child safety, illegal activity, and prompt injection
- Product, legal, and executive teams achieved shared confidence in system readiness
- Safe, on-time launch delivered without compromising gameplay immersion
The red teaming exercises not only uncovered high-impact risks but also delivered a clear, data-driven path to remediation. As a result, the studio reinforced its safety posture while preserving the immersive, in-character experience critical to gameplay.
For teams building and launching AI-powered apps and agents, explore how WonderBuild stress-tests generative AI systems before deployment.
