Providing Confidence for Safe, On-Time Release of an In-Game, AI-Powered NPC
A leading AAA gaming studio partnered with Alice to proactively test and secure an AI-powered in-game NPC ahead of launch. Using a hybrid AI red teaming approach, Alice surfaced over 20,000 policy violations, including in high-risk areas such as self-harm, child safety, and prompt injection, across four languages, multiple modalities, and conversation types. The findings enabled architectural improvements and policy enforcement, giving product, legal, and executive stakeholders the confidence to launch safely while maintaining the immersive gameplay experience players expect.

About
Using a hybrid AI Red Teaming approach, Alice surfaced over 20,000 policy violations in high-risk areas including self-harm, child safety, and prompt injection, tested across four languages, multiple modalities, and conversation types. Findings enabled architectural improvements and policy enforcement while maintaining the immersive gameplay experience players expect.
Challenge
To revolutionise player interaction, the studio set out to launch an AI-powered non-player character (NPC) capable of dynamic, natural-language conversations. But the unpredictable behaviour of large language models introduced significant safety risks that threatened player trust and brand reputation.
The system was complex: an agentic AI architecture with multiple LLMs orchestrated through system prompts, LLM-based judges, and real-time content filters. It had to support multi-turn conversations across four languages and multiple modalities, all while staying in character. The studio's communication policy raised the bar further, requiring every NPC interaction to be contextually appropriate and narratively aligned.
Balancing creativity with control, the team needed to rigorously pressure-test the system pre-launch — to ensure safety, maintain narrative integrity, and protect the brand.
How Alice Helped
To mitigate safety risks before launch, the studio partnered with Alice to deploy WonderBuild — Alice's purpose-built AI red teaming solution designed to stress-test generative AI systems before deployment.
Alice implemented a hybrid red teaming strategy combining two complementary approaches:
- Automated adversarial testing: Thousands of prompts were generated across languages, modalities, and gameplay scenarios to systematically uncover policy violations, misalignments, and edge-case failures at scale.
- Manual, intelligence-led red teaming: Subject-matter experts investigated nuanced failure modes, narrative inconsistencies, and safety blind spots that automated testing alone cannot surface.
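For readers who want a concrete picture of the automated side, the sketch below shows one way a coverage matrix for adversarial prompts could be enumerated. It is an illustration only, not a description of how WonderBuild works. The language and modality values are placeholders (the case study does not name them), and `ProbeCase`, `build_test_matrix`, and `coverage_by_category` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from itertools import product

# Placeholder values: the case study says "four languages" and "multiple
# modalities" but does not name them, so these are assumptions.
LANGUAGES = ["lang_1", "lang_2", "lang_3", "lang_4"]
MODALITIES = ["text", "voice"]
CONVERSATION_TYPES = ["single_turn", "multi_turn"]
# Risk areas named in the case study.
RISK_CATEGORIES = [
    "self_harm",
    "child_safety",
    "illegal_activity",
    "prompt_injection",
    "narrative_break",
]


@dataclass(frozen=True)
class ProbeCase:
    """One cell of the coverage matrix (hypothetical structure)."""
    language: str
    modality: str
    conversation_type: str
    risk_category: str


def build_test_matrix():
    """Enumerate every combination so no language/modality/risk cell is skipped."""
    combos = product(LANGUAGES, MODALITIES, CONVERSATION_TYPES, RISK_CATEGORIES)
    return [ProbeCase(*combo) for combo in combos]


def coverage_by_category(cases):
    """Count how many probe cases target each risk category."""
    counts = {}
    for case in cases:
        counts[case.risk_category] = counts.get(case.risk_category, 0) + 1
    return counts


if __name__ == "__main__":
    matrix = build_test_matrix()
    print(len(matrix))
    print(coverage_by_category(matrix))
```

In a setup like this, each cell would be expanded into many concrete prompts, which is how a small matrix can grow into thousands of test inputs. The manual, expert-led pass then targets the cells where automated probes are least informative.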
The approach was tailored to the studio's unique architecture and communication policy, testing how the NPC performed under real-world conversational pressure while staying in character. Findings revealed vulnerabilities in critical areas including self-harm, child safety, illegal activity, prompt injection, and narrative-breaking responses. Each issue was triaged and translated into actionable, architecture-level recommendations that strengthened system integrity without sacrificing immersion.
The Results
Within just two weeks, Alice's WonderBuild red teaming solution delivered the clarity and confidence the studio needed to move forward.
The engagement surfaced 20,000+ policy-violating or misaligned outputs across languages, modalities, and gameplay scenarios, providing a comprehensive picture of the system's risk surface before a single player encountered it.
Key outcomes:
- 20,000+ policy violations and misaligned outputs uncovered across four languages and multiple modalities
- Multiple architecture-level improvements implemented directly from Alice's findings
- Vulnerabilities identified in high-risk areas including self-harm, child safety, illegal activity, and prompt injection
- Product, legal, and executive teams achieved shared confidence in system readiness
- Safe, on-time launch delivered without compromising gameplay immersion
The red teaming exercises not only uncovered high-impact risks but also delivered a clear, data-driven path to remediation. As a result, the studio reinforced its safety posture while preserving the immersive, in-character experience critical to gameplay.
For teams building and launching AI-powered apps and agents, explore how WonderBuild stress-tests generative AI systems before deployment.
