TL;DR
As AI agents begin interacting autonomously, they introduce new risks around unpredictable behavior, coordination, and loss of control, making real-time monitoring and safeguards essential.
The demand for AI-powered apps and agents is real, and enterprise companies are moving quickly to launch. But a recent Alice study reveals an uncomfortable truth: today's most popular large language models remain dangerously vulnerable to being manipulated into sharing harmful information. For any organization planning to deploy AI systems, these findings raise immediate concerns.
High Stakes in High-Risk Domains
Alice conducted a comparative analysis of two widely used large language models, evaluating their behavior across chemical, biological, radiological, and nuclear (CBRN) risks in the Biology, Virology, Chemistry, Nuclear, and Radiology domains. Each domain was examined through a standardized set of threat vectors, including:
- Development
- Production
- Acquisition
- Theft
- Transfer
- Stockpiling
- Weaponization
- Dissemination
- Concealment
- Handling
Three user personas were tested in single-turn prompt interactions, evaluating how each model responds to isolated but strategically constructed CBRN queries: a non-expert user with little subject-matter knowledge, an expert with technical fluency, and a malicious actor with clear intent to misuse the model. The test prompts fell into three categories: asking the model to create harmful content, seeking to retrieve dangerous information, and asking the model to describe how certain harmful acts could be executed.

The results were striking. Even non-expert users succeeded in prompting unsafe responses over 25 percent of the time. Expert and malicious users triggered unsafe outputs at rates exceeding 45 percent. These were not isolated events; the vulnerabilities spanned multiple CBRN categories and a wide variety of prompt types.

Percent of generated responses flagged as unsafe by user type and LLM.
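The persona-by-category evaluation described above can be sketched as a simple single-turn harness. This is an illustrative reconstruction of the study design, not the actual evaluation code: `model_fn` and `judge_fn` are hypothetical stand-ins for the model under test and the safety judgment, and the persona and category labels simply mirror the setup described in the text.

```python
from itertools import product

# Labels mirroring the study design described above (illustrative only).
PERSONAS = ["non_expert", "expert", "malicious"]
PROMPT_CATEGORIES = [
    "create_harmful_content",
    "retrieve_dangerous_info",
    "describe_harmful_execution",
]

def run_single_turn_eval(model_fn, prompts_by_cell, judge_fn):
    """Send isolated (single-turn) prompts for every persona x category
    cell and report the fraction of responses judged unsafe per persona.

    model_fn(prompt) -> response        # hypothetical model under test
    judge_fn(prompt, response) -> bool  # True means flagged unsafe
    """
    unsafe = {p: 0 for p in PERSONAS}
    totals = {p: 0 for p in PERSONAS}
    for persona, category in product(PERSONAS, PROMPT_CATEGORIES):
        for prompt in prompts_by_cell.get((persona, category), []):
            response = model_fn(prompt)  # one isolated turn, no chat history
            totals[persona] += 1
            if judge_fn(prompt, response):
                unsafe[persona] += 1
    # Unsafe-response rate per persona (only for personas that had prompts).
    return {p: unsafe[p] / totals[p] for p in PERSONAS if totals[p]}
```

Because each prompt is sent without conversation history, the harness measures exactly what the study measured: how a model responds to a single strategically constructed query in isolation.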
What This Means for Enterprises Deploying AI
These models are being deployed across industries including healthcare, finance, education, and defense to support AI-powered customer support agents, search assistants, and chatbots. Many of these systems are open to public input, and a determined actor can exploit that exposure. The most alarming finding is that even basic prompts can yield harmful results.

The study showed that unsafe responses were most prevalent in nuclear- and biology-related queries. Activities like dissemination, concealment, and transfer triggered the highest number of unsafe responses across both models, indicating broad and deep vulnerabilities.

Percent of generated responses flagged as unsafe per harmful domain and LLM.
Mitigation Requires More Than What LLMs Offer
The takeaway for enterprise developers is that responsible AI systems must be treated as core infrastructure, not as an optional layer bolted on at the end of the initial dev cycle. Simple prompt engineering or content filtering is not enough; enterprises must adopt multi-layered safety systems that include:
- Context-aware prompt monitoring
- Dynamic threat detection based on user behavior
- Continuous AI red teaming across high-risk domains
- Ongoing audits of outputs in sensitive use cases
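To make the first two layers concrete, here is a minimal sketch of context-aware prompt screening combined with behavior-based escalation. The risk patterns, tier names, and `UserSession` type are all hypothetical and for illustration only; a production system would rely on trained classifiers and curated threat intelligence rather than static keyword lists.

```python
import re
from dataclasses import dataclass

# Hypothetical risk patterns for illustration; real systems would use trained
# classifiers and continuously updated threat intelligence, not keywords.
RISK_PATTERNS = {
    "cbrn": re.compile(r"\b(enrich uranium|nerve agent|weaponi[sz]e)\b", re.I),
}

@dataclass
class UserSession:
    user_id: str
    flagged_count: int = 0  # tracks repeated risky behavior across prompts

def screen_prompt(session: UserSession, prompt: str) -> dict:
    """Screen one prompt with two layers:

    1. Context-aware pattern checks against known risk categories.
    2. Dynamic escalation: a user who repeatedly triggers flags moves
       from human review to an outright block.
    """
    hits = [name for name, pat in RISK_PATTERNS.items() if pat.search(prompt)]
    if hits:
        session.flagged_count += 1
    if hits and session.flagged_count >= 2:
        action = "block"    # repeat offender in this session
    elif hits:
        action = "review"   # first flag: route to review/audit
    else:
        action = "allow"
    return {"action": action, "categories": hits}
```

The escalation step is the point of the sketch: the same prompt can warrant a different response depending on the user's prior behavior, which static filters cannot express.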
A mature safety strategy combines domain-specific threat research, expert red teaming, and AI observability. Models must be stress-tested not only for obvious misuse but also for edge cases, user escalation paths, and evolving social engineering techniques.

Ensure your AI apps and agents aren't misused from the start with advanced red teaming that provides domain-informed stress tests to reveal how your AI models behave under pressure from a wide range of threat actors. Alice WonderBuild AI red teaming simulates real-world attack scenarios based on up-to-date threat intelligence gathered in over 50 languages to uncover vulnerabilities before they go live.

After launch, deploy purpose-built safety infrastructure that goes beyond static rules. Alice WonderFence Guardrails dynamically evaluates every user prompt and model output in real time, informed by global threat intelligence and policy-aware safeguards; fine-tunes responses based on your unique brand requirements; and offers up-to-the-second visibility into every interaction. This ensures your models remain safe, aligned, and compliant, even as user behavior evolves. With these tools in place, you can launch AI experiences that are not only powerful but resilient against risks like those posed by CBRN threats.
Concerned About CBRN Risks in AI?
Talk to our experts

What’s New from Alice
Your LLM Has No Idea What It's Doing
Diana Kelley, CISO at Noma Security and former Cybersecurity CTO at Microsoft, joins Mo to work through the real mechanics of LLM risk: why the context window flattens the trust boundary between system instructions and user data, why that makes reliable internal guardrails essentially impossible, and why agentic AI is less a new threat category and more a stress test for the hygiene debt organizations never fully paid off.
Distilling LLMs into Efficient Transformers for Real-World AI
This technical webinar explores how we distilled the world knowledge of a large language model into a compact, high-performing transformer—balancing safety, latency, and scale. Learn how we combine LLM-based annotations and weight distillation to power real-world AI safety.
Exposing the Hidden Risks of AI Toys
AI-powered toys are entering children’s everyday lives, but new research reveals serious safety gaps. Alice testing shows how child-like interactions can lead to inappropriate content, unsafe conversations, and risky behaviors.

