TL;DR
As AI agents begin interacting autonomously, they introduce new risks around unpredictable behavior, coordination, and loss of control, making real-time monitoring and safeguards essential.
The demand for AI-powered apps and agents is real, and enterprise companies are moving quickly to launch. But a recent Alice study reveals an uncomfortable truth: today's most popular large language models remain dangerously vulnerable to being manipulated into sharing harmful information. For any organization planning to deploy AI systems, these findings raise immediate concerns.
High Stakes in High-Risk Domains
Alice conducted a comparative analysis of two widely used large language models, evaluating their behavior across chemical, biological, radiological, and nuclear (CBRN) risks in the Biology, Virology, Chemistry, Nuclear, and Radiology domains. Each domain was examined through a standardized set of threat vectors, including:
- Development
- Production
- Acquisition
- Theft
- Transfer
- Stockpiling
- Weaponization
- Dissemination
- Concealment
- Handling
Three user personas were tested in single-turn prompt interactions, evaluating how each model responds to isolated but strategically constructed CBRN queries: a non-expert user with little subject-matter knowledge, an expert with technical fluency, and a malicious actor with clear intent to misuse the model. The test prompts fell into three categories: asking the model to create harmful content, seeking to retrieve dangerous information, and asking the model to describe how certain harmful acts could be executed.

The results were striking. Even non-expert users succeeded in prompting unsafe responses over 25 percent of the time. Expert and malicious users triggered unsafe outputs at rates exceeding 45 percent. These were not isolated events; the vulnerabilities spanned multiple CBRN categories and a wide variety of prompt types.

Percent of generated responses flagged as unsafe by user type and LLM.
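The persona-by-category evaluation described above can be sketched as a simple single-turn harness. This is an illustrative reconstruction of the study design, not the actual evaluation code: `model_fn` and `judge_fn` are hypothetical stand-ins for the model under test and the safety judgment, and the persona and category labels simply mirror the setup described in the text.

```python
from itertools import product

# Labels mirroring the study design described above (illustrative only).
PERSONAS = ["non_expert", "expert", "malicious"]
PROMPT_CATEGORIES = [
    "create_harmful_content",
    "retrieve_dangerous_info",
    "describe_harmful_execution",
]

def run_single_turn_eval(model_fn, prompts_by_cell, judge_fn):
    """Send isolated (single-turn) prompts for every persona x category
    cell and report the fraction of responses judged unsafe per persona.

    model_fn(prompt) -> response        # hypothetical model under test
    judge_fn(prompt, response) -> bool  # True means flagged unsafe
    """
    unsafe = {p: 0 for p in PERSONAS}
    totals = {p: 0 for p in PERSONAS}
    for persona, category in product(PERSONAS, PROMPT_CATEGORIES):
        for prompt in prompts_by_cell.get((persona, category), []):
            response = model_fn(prompt)  # one isolated turn, no chat history
            totals[persona] += 1
            if judge_fn(prompt, response):
                unsafe[persona] += 1
    # Unsafe-response rate per persona (only for personas that had prompts).
    return {p: unsafe[p] / totals[p] for p in PERSONAS if totals[p]}
```

Because each prompt is sent without conversation history, the harness measures exactly what the study measured: how a model responds to a single strategically constructed query in isolation.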
What This Means for Enterprises Deploying AI
These models are being deployed across industries including healthcare, finance, education, and defense to support AI-powered customer support agents, search assistants, and chatbots. Many of these systems are open to public input, and a determined actor can exploit that exposure. The most alarming finding is that even basic prompts can yield harmful results.

The study showed that unsafe responses were most prevalent in nuclear- and biology-related queries. Activities like dissemination, concealment, and transfer triggered the highest number of unsafe responses across both models, indicating broad and deep vulnerabilities.

Percent of generated responses flagged as unsafe per harmful domain and LLM.
Mitigation Requires More Than What LLMs Offer
The takeaway for enterprise developers is that responsible AI systems must be treated as core infrastructure, not as an optional layer bolted on at the end of the initial dev cycle. Simple prompt engineering or content filtering is not enough; enterprises must adopt multi-layered safety systems that include:
- Context-aware prompt monitoring
- Dynamic threat detection based on user behavior
- Continuous AI red teaming across high-risk domains
- Ongoing audits of outputs in sensitive use cases
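To make the first two layers concrete, here is a minimal sketch of context-aware prompt screening combined with behavior-based escalation. The risk patterns, tier names, and `UserSession` type are all hypothetical and for illustration only; a production system would rely on trained classifiers and curated threat intelligence rather than static keyword lists.

```python
import re
from dataclasses import dataclass

# Hypothetical risk patterns for illustration; real systems would use trained
# classifiers and continuously updated threat intelligence, not keywords.
RISK_PATTERNS = {
    "cbrn": re.compile(r"\b(enrich uranium|nerve agent|weaponi[sz]e)\b", re.I),
}

@dataclass
class UserSession:
    user_id: str
    flagged_count: int = 0  # tracks repeated risky behavior across prompts

def screen_prompt(session: UserSession, prompt: str) -> dict:
    """Screen one prompt with two layers:

    1. Context-aware pattern checks against known risk categories.
    2. Dynamic escalation: a user who repeatedly triggers flags moves
       from human review to an outright block.
    """
    hits = [name for name, pat in RISK_PATTERNS.items() if pat.search(prompt)]
    if hits:
        session.flagged_count += 1
    if hits and session.flagged_count >= 2:
        action = "block"    # repeat offender in this session
    elif hits:
        action = "review"   # first flag: route to review/audit
    else:
        action = "allow"
    return {"action": action, "categories": hits}
```

The escalation step is the point of the sketch: the same prompt can warrant a different response depending on the user's prior behavior, which static filters cannot express.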
A mature safety strategy combines domain-specific threat research, expert red teaming, and AI observability. Models must be stress-tested not only for obvious misuse but also for edge cases, user escalation paths, and evolving social engineering techniques.

Ensure your AI apps and agents aren't misused from the start with advanced red teaming that provides domain-informed stress tests to reveal how your AI models behave under pressure from a wide range of threat actors. Alice WonderBuild AI red teaming simulates real-world attack scenarios based on up-to-date threat intelligence gathered in over 50 languages to uncover vulnerabilities before they go live.

After launch, deploy purpose-built safety infrastructure that goes beyond static rules. Alice WonderFence Guardrails dynamically evaluates every user prompt and model output in real time, informed by global threat intelligence and policy-aware safeguards; fine-tunes responses based on your unique brand requirements; and offers up-to-the-second visibility into every interaction. This ensures your models remain safe, aligned, and compliant, even as user behavior evolves. With these tools in place, you can launch AI experiences that are not only powerful but resilient against risks like those posed by CBRN threats.
Concerned About CBRN Risks in AI?
Talk to our experts

What’s New from Alice
Your LLM Has No Idea What It's Doing
Diana Kelley, CISO at Noma Security and former Cybersecurity CTO at Microsoft, joins Mo to work through the real mechanics of LLM risk: why the context window flattens the trust boundary between system instructions and user data, why that makes reliable internal guardrails essentially impossible, and why agentic AI is less a new threat category and more a stress test for the hygiene debt organizations never fully paid off.
Distilling LLMs into Efficient Transformers for Real-World AI
This technical webinar explores how we distilled the world knowledge of a large language model into a compact, high-performing transformer—balancing safety, latency, and scale. Learn how we combine LLM-based annotations and weight distillation to power real-world AI safety.
Exposing the Hidden Risks of AI Toys
AI-powered toys are entering children’s everyday lives, but new research reveals serious safety gaps. Alice testing shows how child-like interactions can lead to inappropriate content, unsafe conversations, and risky behaviors.

