TL;DR
WonderFence can now train a personalized guardrail for each policy, using real-world adversarial data collected from years of protecting the world's biggest tech platforms. These guardrails are deployed in WonderFence at sub-100ms latency, so they are shaped by your policy and not the other way around.
Generic guardrails can prevent some unwanted agent behavior in client interactions, but data leaks, compliance violations, and off-brand responses still slip through. That happens because out-of-the-box guardrails fall short on the two factors that determine effectiveness: the quality of their training data, and how closely their policies match your needs.
WonderFence fills those gaps by training a personalized guardrail for each policy, using real-world adversarial data Alice has collected across industries, languages, and threat actor communities. This way, guardrails are shaped by your policies, instead of policies being shaped by the available guardrails.

Guardrails Trained on Real Threats for Real Policies
WonderFence allows you to train policy-specific guardrails within hours. Just configure your policy definition and upload a few labeled examples of wanted and unwanted behavior. We will take it from there, training the detector models inside your guardrails on real-world adversarial examples from Rabbit Hole, our collection of adversarial data curated from years of protecting the biggest tech platforms in the world. This ensures accurate policy detection without overreach or false positives.
For example, a retailer might define a policy that agents never knock competitors. Upload a few examples of on-brand and off-brand responses, and WonderFence produces a detector tuned to that exact line.
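
To make the retailer example concrete, here is a minimal sketch of what a policy definition with a handful of labeled examples could look like. It is illustrative only: `PolicyDefinition`, `LabeledExample`, and every field name are hypothetical and are not the WonderFence API or upload format.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledExample:
    """One labeled agent response (hypothetical structure)."""
    text: str
    violates_policy: bool  # True = unwanted behavior, False = wanted behavior


@dataclass
class PolicyDefinition:
    """A policy plus its labeled examples (hypothetical structure)."""
    name: str
    description: str
    examples: list[LabeledExample] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        """Count wanted vs. unwanted examples to sanity-check label balance."""
        unwanted = sum(1 for e in self.examples if e.violates_policy)
        return {"wanted": len(self.examples) - unwanted, "unwanted": unwanted}


# The retailer policy from the text, with made-up example responses.
policy = PolicyDefinition(
    name="no-competitor-disparagement",
    description="Agents never knock competitors.",
    examples=[
        LabeledExample("Our return window is 60 days on all orders.", False),
        LabeledExample("I can only speak to our own products and pricing.", False),
        LabeledExample("Their products break within a month, avoid them.", True),
        LabeledExample("That other store is a scam, honestly.", True),
    ],
)

print(policy.summary())
```

A balanced set of on-brand and off-brand responses like this one gives a detector both sides of the line the policy draws.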

Continuous Protection for Reliable Behavior
The detectors trained by WonderFence are then deployed alongside your predefined WonderFence guardrails. These evaluate every step agents take. In block mode, malicious prompts are intercepted, internal actions are monitored, and inappropriate outputs are replaced with predefined responses in line with your policies. In detect-only mode, the same policies flag interactions for internal review without interrupting users, useful for monitoring, testing, and gradual rollout.
Policies can run on incoming prompts, outgoing responses, or both, and apply to one agent or many. All of this runs at sub-100ms latency, so the user experience stays intact and you can count on your agents in front of your customers.
For instance, a customer-facing banking agent demands strict controls on financial guidance and layered regulatory compliance. A scheduling assistant, by contrast, has a far narrower scope but still needs to handle personal information correctly. WonderFence lets you set different policies for each agent and each stage of the interaction.
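
The interplay of modes, stages, and per-agent policies described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how WonderFence is implemented: `Policy`, `Mode`, `Stage`, and `apply_policies` are hypothetical names, and the keyword checks are toy stand-ins for trained detector models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    BLOCK = "block"              # replace violating content with a fallback
    DETECT_ONLY = "detect_only"  # flag for review, leave content untouched


class Stage(Enum):
    PROMPT = "prompt"      # incoming user message
    RESPONSE = "response"  # outgoing agent message


@dataclass
class Policy:
    name: str
    mode: Mode
    stages: frozenset[Stage]
    detector: Callable[[str], bool]  # toy stand-in for a trained detector
    fallback: str = ""


@dataclass
class Verdict:
    text: str           # what is passed on after enforcement
    flagged: list[str]  # names of the policies that fired


def apply_policies(text: str, stage: Stage, policies: list[Policy]) -> Verdict:
    """Run every policy configured for this stage against the text."""
    flagged: list[str] = []
    for policy in policies:
        if stage not in policy.stages or not policy.detector(text):
            continue
        flagged.append(policy.name)
        if policy.mode is Mode.BLOCK:
            # Block mode: swap in the predefined response and stop.
            return Verdict(policy.fallback, flagged)
    # Detect-only hits (or no hits): the text passes through unchanged.
    return Verdict(text, flagged)


# Hypothetical per-agent configuration mirroring the two agents in the text.
AGENT_POLICIES = {
    "banking": [
        Policy(
            name="financial-guidance",
            mode=Mode.BLOCK,
            stages=frozenset({Stage.RESPONSE}),
            detector=lambda t: "buy this stock" in t.lower(),
            fallback="I can't offer investment advice, but I can connect you with an advisor.",
        )
    ],
    "scheduling": [
        Policy(
            name="personal-information",
            mode=Mode.DETECT_ONLY,
            stages=frozenset({Stage.PROMPT, Stage.RESPONSE}),
            detector=lambda t: "ssn" in t.lower(),
        )
    ],
}

banking = apply_policies(
    "You should buy this stock now.", Stage.RESPONSE, AGENT_POLICIES["banking"]
)
scheduling = apply_policies(
    "My SSN is 123-45-6789, book me for Tuesday.", Stage.PROMPT, AGENT_POLICIES["scheduling"]
)
print(banking)
print(scheduling)
```

In this sketch the banking response is replaced by its fallback because the policy runs in block mode, while the scheduling prompt passes through unchanged and is only flagged for review.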

Ship With Confidence
WonderFence complements your existing guardrails with another layer of personalized protection: fit-to-policy guardrails trained on real-world adversarial data collected during years of protecting the world's biggest tech platforms. WonderFence joins WonderSuite alongside automated red-teaming, continuous preset guardrails, and more to give you the confidence to deploy consumer-facing AI.