ActiveFence is now Alice

Introducing Guardrails Trained for Your Policies

Dean Issacharoff - May 13, 2026

TL;DR

WonderFence can now train a personalized guardrail for each policy, using real-world adversarial data collected over years of protecting the world's biggest tech platforms. The guardrails run in WonderFence at sub-100ms latency, giving you guardrails shaped by your policy and not the other way around.

Generic guardrails can prevent some unwanted agent behavior in client interactions, but data leaks, compliance violations, and off-brand responses still slip through. That happens because out-of-the-box guardrails fall short on the two factors that determine effectiveness: the quality of their training data, and how closely their policies match your needs.

WonderFence fills those gaps by training a personalized guardrail for each policy, using real-world adversarial data Alice has collected across industries, languages, and threat actor communities. This way, guardrails are shaped by your policies, instead of policies being shaped by the available guardrails.

WonderFence Guardrails: Customize Your Policies

Guardrails Trained on Real Threats for Real Policies

WonderFence allows you to train policy-specific guardrails within hours. Just configure your policy definition and upload a few labeled examples of wanted and unwanted behavior. We will take it from there, training the detector models inside your guardrails on real-world adversarial examples from Rabbit Hole, our collection of adversarial data curated from years of protecting the biggest tech platforms in the world. This ensures accurate policy detection without overreach or false positives.

For example, a retailer might define a policy that agents never knock competitors. Upload a few examples of on-brand and off-brand responses, and WonderFence produces a detector tuned to that exact line.
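As a rough illustration of what "a policy definition plus a few labeled examples" could look like as data, here is a minimal Python sketch. The class names, field names, and example texts are all hypothetical and are not WonderFence's actual upload format or API.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledExample:
    text: str
    allowed: bool  # True = wanted behavior, False = unwanted behavior


@dataclass
class PolicyDefinition:
    # Hypothetical structure, assumed for illustration only.
    name: str
    description: str
    examples: list[LabeledExample] = field(default_factory=list)

    def counts(self) -> tuple[int, int]:
        """Return (wanted, unwanted) example counts."""
        wanted = sum(1 for e in self.examples if e.allowed)
        return wanted, len(self.examples) - wanted


# The retailer policy from the text, with made-up example responses.
policy = PolicyDefinition(
    name="no-competitor-disparagement",
    description="Agents never knock competitors.",
    examples=[
        LabeledExample("Our blender comes with a 10-year warranty.", allowed=True),
        LabeledExample("I can only speak to our own products.", allowed=True),
        LabeledExample("Their blenders break within a month.", allowed=False),
    ],
)

print(policy.counts())
```

Including both wanted and unwanted examples matters here because the detector has to learn where the line sits, not just what a violation looks like.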

WonderFence Guardrails: Block Keywords

Continuous Protection for Reliable Behavior

The detectors trained by WonderFence are then deployed alongside your predefined WonderFence guardrails. These evaluate every step agents take. In block mode, malicious prompts are intercepted, internal actions are monitored, and inappropriate outputs are replaced with predefined responses in line with your policies. In detect-only mode, the same policies flag interactions for internal review without interrupting users, useful for monitoring, testing, and gradual rollout.
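The difference between the two modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not WonderFence's implementation: `Mode`, `Decision`, `enforce`, the substring-based `toy_detector`, and the fallback text are all hypothetical, and the sketch assumes that block mode also records the violation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    BLOCK = "block"
    DETECT_ONLY = "detect_only"


@dataclass
class Decision:
    text: str      # what the user ends up seeing
    flagged: bool  # whether a policy violation was recorded for review


def enforce(output: str, violates: Callable[[str], bool], mode: Mode, fallback: str) -> Decision:
    """Apply one policy detector to one agent output (hypothetical helper)."""
    if not violates(output):
        return Decision(text=output, flagged=False)
    if mode is Mode.BLOCK:
        # Block mode: replace the output with the predefined response.
        return Decision(text=fallback, flagged=True)
    # Detect-only mode: leave the output untouched, only flag it.
    return Decision(text=output, flagged=True)


def toy_detector(text: str) -> bool:
    # Stand-in for a trained detector; a real one would be a model, not a substring check.
    return "break within a month" in text.lower()


FALLBACK = "I can only speak to our own products."
risky = "Their blenders break within a month."

blocked = enforce(risky, toy_detector, Mode.BLOCK, FALLBACK)
observed = enforce(risky, toy_detector, Mode.DETECT_ONLY, FALLBACK)
print(blocked.text)
print(observed.text, observed.flagged)
```

Keeping the detector separate from the mode is what makes a gradual rollout cheap in this sketch: the same policy can start in detect-only mode and later switch to block mode without retraining anything.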

Policies can run on incoming prompts, outgoing responses, or both, and apply to one agent or many. All of this runs at sub-100ms latency, so the user experience stays intact and you can count on your agents in front of customers.

For instance, a customer-facing banking agent demands strict controls on financial guidance and layered regulatory compliance. A scheduling assistant, by contrast, has a far narrower scope but still needs to handle personal information correctly. WonderFence lets you set different policies for each agent and each stage of the interaction.
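One way to picture per-agent, per-stage policies is a simple lookup table. The Python sketch below is an assumption made for illustration: the agent names, policy names, `Stage` enum, and `policies_for` helper are hypothetical and do not reflect WonderFence's configuration format.

```python
from enum import Enum


class Stage(Enum):
    PROMPT = "prompt"      # incoming user prompts
    RESPONSE = "response"  # outgoing agent responses


# Hypothetical mapping: agent name -> stage -> names of policies to apply.
POLICIES: dict[str, dict[Stage, list[str]]] = {
    "banking-agent": {
        Stage.PROMPT: ["malicious-prompts"],
        Stage.RESPONSE: ["financial-guidance", "regulatory-compliance"],
    },
    "scheduling-assistant": {
        Stage.RESPONSE: ["personal-information"],
    },
}


def policies_for(agent: str, stage: Stage) -> list[str]:
    """Return the policies configured for an agent at a given stage."""
    return POLICIES.get(agent, {}).get(stage, [])


print(policies_for("banking-agent", Stage.RESPONSE))
print(policies_for("scheduling-assistant", Stage.PROMPT))
```

In this sketch the banking agent carries stricter, layered checks on its responses, while the scheduling assistant only checks how personal information is handled, mirroring the contrast described above.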

Ship With Confidence

WonderFence complements your existing guardrails with another layer of personalized protection: fit-to-policy guardrails trained on real-world adversarial data collected during years of protecting the world's biggest tech platforms. WonderFence joins WonderSuite alongside automated red-teaming, continuous preset guardrails, and more to give you the confidence to deploy consumer-facing AI.
