ActiveFence is now Alice

Introducing Guardrails Trained for Your Policies

Dean Issacharoff

May 13, 2026

TL;DR

WonderFence can now train a personalized guardrail for each of your policies, using real-world adversarial data collected from years of protecting the world's biggest tech platforms. These guardrails are deployed in WonderFence at sub-100ms latency, so your guardrails are shaped by your policy and not the other way around.

Generic guardrails can prevent some unwanted agent behavior in client interactions, but data leaks, compliance violations, and off-brand responses still slip through. That happens because out-of-the-box guardrails fall short on the two factors that determine effectiveness: the quality of their training data, and how closely their policies match your needs.

WonderFence fills those gaps by training a personalized guardrail for each policy, using real-world adversarial data Alice has collected across industries, languages, and threat actor communities. This way, guardrails are shaped by your policies, instead of policies being shaped by the available guardrails.

WonderFence Guardrails: Customize Your Policies

Guardrails Trained on Real Threats for Real Policies

WonderFence allows you to train policy-specific guardrails within hours. Just configure your policy definition and upload a few labeled examples of wanted and unwanted behavior. We will take it from there, training the detector models inside your guardrails on real-world adversarial examples from Rabbit Hole, our collection of adversarial data curated from years of protecting the biggest tech platforms in the world. This ensures accurate policy detection without overreach or false positives.

For example, a retailer might define a policy that agents never knock competitors. Upload a few examples of on-brand and off-brand responses, and WonderFence produces a detector tuned to that exact line.
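To make the retailer example concrete, here is a minimal sketch of what a policy definition with a few labeled examples might look like as data. The class names, field names, and sample texts are all hypothetical illustrations, not WonderFence's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledExample:
    text: str
    violates_policy: bool  # True = unwanted behavior, False = wanted behavior


@dataclass
class PolicyDefinition:
    # Hypothetical structure for illustration only, not a real WonderFence schema.
    name: str
    description: str
    examples: list[LabeledExample] = field(default_factory=list)

    def counts(self) -> tuple[int, int]:
        """Return (wanted, unwanted) example counts."""
        unwanted = sum(e.violates_policy for e in self.examples)
        return len(self.examples) - unwanted, unwanted


policy = PolicyDefinition(
    name="no-competitor-disparagement",
    description="Agents never knock competitors.",
    examples=[
        LabeledExample("Our return window is 30 days.", False),
        LabeledExample("I can't speak for other stores, but here is what we offer.", False),
        LabeledExample("That other store sells junk, don't waste your money.", True),
    ],
)

wanted, unwanted = policy.counts()
print(f"{policy.name}: {wanted} wanted, {unwanted} unwanted examples")
```

Including both wanted and unwanted examples shows the detector where the line sits, not only what a violation looks like.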

Continuous Protection for Reliable Behavior

The detectors trained by WonderFence are then deployed alongside your predefined WonderFence guardrails. These evaluate every step agents take. In block mode, malicious prompts are intercepted, internal actions are monitored, and inappropriate outputs are replaced with predefined responses in line with your policies. In detect-only mode, the same policies flag interactions for internal review without interrupting users, useful for monitoring, testing, and gradual rollout.
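The difference between the two modes can be sketched in a few lines. This is an illustrative model of the behavior described above, not WonderFence code: `Mode`, `Decision`, `enforce`, and the keyword-based `toy_detector` are all assumptions, and a real detector would be a trained model rather than a string match.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    BLOCK = "block"
    DETECT_ONLY = "detect_only"


@dataclass
class Decision:
    text: str      # what the user actually sees
    flagged: bool  # whether the interaction is marked for internal review


def enforce(text: str, detector: Callable[[str], bool], mode: Mode, fallback: str) -> Decision:
    """Apply one policy detector to one piece of text (hypothetical helper)."""
    violation = detector(text)
    if violation and mode is Mode.BLOCK:
        # Block mode: swap the violating text for the predefined response.
        return Decision(text=fallback, flagged=True)
    # Detect-only mode (or no violation): pass the text through unchanged.
    return Decision(text=text, flagged=violation)


def toy_detector(text: str) -> bool:
    # Stand-in keyword check so the sketch runs on its own.
    return "junk" in text.lower()


FALLBACK = "I can only speak to our own products."
reply = "That other store sells junk."

blocked = enforce(reply, toy_detector, Mode.BLOCK, FALLBACK)
observed = enforce(reply, toy_detector, Mode.DETECT_ONLY, FALLBACK)
clean = enforce("Our return window is 30 days.", toy_detector, Mode.BLOCK, FALLBACK)
```

In this sketch both modes flag the same interactions. Only block mode changes what the user sees, which is why detect-only suits a gradual rollout.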

Policies can run on incoming prompts, outgoing responses, or both, and apply to one agent or many. All of this runs at sub-100ms latency, so the user experience stays intact and you can count on your agents when they talk to your customers.

For instance, a customer-facing banking agent demands strict controls on financial guidance and layered regulatory compliance. A scheduling assistant, by contrast, has a far narrower scope but still needs to handle personal information correctly. WonderFence lets you set different policies for each agent and each stage of the interaction.
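One way to picture per-agent, per-stage policies is as a simple lookup table. The sketch below is a hypothetical illustration: the agent names, policy names, and the `policies_for` helper are assumptions made for this example and do not reflect how WonderFence is actually configured.

```python
from enum import Enum


class Stage(Enum):
    PROMPT = "prompt"      # incoming user prompts
    RESPONSE = "response"  # outgoing agent responses


# Hypothetical agents and policy names, loosely following the examples above.
POLICY_MAP: dict[str, dict[Stage, list[str]]] = {
    "banking-agent": {
        Stage.PROMPT: ["personal-information"],
        Stage.RESPONSE: ["financial-guidance", "regulatory-compliance", "personal-information"],
    },
    "scheduling-assistant": {
        Stage.PROMPT: [],
        Stage.RESPONSE: ["personal-information"],
    },
}


def policies_for(agent: str, stage: Stage) -> list[str]:
    """Return the policies configured for an agent at a given stage (empty if none)."""
    return POLICY_MAP.get(agent, {}).get(stage, [])


print(policies_for("banking-agent", Stage.RESPONSE))
print(policies_for("scheduling-assistant", Stage.RESPONSE))
```

The narrower scheduling assistant carries a single policy, while the banking agent layers several on its responses.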

Ship With Confidence

WonderFence complements your existing guardrails with another layer of personalized protection: fit-to-policy guardrails trained on real-world adversarial data collected during years of protecting the world's biggest tech platforms. WonderFence joins WonderSuite alongside automated red-teaming, continuous preset guardrails, and more to give you the confidence to deploy consumer-facing AI.
