TL;DR
WonderFence can now train a personalized guardrail for each of your policies, using real-world adversarial data collected over years of protecting the world's biggest tech platforms. The guardrails are deployed in WonderFence at sub-100ms latency, so they are shaped by your policy and not the other way around.
Generic guardrails can prevent some unwanted agent behavior in client interactions, but data leaks, compliance violations, and off-brand responses still slip through. That happens because out-of-the-box guardrails fall short on the two factors that determine effectiveness: the quality of their training data, and how closely their policies match your needs.
WonderFence fills those gaps by training a personalized guardrail for each policy, using real-world adversarial data Alice has collected across industries, languages, and threat actor communities. This way, guardrails are shaped by your policies, instead of policies being shaped by the available guardrails.

Guardrails Trained on Real Threats for Real Policies
WonderFence allows you to train policy-specific guardrails within hours. Just configure your policy definition and upload a few labeled examples of wanted and unwanted behavior. We will take it from there, training the detector models inside your guardrails on real-world adversarial examples from Rabbit Hole, our collection of adversarial data curated from years of protecting the biggest tech platforms in the world. The result is policy detection that stays accurate while keeping overreach and false positives to a minimum.
For example, a retailer might define a policy that agents never knock competitors. Upload a few examples of on-brand and off-brand responses, and WonderFence produces a detector tuned to that exact line.
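To make the inputs concrete, here is a minimal sketch of what a policy definition with a few labeled examples could look like as plain Python data. The class names, field names, and the readiness rule are hypothetical illustrations, not WonderFence's actual API or upload format.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledExample:
    text: str
    allowed: bool  # True = wanted behavior, False = unwanted behavior


@dataclass
class PolicyDefinition:
    # Hypothetical structure: a name, a plain-language rule, and labeled examples.
    name: str
    description: str
    examples: list[LabeledExample] = field(default_factory=list)

    def is_ready(self) -> bool:
        """Assumed rule: at least one wanted and one unwanted example is present."""
        labels = {ex.allowed for ex in self.examples}
        return labels == {True, False}


# The retailer policy from the example above, with one example of each kind.
policy = PolicyDefinition(
    name="no-competitor-disparagement",
    description="Agents never disparage competitors.",
    examples=[
        LabeledExample("Our return window is 30 days.", allowed=True),
        LabeledExample("Their products fall apart in a week.", allowed=False),
    ],
)
```

Keeping both wanted and unwanted examples in the same structure makes it easy to check that a policy has both sides of the line covered before it is submitted for training.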

Continuous Protection for Reliable Behavior
The detectors trained by WonderFence are then deployed alongside your predefined WonderFence guardrails. Together, they evaluate every step agents take. In block mode, malicious prompts are intercepted, internal actions are monitored, and inappropriate outputs are replaced with predefined responses in line with your policies. In detect-only mode, the same policies flag interactions for internal review without interrupting users, which is useful for monitoring, testing, and gradual rollout.
Policies can run on incoming prompts, outgoing responses, or both, and apply to one agent or many. All of this happens at sub-100ms latency, so the user experience stays intact and you can count on your agents in front of customers.
For instance, a customer-facing banking agent demands strict controls on financial guidance and layered regulatory compliance. A scheduling assistant, by contrast, has a far narrower scope but still needs to handle personal information correctly. WonderFence lets you set different policies for each agent and each stage of the interaction.
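The behavior described in this section (block versus detect-only mode, policies scoped to prompts or responses, and different policy sets per agent) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Policy` and `Result` types, the `apply_policies` function, the agent names, and the toy keyword and regex detectors that stand in for trained detector models. It is not WonderFence's API.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    BLOCK = "block"              # replace violating text with a predefined response
    DETECT_ONLY = "detect_only"  # flag for review, leave the text untouched


@dataclass
class Policy:
    name: str
    detector: Callable[[str], bool]  # returns True when the text violates the policy
    mode: Mode
    stages: frozenset                # subset of {"prompt", "response"}
    fallback: str = "Sorry, I can't help with that."


@dataclass
class Result:
    text: str      # text to pass along (original or fallback)
    flagged: list  # names of the policies that fired


def apply_policies(text: str, stage: str, policies: list) -> Result:
    flagged = []
    replacement = None
    for p in policies:
        if stage in p.stages and p.detector(text):
            flagged.append(p.name)
            # The first blocking policy decides the replacement text.
            if p.mode is Mode.BLOCK and replacement is None:
                replacement = p.fallback
    return Result(text if replacement is None else replacement, flagged)


# Toy detectors standing in for trained models (illustrative only).
financial = Policy(
    name="financial-guidance",
    detector=lambda t: "guaranteed return" in t.lower(),
    mode=Mode.BLOCK,
    stages=frozenset({"response"}),
    fallback="I can't give investment advice, but I can connect you with an advisor.",
)
pii = Policy(
    name="pii-handling",
    detector=lambda t: re.search(r"\b\d{3}-\d{2}-\d{4}\b", t) is not None,
    mode=Mode.DETECT_ONLY,
    stages=frozenset({"prompt", "response"}),
)

# Different policy sets per agent (hypothetical agent names).
AGENT_POLICIES = {
    "banking-agent": [financial, pii],
    "scheduling-assistant": [pii],
}

blocked = apply_policies(
    "This fund has a guaranteed return.", "response", AGENT_POLICIES["banking-agent"]
)
observed = apply_policies(
    "My SSN is 123-45-6789.", "prompt", AGENT_POLICIES["scheduling-assistant"]
)
```

In this sketch `blocked.text` is the fallback message, while `observed.text` is unchanged and the interaction is only recorded in `observed.flagged`, which mirrors how a detect-only rollout lets a team review hits before switching a policy to block mode.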

Ship With Confidence
WonderFence complements your existing guardrails with another layer of personalized protection: fit-to-policy guardrails trained on real-world adversarial data collected during years of protecting the world's biggest tech platforms. WonderFence joins WonderSuite alongside automated red-teaming, continuous preset guardrails, and more to give you the confidence to deploy consumer-facing AI.

