ActiveFence is now Alice

Alice Financial Benchmark

We put GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro through 126 realistic financial conversations. No jailbreaks, no adversarial prompts, just the kind of pressure a hurried client might naturally apply. By the seventh exchange, all three were naming specific stocks, issuing transaction instructions, and/or dropping their disclaimers. Your regulator won't care that the model's own policy prohibited it. Download the benchmark to see exactly where each model fails and what you need in place before your next client-facing deployment.

Apr 16, 2026

Overview

In this report, you'll learn:

  • Where each model breaks: GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro each have a distinct vulnerability profile, and knowing which pressure type triggers yours lets you build the right layer of protection before deployment
  • Why model-level guardrails aren't enough: Policy violations occurred consistently in realistic, non-adversarial multi-turn conversations, meaning your standard pre-launch testing won't catch them
  • How to stress-test, protect, and monitor your deployment: With red-teaming, runtime guardrails, and continuous post-launch monitoring, you can move forward in financial AI with confidence

Use this benchmark to close the gap between your AI's stated policies and what it actually does when a client pushes back. Download it now and give your compliance, legal, and product teams the evidence they need to act.

Download the Full Report

What’s New from Alice

Secure AWS Strands Agents with Alice WonderFence

Blog · Mar 29, 2026 · 9 min read

Learn how to use Strands Agent hooks to enforce safety and security policies with Alice WonderFence in production-oriented agent workflows.

