Alice Financial Benchmark
We put GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro through 126 realistic financial conversations. No jailbreaks, no adversarial prompts, just the kind of pressure a hurried client might naturally apply. By the seventh exchange, all three were naming specific stocks, issuing transaction instructions, and/or dropping their disclaimers. Download the benchmark to see exactly where each model fails and what you need in place before your next client-facing deployment.

What’s New from Alice
Curiouser Soundbites: The AI Risk Debt Your Enterprise Is Already Carrying
Chances are your enterprise AI is moving a lot faster than your visibility into it, and Alison Cossette has a lot to say about that. She joined Mo on Curiouser & Curiouser to get into the risk debt that's quietly building inside agentic systems, why observability and traceability aren't optional anymore, and what leaders actually need to do about it.
The Problem With AI Observability Nobody Wants To Admit
Most enterprises have guardrails. Far fewer have visibility into what their AI is actually doing. Alison Cossette, Founder and CEO of ClariTrace, joins Mo to talk about the risk debt quietly building inside agentic systems, why observability and traceability aren't optional anymore, and what leaders need to put in place before something forces their hand.
Distilling LLMs into Efficient Transformers for Real-World AI
This technical webinar explores how we distilled the world knowledge of a large language model into a compact, high-performing transformer—balancing safety, latency, and scale. Learn how we combine LLM-based annotations and weight distillation to power real-world AI safety.
Beneath the Surface: The Growing Ecosystem of AI Nudification
Alice analyzed 100 AI nudification websites to uncover how synthetic NCII ecosystems scale through frictionless onboarding, affiliate monetization, and cross-platform distribution.
