Benchmark

The LLM Safety Review: Benchmarks & Analysis

As GenAI tools and the LLMs behind them impact the daily lives of billions, this report examines whether these technologies can be trusted to keep users safe.

What you’ll learn:

How LLMs respond to risky prompts from bad actors and vulnerable users
Where current models show safety strengths and weaknesses
Actionable steps to improve LLM safety and reduce harmful outcomes

Aug 1, 2023

Overview

In this first independent benchmarking report on the LLM safety landscape, ActiveFence’s subject-matter experts put leading models to the test. They used more than 20,000 prompts to analyze how six LLMs respond across seven major languages and four high-risk abuse areas: child exploitation, hate speech, self-harm, and misinformation. The report compares each model’s safety strengths and weaknesses, helping teams see where gaps exist and where additional resources may be required.
