The LLM Safety Review: Benchmarks & Analysis
As GenAI tools and the LLMs behind them impact the daily lives of billions, this report examines whether these technologies can be trusted to keep users safe.
What you’ll learn:
- How LLMs respond to risky prompts from bad actors and vulnerable users
- Where current models show safety strengths and weaknesses
- Actionable steps to improve LLM safety and reduce harmful outcomes

Overview
In this first independent benchmarking report on the LLM safety landscape, ActiveFence’s subject-matter experts put leading models to the test. More than 20,000 prompts were used to analyze how six LLMs respond across seven major languages and four high-risk abuse areas: child exploitation, hate speech, self-harm, and misinformation. The report provides comparative insight into each model’s relative safety strengths and weaknesses, helping teams understand where gaps exist and where additional resources may be required.
Download the Full Report
What’s New from Alice
"Okay, Here is How to Build a Bomb": Millions Download Dangerous LLMs
Thousands of abliterated LLMs have flooded open-source platforms with millions of downloads. These models comply with virtually any request, from bomb-making to malware, and run fully offline on consumer devices.
Your LLM Has No Idea What It's Doing
Diana Kelley, CISO at Noma Security and former Cybersecurity CTO at Microsoft, joins Mo to work through the real mechanics of LLM risk: why the context window flattens the trust boundary between system instructions and user data, why that makes reliable internal guardrails essentially impossible, and why agentic AI is less a new threat category and more a stress test for the hygiene debt organizations never fully paid off.
Distilling LLMs into Efficient Transformers for Real-World AI
This technical webinar explores how we distilled the world knowledge of a large language model into a compact, high-performing transformer—balancing safety, latency, and scale. Learn how we combine LLM-based annotations and weight distillation to power real-world AI safety.
